MiniMax MCP | MCPBro - MCP Market - MCP Server Directory

What is MiniMax MCP

MiniMax MCP is an implementation of an MCP server that provides high-quality text-to-speech, voice cloning, image generation, and video generation to MCP clients. Because it follows the open MCP specification, any compliant LLM frontend or agent framework can use MiniMax’s multimodal APIs through a single protocol, with the server running either locally (stdio) or remotely (SSE).

How to Configure MiniMax MCP

Acquire an API key and host URL from the MiniMax platform (choose the appropriate region: Global or Mainland).
Install uv, the Python package manager that provides the uvx command. On macOS and Linux, run curl -LsSf https://astral.sh/uv/install.sh | sh.
Edit your client configuration (e.g., Claude Desktop's claude_desktop_config.json or Cursor's MCP settings) and add a new MCP server entry for MiniMax MCP:
- Set "command" to "uvx", or to the absolute path of uvx if it is not on your PATH.
- Set the MINIMAX_API_KEY and MINIMAX_API_HOST environment variables, using the host that matches your key's region.
- Optionally, set an output base path and a resource mode that controls whether generated files are saved locally or returned as URLs.
If you use Claude Desktop on Windows, enable Developer Mode (from the Help menu) so that custom MCP servers are allowed.
Save the configuration and restart your MCP client so it connects to MiniMax MCP.
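The configuration steps above can be sketched as a small Python helper that builds a MiniMax server entry and merges it into a client's JSON config. This is a minimal sketch, not the project's official installer. It assumes the top-level "mcpServers" key (the layout Claude Desktop's config uses), a server name of "MiniMax", the package argument "minimax-mcp", and the optional variable names MINIMAX_MCP_BASE_PATH and MINIMAX_API_RESOURCE_MODE; confirm the exact names against the MiniMax MCP README for your version.

```python
import json
from pathlib import Path
from typing import Optional


def minimax_entry(api_key: str, api_host: str, command: str = "uvx",
                  base_path: Optional[str] = None,
                  resource_mode: Optional[str] = None) -> dict:
    """Build an MCP server entry for MiniMax MCP (some field names are assumed)."""
    env = {"MINIMAX_API_KEY": api_key, "MINIMAX_API_HOST": api_host}
    # Optional settings; these variable names are assumptions, confirm them in the README.
    if base_path is not None:
        env["MINIMAX_MCP_BASE_PATH"] = base_path
    if resource_mode is not None:
        env["MINIMAX_API_RESOURCE_MODE"] = resource_mode  # assumed values: "url" or "local"
    # Assumption: uvx launches the server from the "minimax-mcp" package.
    return {"command": command, "args": ["minimax-mcp"], "env": env}


def merge_into_config(config_path: Path, entry: dict, name: str = "MiniMax") -> dict:
    """Add or replace the named server under "mcpServers" and write the file back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2))
    return config


if __name__ == "__main__":
    # Global host as listed in the FAQ; the key and output path are placeholders.
    entry = minimax_entry("your-api-key", "https://api.minimaxi.chat",
                          base_path="/tmp/minimax-output", resource_mode="local")
    # Print instead of writing, so the result can be reviewed before editing a real config.
    print(json.dumps({"mcpServers": {"MiniMax": entry}}, indent=2))
```

Passing command="/absolute/path/to/uvx" corresponds to the absolute-path option in the steps; merge_into_config is kept separate so you can inspect the entry before touching a live configuration file.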

How to Use MiniMax MCP

After configuration:

Open your MCP-compliant AI tool (e.g., Claude Desktop, Cursor).
Open the client's tool panel (or its equivalent list of MCP tools) and check that the MiniMax tools are available.
Invoke tools such as text_to_audio, text_to_image, or generate_video by providing the necessary prompts or files.
For asynchronous operations such as video generation, use the query_video_generation tool to check the task's status and retrieve the result.
Choose local or URL-based file handling to match your transport: stdio for a local server, SSE for a networked one.
Review the outputs, which are either saved to the configured output directory or returned as URLs.
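Because video generation is asynchronous, an agent typically calls query_video_generation repeatedly until the task finishes. The loop below is a minimal polling sketch under stated assumptions: query is a hypothetical callable standing in for that tool call, it returns a dict with a "status" field, and "Success" and "Fail" are assumed terminal values. The sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict

# Assumed terminal status values; check the real query_video_generation output.
TERMINAL_STATUSES = {"Success", "Fail"}


def wait_for_video(task_id: str,
                   query: Callable[[str], Dict],
                   interval: float = 10.0,
                   max_attempts: int = 60,
                   sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll query(task_id) until a terminal status appears or attempts run out."""
    for attempt in range(max_attempts):
        result = query(task_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next status check
    raise TimeoutError(f"video task {task_id} not finished after {max_attempts} checks")


if __name__ == "__main__":
    # Fake responses standing in for real query_video_generation results.
    responses = iter([{"status": "Processing"}, {"status": "Processing"},
                      {"status": "Success", "file_id": "demo"}])
    result = wait_for_video("demo-task", lambda _id: next(responses),
                            sleep=lambda seconds: None)
    print(result)  # {'status': 'Success', 'file_id': 'demo'}
```

A failed task is returned rather than raised, so the caller decides how to report it; only running out of attempts raises TimeoutError.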

Key Features

Plug-and-play integration with popular AI tools through the open Model Context Protocol (MCP).
Multi-modal support including real-time text-to-speech, high-fidelity voice cloning, and advanced text-to-image/video generation.
Flexible transport options: local (stdio) or cloud-based (SSE).
Automatic resource discovery and standard tool invocation via MCP endpoints.
Region-specific API keys and hosts (Global and Mainland), plus local or URL-based handling of generated files.
Extensible to future MiniMax generation modalities and tools.

Use Cases

Auto-generate high-quality newscast audio from LLM-created scripts.
Clone a specific voice and synthesize audio in that style for personalized AI avatars.
Instantly generate and insert images or video scenes within AI workflows, such as for presentation building or coding assistance.
Combine MiniMax MCP with other tools inside clients such as Claude or Cursor for complex agentic tasks that mix text, code, visual, and audio generation.
Create synthetic training data, personalized media, or automate content pipelines in creative and enterprise scenarios.

FAQ

1. How do I resolve 'invalid api key' errors? Make sure your API key matches the API host for its region (Global: https://api.minimaxi.chat, Mainland: https://api.minimax.chat). Obtain the key from the user center of the matching MiniMax platform, and double-check the host spelling: the Global host has an extra "i" (minimaxi).

2. I get 'spawn uvx ENOENT' when starting the server. What should I do? Make sure the uvx executable is on your system PATH, or set "command" to its absolute path. You can find the path with which uvx (macOS/Linux) or where uvx (Windows Command Prompt).

3. Why can't I use MiniMax MCP tools in Claude Desktop on Windows? You must enable Developer Mode in Claude Desktop (from the Help menu) to permit third-party MCP servers.

4. Why is a video generation request not completing? Video generation runs asynchronously. Either define completion rules in your IDE so the agent waits for the task, or use the query_video_generation tool to check the status of your video generation task.

5. Are there costs associated with using these tools? Yes. Using MiniMax's APIs through MiniMax MCP (text-to-speech, image/video generation, etc.) incurs costs on the MiniMax platform. Refer to your account dashboard for pricing.
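The checks behind FAQ items 1 and 2 can be scripted. The sketch below compares a chosen region with the configured host, using the two hosts listed in the FAQ, and looks up uvx with Python's shutil.which. It is a minimal diagnostic, assuming those hosts are current: a matching host does not prove the key is valid (only the MiniMax API can confirm that), and the PATH this script sees may differ from the PATH your MCP client uses, for example when a desktop app is launched outside your shell, so an absolute path in "command" is the more robust fix.

```python
import shutil
from typing import List

# Hosts as given in the FAQ; confirm current values on the MiniMax platform.
REGION_HOSTS = {
    "global": "https://api.minimaxi.chat",
    "mainland": "https://api.minimax.chat",
}


def check_host(region: str, api_host: str) -> List[str]:
    """Return problems found when pairing a region with MINIMAX_API_HOST."""
    expected = REGION_HOSTS.get(region.lower())
    if expected is None:
        return [f"unknown region {region!r}; expected one of {sorted(REGION_HOSTS)}"]
    if api_host.rstrip("/") != expected:
        return [f"{region} keys need {expected}, but MINIMAX_API_HOST is {api_host!r}"]
    return []


if __name__ == "__main__":
    # shutil.which also tries Windows suffixes such as .exe via PATHEXT.
    uvx = shutil.which("uvx")
    print("uvx:", uvx or 'not found on PATH; set "command" to its absolute path')
    # Deliberately mismatched example: a Global key with the Mainland host.
    for problem in check_host("global", "https://api.minimax.chat"):
        print("problem:", problem)
```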

Available Tools

Tool Name	Description
text_to_audio	Convert input text into spoken audio using a selected voice.
list_voices	List all available voices for text-to-speech or cloning.
voice_clone	Clone a voice from provided audio files for customized speech generation.
generate_video	Generate a video from a prompt description.
text_to_image	Create an image from a text prompt.
query_video_generation	Query and retrieve the status or result of an asynchronous video generation task.
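Each tool in the table is invoked by name. MCP clients build these messages for you, but if you want to exercise the server from your own code, the sketch below shows the shape of the JSON-RPC 2.0 tools/call request defined by the MCP specification. The argument names used in the example ("text", "prompt") are hypothetical; the authoritative input schema for each tool is the one the server reports from tools/list.

```python
import json
from itertools import count

# Tool names from the table above.
MINIMAX_TOOLS = {
    "text_to_audio", "list_voices", "voice_clone",
    "generate_video", "text_to_image", "query_video_generation",
}

_request_ids = count(1)  # each request in a session needs a fresh id


def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for one of the MiniMax tools."""
    if name not in MINIMAX_TOOLS:
        raise ValueError(f"unknown tool {name!r}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


if __name__ == "__main__":
    # Hypothetical argument names; use the schemas reported by tools/list.
    print(tool_call_request("text_to_audio", {"text": "Good evening, here is the news."}))
    print(tool_call_request("text_to_image", {"prompt": "A lighthouse at dusk"}))
```

Checking the name against the table before sending catches typos locally instead of waiting for an error response from the server.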