Connect Sarvam to Claude: Indic Translation and Audio Transcription
Learn how to connect Sarvam to Claude using a managed MCP server. Give your AI agents the tools to run Indic text translation, async audio transcription, and document intelligence.
If you need to connect Sarvam to Claude to automate Indic language translation, speech-to-text transcription, or document digitization, you need a Model Context Protocol (MCP) server. This server acts as the translation layer between Claude's JSON-based tool calls and Sarvam's REST APIs. You can either build and maintain this infrastructure yourself, or use a managed integration platform like Truto to dynamically generate a secure, authenticated MCP server URL. If your team uses ChatGPT, check out our guide on connecting Sarvam to ChatGPT or explore our broader architectural overview on connecting Sarvam to AI Agents.
Giving a Large Language Model (LLM) read and write access to a specialized AI infrastructure platform like Sarvam is an engineering challenge. You have to handle API authentication, map specific multipart file upload schemas to MCP tool definitions, and deal with Sarvam's specific asynchronous job polling mechanics. Every time Sarvam updates an endpoint or releases a new speech model (like Saaras v3), you have to update your server code, redeploy, and test the integration.
This guide breaks down exactly how to use Truto to generate a secure, managed MCP server for Sarvam, connect it natively to Claude, and execute complex multilingual workflows using natural language.
The Engineering Reality of the Sarvam API
A custom MCP server is a self-hosted integration layer. While the open MCP standard provides a predictable way for models to discover tools, the reality of implementing it against vendor APIs is painful. You are not just integrating "Sarvam" - you are integrating a suite of specific audio, translation, and document intelligence models, all of which have different design patterns and constraints.
If you decide to build a custom MCP server for Sarvam, you own the entire API lifecycle. Here are the specific challenges you will face:
Asynchronous Polling and State Machines
Sarvam offers both synchronous and asynchronous endpoints for speech-to-text. While short audio clips can be sent to the synchronous endpoint, enterprise use cases (like transcribing hour-long sales calls) require the asynchronous batch API. This means your MCP server cannot simply proxy a request and wait for a response. The LLM must be given tools to create a job (create_a_sarvam_speech_to_text_job), upload files, start the job, and then repeatedly poll a status endpoint (get_single_sarvam_speech_to_text_job_by_id). If you do not explicitly instruct the LLM on how to handle this state machine, it will hallucinate job completions or spam the polling endpoint.
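The create-then-poll sequence above can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not Truto's or Sarvam's client code: `fetch_status` is a hypothetical callable standing in for a call to get_single_sarvam_speech_to_text_job_by_id, and the status names (including "failed") are assumed rather than taken from Sarvam's API reference.

```python
import time
from typing import Callable, Dict

# Assumed terminal status names; check Sarvam's API reference for the real ones.
TERMINAL_STATES = {"completed", "failed"}


def poll_job(
    job_id: str,
    fetch_status: Callable[[str], Dict],
    interval_s: float = 10.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job reaches a terminal state or the attempts run out."""
    for _ in range(max_attempts):
        result = fetch_status(job_id)
        if result.get("status") in TERMINAL_STATES:
            return result
        # Fixed delay between polls so the status endpoint is not spammed.
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} polls")
```

Bounding the number of attempts keeps an agent from polling forever, and the explicit terminal-state check leaves no room for guessing that a job has finished.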
Multipart Form-Data and File Handling
Unlike standard JSON REST APIs, Sarvam's speech-to-text endpoints require raw audio files to be uploaded via multipart/form-data. LLMs operate in text. Your custom MCP server must act as a broker, translating an LLM's request (which might just pass a local file path or a URL) into a properly formatted multipart request containing the binary payload. Handling file streams, memory limits, and timeouts in your MCP server adds significant operational overhead.
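To make the broker role concrete, here is a minimal sketch of assembling a multipart/form-data body by hand with the Python standard library. The field names used in the usage note (`file`, `model`) are assumptions for illustration, not confirmed Sarvam parameter names.

```python
import uuid
from typing import Dict, Tuple


def build_multipart(
    fields: Dict[str, str],
    file_field: str,
    filename: str,
    file_bytes: bytes,
    content_type: str = "audio/wav",
) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields come first, one part per field.
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The binary payload is appended as raw bytes, never decoded to text.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

A call such as `build_multipart({"model": "saaras:v3"}, "file", "call.wav", audio_bytes)` returns the body and the matching header value. This version holds the whole file in memory, which is the kind of limit a production broker would need to replace with streaming.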
Strict Language Codes and Model Parameters
Sarvam's translation and transliteration endpoints enforce strict language code formats (e.g., hi-IN for Hindi, ta-IN for Tamil). If Claude passes "Hindi" instead of the expected code, the API rejects the request. You have to build enum validation directly into your MCP JSON Schemas to prevent the LLM from making malformed requests.
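One way to enforce this is an enum in the tool's input schema plus a small normalizer in front of the API call. The sketch below is illustrative only: it lists just the three codes mentioned in this guide, and the name-to-code mapping is a hypothetical convenience, not part of Sarvam's API.

```python
from typing import Dict

# Only the codes that appear in this guide; the full list lives in Sarvam's docs.
LANGUAGE_CODES: Dict[str, str] = {
    "english": "en-IN",
    "hindi": "hi-IN",
    "tamil": "ta-IN",
}

_CODE_ENUM = sorted(LANGUAGE_CODES.values())

# JSON Schema fragment for a translation tool's input.
TRANSLATION_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "input": {"type": "string"},
        "source_language_code": {"type": "string", "enum": _CODE_ENUM},
        "target_language_code": {"type": "string", "enum": _CODE_ENUM},
    },
    "required": ["input", "source_language_code", "target_language_code"],
}


def normalize_language(value: str) -> str:
    """Map a language name or a differently cased code to an accepted code."""
    candidate = value.strip().lower()
    for code in LANGUAGE_CODES.values():
        if candidate == code.lower():
            return code
    try:
        return LANGUAGE_CODES[candidate]
    except KeyError:
        raise ValueError(f"unsupported language: {value!r}") from None
```

The enum stops the model from sending free-form names, while the normalizer recovers the common near-misses such as "Hindi" or "ta-in" before the request is rejected upstream.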
Strict Rate Limits and HTTP 429s
AI infrastructure APIs are highly resource-intensive, and Sarvam enforces strict rate limits on transcription and translation requests. If your AI agent tries to process too many documents in parallel, Sarvam will return an HTTP 429 Too Many Requests error. Important architectural note: Truto does not retry, throttle, or apply backoff on rate limit errors. When the upstream Sarvam API returns a 429, Truto passes that error directly to the caller. Truto normalizes the upstream rate limit information into standardized headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset), following the IETF RateLimit header fields draft. The caller (your AI agent framework or Claude) is entirely responsible for reading these headers and implementing retry/backoff logic.
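Since retry logic belongs to the caller, a caller-side wrapper might look like the following minimal sketch. `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`. The code assumes ratelimit-reset carries the number of seconds until the window resets, which should be verified against real responses.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry on HTTP 429, preferring the ratelimit-reset header for the wait."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_retries:
            return status, headers, body
        lowered = {k.lower(): v for k, v in headers.items()}
        try:
            # Assumption: the header value is seconds until the limit resets.
            delay = float(lowered["ratelimit-reset"])
        except (KeyError, ValueError):
            # Header missing or unparsable: fall back to exponential backoff.
            delay = base_delay_s * (2 ** attempt)
        sleep(delay)
    raise AssertionError("unreachable: the loop always returns")
```

After the final retry the last 429 response is returned to the caller unchanged, so the agent can report the failure instead of looping.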
Instead of building this orchestration layer from scratch, you can use Truto. Truto normalizes the schemas and exposes Sarvam's endpoints as ready-to-use MCP tools.
How to Generate a Sarvam MCP Server with Truto
Truto dynamically generates MCP servers based on the actual documentation and schema definitions of the underlying integration. You can generate a Sarvam MCP server via the Truto UI or programmatically via the API.
Method 1: Generating via the Truto UI
If you need an MCP server quickly for Claude Desktop, the UI is the fastest path.
- Navigate to the Integrated Accounts page in your Truto dashboard and select your connected Sarvam account.
- Click the MCP Servers tab.
- Click Create MCP Server.
- Configure the server name, allowed methods, and an optional expiration date. For testing, you can leave the filters blank to expose all documented tools.
- Copy the generated MCP server URL (e.g., https://api.truto.one/mcp/a1b2c3d4e5f6...).
Method 2: Generating via the Truto API
If you are provisioning workspaces dynamically for your users, you should create MCP servers programmatically.
The Truto API validates the configuration, generates a secure cryptographic token, stores it in Cloudflare KV for low-latency routing, and returns a ready-to-use URL.
Request:
curl -X POST https://api.truto.one/integrated-account/{integrated_account_id}/mcp \
-H "Authorization: Bearer YOUR_TRUTO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Sarvam Production Agent",
"config": {
"methods": ["read", "write", "custom"]
}
}'
Response:
{
"id": "mcp_srv_987654321",
"name": "Sarvam Production Agent",
"config": {
"methods": ["read", "write", "custom"]
},
"expires_at": null,
"url": "https://api.truto.one/mcp/a1b2c3d4e5f67890abcdef"
}
This URL is fully self-contained. It encodes the integrated account context and authentication, meaning your Claude client needs nothing else to connect.
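For provisioning from application code, the same request can be built in Python. This is a minimal sketch that mirrors the curl call above using only the standard library. The function name is hypothetical, and the request is constructed but not sent, so it can be inspected first.

```python
import json
import urllib.request
from typing import Iterable


def build_create_mcp_request(
    integrated_account_id: str,
    api_key: str,
    name: str,
    methods: Iterable[str],
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an MCP server."""
    url = f"https://api.truto.one/integrated-account/{integrated_account_id}/mcp"
    payload = {"name": name, "config": {"methods": list(methods)}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it (requires a real account id and API key):
#   with urllib.request.urlopen(request) as resp:
#       mcp_url = json.load(resp)["url"]
```

Keeping construction separate from sending makes the payload easy to unit test without network access.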
Connecting the MCP Server to Claude
Once you have the Truto MCP URL, you need to register it with your LLM client. Here is how to do it using both the UI and configuration files.
Method 1: Via the Claude UI (or ChatGPT UI)
If you are using Anthropic's enterprise web interface or ChatGPT's developer mode, you can add the server directly through the UI.
For Claude Web/Enterprise:
- Go to Settings → Integrations → Add MCP Server.
- Paste the Truto MCP URL into the connection field.
- Click Add.
For ChatGPT (Requires Developer Mode):
- Go to Settings → Apps → Advanced settings.
- Enable Developer mode.
- Under MCP servers / Custom connectors, add a new server.
- Name it "Sarvam Tools" and paste the Truto MCP URL.
- Save the configuration.
Method 2: Via Manual Configuration File (Claude Desktop)
If you are using Claude Desktop or another MCP-capable client (like Cursor), you configure the MCP server using a JSON file. Truto's managed MCP servers use the Server-Sent Events (SSE) transport protocol.
Open your claude_desktop_config.json file (located at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS or %APPDATA%\Claude\claude_desktop_config.json on Windows) and add the following:
{
"mcpServers": {
"sarvam_tools": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-sse",
"https://api.truto.one/mcp/a1b2c3d4e5f67890abcdef"
]
}
}
}
Restart Claude Desktop. Claude will initialize the JSON-RPC handshake, pull the tool schemas from Truto, and make them available in your chat interface.
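If the config file already lists other servers, the entry has to be merged in rather than pasted over the whole file. The helper below is a minimal sketch of that merge. It reuses the entry shape shown above as-is, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_mcp_server(config: Dict[str, Any], name: str, mcp_url: str) -> Dict[str, Any]:
    """Return a copy of the config with one more entry under mcpServers."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {
        "command": "npx",
        # Same arguments as the JSON entry in this guide.
        "args": ["-y", "@modelcontextprotocol/server-sse", mcp_url],
    }
    return {**config, "mcpServers": servers}


def update_config_file(path: Path, name: str, mcp_url: str) -> None:
    """Read the config file (if any), merge the entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_mcp_server(config, name, mcp_url)
    path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
```

Existing servers and unrelated top-level settings are preserved, so running the helper twice with the same name simply overwrites that one entry.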
Hero Tools for Indic Language Automation
Truto automatically translates Sarvam's complex API endpoints into flattened, LLM-friendly schemas. Here are the highest-leverage tools available for your AI agents.
Create a Sarvam Speech to Text
Transcribe audio using Sarvam's Saaras v3 speech recognition model. This synchronous endpoint takes an audio file and returns the transcribed output.
Usage Notes: The agent must provide a valid file path or buffer. The model parameter is required (typically saaras:v3). It supports different output modes, including transcribe (default), translate (outputs English), verbatim, translit, and codemix.
"I have a customer support call recording at /tmp/support_call_42.wav. Please run it through Sarvam's speech-to-text tool using the Saaras model to get the transcription."
Create a Sarvam Text Translation
Translate text from one Indic language to another using Sarvam AI's translation service.
Usage Notes: The agent must provide the input text, source_language_code, and target_language_code. The language codes must match Sarvam's accepted BCP-47 variants.
"Take this paragraph and use the Sarvam text translation tool to translate it from English (en-IN) into Hindi (hi-IN)."
Create a Sarvam Text Transliteration
Transliterate text from one script to another (e.g., converting Latin characters to Devanagari script without changing the actual spoken words).
Usage Notes: Similar to translation, requires input, source_language_code, and target_language_code.
"Transliterate the string 'Mera naam Rahul hai' from English script to Hindi script using the Sarvam transliteration tool."
Create a Sarvam Text to Speech
Convert text to speech using Sarvam AI's TTS engine, synthesizing audio in the specified target language.
Usage Notes: Returns an opaque synthesized audio payload. Best used when the agent needs to generate a dynamic voice response for an automated dialer or IVR system.
"Generate an audio file for the phrase 'Your OTP is 4567' in Tamil using the Sarvam text-to-speech tool."
Create a Sarvam Speech to Text Job
Initiate an asynchronous speech-to-text transcription job for large files. Returns a job_id instead of a transcript.
Usage Notes: The agent must understand that this tool only starts the process. It must subsequently use the get_single_sarvam_speech_to_text_job_by_id tool to check the status.
"Submit the 1-hour town hall recording at /data/townhall.mp3 as an asynchronous Sarvam speech-to-text job, and tell me the job ID."
Get Single Sarvam Speech to Text Job By ID
Poll the status of a pending async transcription job. Returns the current status and the transcript once the job completes.
Usage Notes: The agent should be instructed to wait between polling requests (e.g., applying a delay) to avoid hitting rate limits.
"Check the status of Sarvam speech-to-text job ID 'job-987'. If it is completed, output the transcript. If not, wait 10 seconds and check again."
For the complete inventory of available operations and full JSON schema definitions, check out the Sarvam integration page.
Workflows in Action
Exposing individual endpoints is useful, but the real power of MCP comes from chaining these tools together to build autonomous workflows.
Workflow 1: The Asynchronous Audio Processing Pipeline
Large audio files require async handling. An AI agent can manage this entire state machine autonomously.
"We just finished a 45-minute sales call with a client in Mumbai. The audio file is located at /recordings/mumbai_call.mp3. Please transcribe the entire call using Sarvam, wait for the job to complete, and then provide a summary of the key action items in English."
- create_a_sarvam_speech_to_text_job: Claude calls the tool, passing the file and model. Sarvam returns a job_id (e.g., job_123).
- get_single_sarvam_speech_to_text_job_by_id: Claude polls the status endpoint. If the status is processing, Claude waits.
- get_single_sarvam_speech_to_text_job_by_id: Claude polls again. Status is completed. Claude retrieves the massive JSON transcript payload.
- Internal Processing: Claude reads the transcript, translates the context internally or via another tool, and generates the final English summary.
sequenceDiagram
participant User
participant Claude as "Claude Desktop"
participant MCP as "Truto MCP Server"
participant Sarvam as "Sarvam API"
User->>Claude: "Transcribe /recordings/mumbai_call.mp3..."
Claude->>MCP: call: create_a_sarvam_speech_to_text_job
MCP->>Sarvam: POST /speech-to-text/async
Sarvam-->>MCP: { job_id: "job_123", status: "pending" }
MCP-->>Claude: Result: job_123
loop Polling Loop
Claude->>MCP: call: get_single_sarvam_speech_to_text_job_by_id (job_123)
MCP->>Sarvam: GET /speech-to-text/async/job_123
Sarvam-->>MCP: { status: "processing" }
MCP-->>Claude: Result: processing
end
Claude->>MCP: call: get_single_sarvam_speech_to_text_job_by_id (job_123)
MCP->>Sarvam: GET /speech-to-text/async/job_123
Sarvam-->>MCP: { status: "completed", transcript: "..." }
MCP-->>Claude: Result: transcript data
Claude-->>User: "Here is the summary of the Mumbai call..."
Workflow 2: Automated Multilingual Support Triage
Agents can use Sarvam's language models to process inbound customer tickets in regional languages and translate them for the engineering team.
"Read this incoming customer feedback: 'எனக்கு இந்த செயலியில் உள்நுழைய முடியவில்லை'. Identify the language, translate it to English, and categorize the issue."
- create_a_sarvam_text_language_identification: Claude passes the raw text. The tool returns ta-IN (Tamil).
- create_a_sarvam_text_translation: Claude passes the text, setting source_language_code to ta-IN and target_language_code to en-IN. The tool returns "I am unable to login to this app".
- Internal Processing: Claude reads the English output, categorizes the issue as "Authentication/Login", and replies to the user.
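The same chain can be expressed in code when the triage runs outside a chat session. This is a minimal sketch: `call_tool` is a hypothetical callable that invokes an MCP tool by name, and the response fields `language_code` and `translated_text` are assumed names, not confirmed output schemas.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: (tool name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def triage_ticket(text: str, call_tool: ToolCaller) -> Dict[str, str]:
    """Detect the ticket's language, then translate it to English if needed."""
    detected = call_tool(
        "create_a_sarvam_text_language_identification", {"input": text}
    )
    source = detected["language_code"]  # assumed response field
    if source == "en-IN":
        english = text  # already English, skip the translation call
    else:
        translated = call_tool(
            "create_a_sarvam_text_translation",
            {
                "input": text,
                "source_language_code": source,
                "target_language_code": "en-IN",
            },
        )
        english = translated["translated_text"]  # assumed response field
    return {"source_language_code": source, "english_text": english}
```

Categorizing the issue is left to the model or a downstream classifier; the function only covers the two tool calls.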
Security and Access Control
Giving an LLM unconstrained access to your Sarvam API keys can lead to massive token consumption and unexpected infrastructure bills. Truto provides strict governance controls encoded directly into the MCP token:
- Method Filtering: Use config.methods to restrict the server to specific operations. For example, setting ["read"] allows the agent to check job statuses but prevents it from initiating expensive new transcription jobs.
- Tag Filtering: Use config.tags to limit access to specific operational groups, ensuring an agent built for translation cannot accidentally trigger document intelligence workflows.
- Secondary Authentication (require_api_token_auth): By default, possessing the MCP URL grants access. By setting require_api_token_auth: true, Truto forces the client to also provide a valid Truto API token in the Authorization header, preventing unauthorized execution if the URL leaks.
- Time-to-Live (expires_at): You can generate short-lived MCP servers for contractors or temporary AI agents. Once the Unix timestamp is reached, Cloudflare KV automatically evicts the token, terminating access instantly.
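These controls could be combined into a single creation payload, as in the minimal sketch below. The field placement is an assumption: `tags` and `require_api_token_auth` are put under `config` and `expires_at` at the top level (as in the response shown earlier). The `expires_at` value is passed through unchanged, so its format should follow Truto's API reference. The tag name in the usage note is hypothetical.

```python
from typing import Any, Dict, Iterable, Optional


def build_mcp_server_payload(
    name: str,
    methods: Iterable[str],
    tags: Optional[Iterable[str]] = None,
    require_api_token_auth: bool = False,
    expires_at: Optional[Any] = None,
) -> Dict[str, Any]:
    """Assemble a restricted MCP server definition (field placement assumed)."""
    config: Dict[str, Any] = {"methods": list(methods)}
    if tags:
        config["tags"] = list(tags)
    if require_api_token_auth:
        config["require_api_token_auth"] = True
    payload: Dict[str, Any] = {"name": name, "config": config}
    if expires_at is not None:
        # Passed through as given; this sketch does not validate the format.
        payload["expires_at"] = expires_at
    return payload
```

For example, `build_mcp_server_payload("Status Checker", ["read"], tags=["translation"], require_api_token_auth=True)` describes a read-only server that also demands a Truto API token.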
If you want to orchestrate complex voice and text processing pipelines without managing OAuth states, maintaining translation schemas, or handling binary file transport protocols, Truto's dynamic MCP server architecture removes the undifferentiated heavy lifting. Your engineers can focus on building intelligent prompts, while Truto handles the execution layer.
FAQ
- Can Claude handle Sarvam's asynchronous speech-to-text jobs?
- Yes. Truto exposes Sarvam's async endpoints as distinct MCP tools. The LLM can use the 'create' tool to start a job, receive a job ID, and then use the 'get by id' tool to poll the status until the transcript is ready.
- How does the MCP server handle rate limits from Sarvam?
- Truto acts as a passthrough layer for rate limits. It does not apply automatic retries or backoff. When Sarvam returns a 429 error, Truto passes the error to Claude and normalizes the rate limit data into standard IETF headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset) so the client framework can manage the retry logic.
- How do I secure the Sarvam MCP server?
- Truto allows you to apply strict filters when generating the MCP server. You can restrict the server to read-only methods, filter by specific tags, set an expiration date, and mandate secondary API token authentication.
- Do I need to write custom JSON Schemas for Sarvam's multipart audio endpoints?
- No. Truto automatically generates the required MCP tool schemas derived from Sarvam's API documentation, handling the formatting requirements for you.