Mistral API & Developer Guide: From Setup to Production (2026)
Last verified: May 7, 2026 · Format: Guide · Est. time: 20-25 min
Mistral's La Plateforme gives you API access to a growing family of open-weight and premier models, from the compact Ministral 3B to the flagship Mistral Large 3. If you have shipped code against the OpenAI or Anthropic APIs, the Mistral SDK will feel familiar: same chat-completions pattern, same Bearer token auth, similar function-calling conventions. The differences that matter are the open-weight licensing (Apache 2.0 for most models), the fill-in-the-middle endpoint for code completion, and the aggressive price-to-performance positioning that makes Mistral worth evaluating for latency-sensitive and cost-constrained workloads.
This guide walks through every step from API key to production-ready integration: SDK installation, your first chat completion, function calling, structured output, embeddings, and the model selection decisions that determine whether your application runs efficiently or burns through your budget.
What You Need Before Starting
Mistral's API is accessed through La Plateforme, their developer console. You will need an account, an API key, and one of the official SDKs installed. The API follows standard REST conventions with Bearer token authentication at the base URL https://api.mistral.ai/v1.
- ✓ Step 1: Get API Key & Install SDK
- ✓ Step 2: First Chat Completion
- ✓ Step 3: Choose the Right Model
- ✓ Step 4: Function Calling
- ✓ Step 5: Structured JSON Output
- ✓ Step 6: Embeddings for RAG
- ✓ Step 7: Code Completion
- ✓ Step 8: Deploy to Production
Step 1: Get Your API Key and Install the SDK
Start at console.mistral.ai. Create an account if you do not have one, then navigate to API Keys in the left sidebar. Click "Create new key," give it a descriptive name (e.g., "dev-local"), and copy the key immediately. You will not see it again after leaving the page.
Set the environment variable
macOS / Linux:
export MISTRAL_API_KEY="your_api_key_here"
Windows (PowerShell):
$env:MISTRAL_API_KEY = "your_api_key_here"
For persistence, add the export to your shell profile (.bashrc, .zshrc) or use a .env file with a library like python-dotenv.
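For the .env route, a minimal sketch with python-dotenv (assuming a .env file in your project root containing MISTRAL_API_KEY=...):

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env into the process environment
api_key = os.environ["MISTRAL_API_KEY"]

Keep the .env file out of version control (add it to .gitignore) so the key never lands in your repository.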
Install the SDK
Python:
pip install mistralai
TypeScript / JavaScript:
npm install @mistralai/mistralai
Both SDKs wrap the REST API at https://api.mistral.ai/v1 and handle authentication, retries, and streaming automatically.
Verification: Run python -c "import mistralai; print(mistralai.__version__)" to confirm the Python SDK installed correctly. If you get an ImportError, check that you are using the correct Python environment.
Step 2: Send Your First Chat Completion
The chat completions endpoint (POST /v1/chat/completions) accepts an array of messages with roles: system, user, assistant, and tool.
Python example
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {
            "role": "system",
            "content": "You are a senior DevOps engineer. "
                       "Answer concisely with code examples."
        },
        {
            "role": "user",
            "content": "Write a Dockerfile for a Python "
                       "FastAPI app with multi-stage builds."
        }
    ],
    temperature=0.3,
    max_tokens=1024
)

print(response.choices[0].message.content)
TypeScript example
import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({
  apiKey: process.env.MISTRAL_API_KEY
});

const response = await client.chat.complete({
  model: "mistral-large-latest",
  messages: [
    {
      role: "user",
      content: "Explain Kubernetes pod scheduling "
        + "in 3 sentences."
    }
  ]
});

console.log(response.choices[0].message.content);
Streaming responses
For real-time output (chat UIs, long-form generation), enable streaming:
stream = client.chat.stream(
    model="mistral-small-latest",
    messages=[
        {"role": "user",
         "content": "Write a CI/CD pipeline "
                    "for GitHub Actions."}
    ]
)

for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
Verification: You should see a complete response printed to your terminal. If you get a 401 error, double-check your API key environment variable. A 429 error means you have hit rate limits: wait a few seconds and retry.
Step 3: Choose the Right Model
Mistral's model lineup splits into frontier generalist models and specialist models. Your choice directly impacts latency, cost, and output quality.
Frontier Generalist Models
| Model | API ID | Type | Best For |
|---|---|---|---|
| Mistral Large 3 | mistral-large-latest | Open | Complex reasoning, multimodal, agents |
| Mistral Medium 3.5 | mistral-medium-latest | Open | Agentic workflows, coding, balanced cost |
| Mistral Small 4 | mistral-small-latest | Open | Fast inference, hybrid instruct/reasoning |
| Mistral Medium 3.1 | mistral-medium-2501 | Premier | Production workloads, enterprise |
| Magistral Medium 1.2 | magistral-medium-latest | Premier | Extended reasoning, chain-of-thought |
| Ministral 3 14B | ministral-14b-latest | Open | Edge deployment, local inference |
| Ministral 3 8B | ministral-8b-latest | Open | Resource-constrained environments |
| Ministral 3 3B | ministral-3b-latest | Open | Mobile, IoT, ultra-low latency |
Specialist Models
| Model | API ID | Type | Purpose |
|---|---|---|---|
| Codestral | codestral-latest | Premier | Code completion, FIM |
| Devstral 2 | devstral-latest | Open | Software engineering agents |
| OCR 3 | mistral-ocr-latest | Premier | Document OCR |
| Voxtral TTS | voxtral-tts-latest | Open | Text-to-speech, voice cloning |
| Voxtral Transcribe | voxtral-transcribe-latest | Premier | Audio transcription |
| Mistral Moderation 2 | mistral-moderation-latest | Premier | Content safety classification |
Open vs. Premier: Open-weight models are available under Apache 2.0 for self-hosting on your own GPU infrastructure via vLLM, TGI, or Ollama. Premier models are API-only through La Plateforme. For production workloads where you need to control the inference stack, open-weight models let you deploy on your own terms.
Decision heuristic: Start with mistral-small-latest for prototyping (fast, cost-effective). Upgrade to mistral-large-latest when you need stronger reasoning or multimodal input. Use codestral-latest for code-specific tasks. Drop to ministral-8b-latest or ministral-3b-latest for edge or latency-critical paths.
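As a rough illustration of that heuristic, here is a toy routing function. The task labels are made up for this example; only the model IDs come from the tables above.

def pick_model(task: str) -> str:
    """Map an illustrative task label to a Mistral model ID."""
    if task in ("code-completion", "fim"):
        return "codestral-latest"
    if task in ("edge", "low-latency"):
        return "ministral-8b-latest"
    if task in ("complex-reasoning", "multimodal"):
        return "mistral-large-latest"
    return "mistral-small-latest"  # sensible default for prototyping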
Verification: Call the Models endpoint (GET /v1/models) to see which models are available on your account. Some premier models require a billing method to be attached before they appear.
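With the Python SDK, listing models looks roughly like this (a sketch; confirm the exact response fields against the SDK version you installed):

# List the models visible to your account
models = client.models.list()
for model in models.data:
    print(model.id)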
Step 4: Add Function Calling (Tool Use)
Function calling lets your application give Mistral models access to external tools: databases, APIs, calculators, or any function you define. The model decides which tools to call and generates the arguments; your code executes the function and passes results back.
Supported models for function calling
According to Mistral AI, function calling works on Mistral Large 3, Mistral Medium 3.1, Mistral Small 3.2, Devstral 2.0, Magistral Medium 1.2, Codestral, and Ministral 3 (all parameter sizes).
Define your tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price "
                           "for a given ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker "
                                       "(e.g., AAPL, MSFT)"
                    }
                },
                "required": ["ticker"]
            }
        }
    }
]
Make the tool-enabled request
import json

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user",
         "content": "What is the current "
                    "price of AAPL stock?"}
    ],
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message

if message.tool_calls:
    for tool_call in message.tool_calls:
        name = tool_call.function.name
        args = json.loads(
            tool_call.function.arguments
        )

        # Execute your function
        result = get_stock_price(args["ticker"])

        # Send result back to the model
        follow_up = client.chat.complete(
            model="mistral-large-latest",
            messages=[
                {"role": "user",
                 "content": "What is the current "
                            "price of AAPL stock?"},
                message,
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                }
            ],
            tools=tools
        )

        print(follow_up.choices[0].message.content)
Control parameters
tool_choice="auto": Model decides whether to call a tool (default)tool_choice="any": Forces the model to use at least one tooltool_choice="none": Prevents tool use entirelyparallel_tool_calls=True: Allows multiple simultaneous tool calls (default)
Verification: The model should generate a tool call with name: "get_stock_price" and arguments: {"ticker": "AAPL"}. If the model responds with text instead, check that your tool definition schema is valid JSON and tool_choice is set to "auto" or "any".
Step 5: Request Structured JSON Output
For applications that need machine-parseable responses (APIs, data pipelines, form generation), use the response_format parameter to force JSON output.
JSON mode
import json

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {
            "role": "system",
            "content": "You are an API that returns "
                       "JSON. Always respond with "
                       "valid JSON only."
        },
        {
            "role": "user",
            "content": "List the top 3 Python web "
                       "frameworks with name, "
                       "github_stars (approximate), "
                       "and primary_use_case."
        }
    ],
    response_format={"type": "json_object"}
)

data = json.loads(
    response.choices[0].message.content
)
print(json.dumps(data, indent=2))
JSON Schema mode (stricter)
For guaranteed schema compliance, pass a JSON Schema definition:
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user",
         "content": "Generate a user profile "
                    "for testing."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_profile",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {
                        "type": "string",
                        "format": "email"
                    },
                    "role": {
                        "type": "string",
                        "enum": ["admin",
                                 "editor",
                                 "viewer"]
                    },
                    "active": {"type": "boolean"}
                },
                "required": ["name", "email",
                             "role", "active"]
            }
        }
    }
)
Verification: Parse the response with json.loads(). If it throws a JSONDecodeError, the model may have included markdown formatting. Add "Do not wrap the response in markdown code fences" to your system prompt.
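If you need a tolerant parser while you tune the prompt, a small sketch that pulls the first JSON object out of a possibly fenced reply (purely illustrative, not part of the SDK):

import json
import re

raw = response.choices[0].message.content.strip()
# If the model wrapped the JSON in ```json ... ``` fences, extract just the object
match = re.search(r"\{.*\}", raw, re.DOTALL)
data = json.loads(match.group(0) if match else raw)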
Step 6: Generate Embeddings for RAG
Mistral provides text and code embedding models for retrieval-augmented generation (RAG), semantic search, clustering, and classification.
Text embeddings
response = client.embeddings.create(
    model="mistral-embed",
    inputs=[
        "Kubernetes autoscaling strategies",
        "Docker container networking"
    ]
)

for i, embedding in enumerate(response.data):
    print(f"Input {i}: "
          f"{len(embedding.embedding)} dimensions")
Building a basic RAG pipeline
- Chunk your documents into passages (500-1000 tokens each)
- Generate embeddings for each chunk using mistral-embed
- Store embeddings in a vector database (Pinecone, Weaviate, pgvector, Qdrant)
- At query time, embed the user question and retrieve the top-k most similar chunks
- Pass retrieved chunks as context in the system prompt to mistral-large-latest
# Simplified RAG query
context_chunks = vector_db.search(
    query_embedding, top_k=5
)
context = "\n\n".join(
    [chunk.text for chunk in context_chunks]
)

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {
            "role": "system",
            "content": "Answer based on the "
                       "following context. If the "
                       "context does not contain the "
                       "answer, say so.\n\n"
                       f"Context:\n{context}"
        },
        {"role": "user",
         "content": user_question}
    ]
)
Verification: Check that your embeddings return a consistent number of dimensions. Store a few test embeddings and verify that semantically similar inputs produce higher cosine similarity scores than unrelated inputs.
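A quick way to run that check is to compare a few embeddings directly (a sketch assuming numpy is installed; the sample strings are arbitrary):

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = client.embeddings.create(
    model="mistral-embed",
    inputs=[
        "Kubernetes autoscaling strategies",
        "Horizontal pod autoscaler tuning",
        "Banana bread recipe"
    ]
)
vectors = [item.embedding for item in emb.data]
print(cosine_similarity(vectors[0], vectors[1]))  # related topics: expect a higher score
print(cosine_similarity(vectors[0], vectors[2]))  # unrelated topic: expect a lower score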
Step 7: Use Codestral for Code Completion
Codestral is Mistral's specialized code model with fill-in-the-middle (FIM) capability, meaning it can complete code given both the prefix (code before the cursor) and suffix (code after the cursor). This is what powers IDE integrations.
FIM request
response = client.fim.complete(
    model="codestral-latest",
    prompt="def calculate_area(shape, dims):\n"
           "    if shape == 'circle':\n"
           "        ",
    suffix="\n    elif shape == 'rectangle':\n"
           "        return dims['width'] "
           "* dims['height']"
)

print(response.choices[0].message.content)
# Expected: return 3.14159 * dims['radius'] ** 2
IDE integration
Codestral integrates with VS Code via the Continue extension and with JetBrains IDEs. Set up your API key in the extension settings and point it to codestral-latest for inline completions.
When to use Codestral vs. Devstral
- Codestral (codestral-latest): Best for real-time code completion, FIM, and inline suggestions. Optimized for speed.
- Devstral 2 (devstral-latest): Best for agentic coding tasks, multi-file refactoring, and software engineering workflows that require planning and reasoning. According to Mistral AI, Devstral 2 is a "frontier code agents model for solving software engineering tasks."
Verification: Send a FIM request with a known code pattern and verify the completion is syntactically valid. Test with multiple languages (Python, JavaScript, Rust) to confirm polyglot support.
Step 8: Deploy to Production
Moving from prototype to production requires attention to error handling, rate limiting, cost control, and observability.
Error handling
import os
import time
import logging

from mistralai import Mistral

log = logging.getLogger(__name__)

client = Mistral(
    api_key=os.environ["MISTRAL_API_KEY"]
)

def complete_with_retry(prompt, max_retries=3):
    for retry_count in range(max_retries):
        try:
            response = client.chat.complete(
                model="mistral-large-latest",
                messages=[
                    {"role": "user",
                     "content": prompt}
                ],
                max_tokens=2048
            )
            return response.choices[0].message.content
        except Exception as e:
            if "429" in str(e):
                # Rate limited: exponential backoff, then retry
                time.sleep(2 ** retry_count)
            elif "401" in str(e):
                # Authentication failure: retrying will not help
                log.critical("Invalid API key")
                raise
            else:
                raise
    raise RuntimeError("Exceeded retry limit")
Batch processing
For high-volume workloads (bulk classification, dataset enrichment), use the Batch API to submit requests in bulk:
batch = client.batch.create(
    model="mistral-small-latest",
    requests=[
        {
            "custom_id": f"req-{i}",
            "body": {
                "messages": [
                    {"role": "user",
                     "content": text}
                ]
            }
        }
        for i, text in enumerate(texts)
    ]
)
Cost optimization checklist
- Use mistral-small-latest for classification, extraction, and simple generation
- Reserve mistral-large-latest for complex reasoning and multimodal tasks
- Enable response caching for repeated queries (see the sketch after this list)
- Set max_tokens to the minimum needed (do not leave it at defaults)
- Use batch processing for non-real-time workloads
- Monitor usage via the La Plateforme console dashboard
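For the caching item above, a minimal in-process sketch (a production deployment would more likely use Redis, Memcached, or a similar shared store):

import hashlib
import json

_cache = {}

def cached_complete(model, messages, **kwargs):
    # Key the cache on the full request so identical prompts reuse prior responses
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages, **kwargs},
                   sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = client.chat.complete(
            model=model, messages=messages, **kwargs
        )
    return _cache[key]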
Content moderation
Use mistral-moderation-latest to pre-screen user inputs or post-screen model outputs:
moderation = client.classifiers.moderate_chat(
    model="mistral-moderation-latest",
    inputs=[
        {"role": "user",
         "content": user_input}
    ]
)

# Check moderation results before processing
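What that check might look like, under the assumption that results[0].categories maps category names to booleans (verify the exact response shape against the current SDK):

# Assumption: results[0].categories is a {category_name: bool} mapping
if any(moderation.results[0].categories.values()):
    raise ValueError("Input rejected by moderation filter")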
Verification: Deploy to a staging environment and run 100 test requests across different models. Monitor response times, error rates, and token consumption. Set up alerts for 429 (rate limit) and 500 (server error) responses.
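A bare-bones smoke-test loop along those lines (a sketch; scale the request count and model list to your own staging plan):

import time

models_to_test = ["mistral-small-latest", "mistral-large-latest"]
errors, latencies = 0, []

for model in models_to_test:
    for _ in range(5):  # increase toward ~100 for a real staging run
        start = time.time()
        try:
            client.chat.complete(
                model=model,
                messages=[{"role": "user", "content": "ping"}],
                max_tokens=8
            )
            latencies.append(time.time() - start)
        except Exception:
            errors += 1

print(f"errors={errors}, "
      f"avg_latency={sum(latencies) / len(latencies):.2f}s")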
Troubleshooting and FAQ
- API key not recognized? Check the environment variable with echo $MISTRAL_API_KEY (macOS/Linux) or $env:MISTRAL_API_KEY (PowerShell). Generate a new key at console.mistral.ai if needed. Keys are shown only once at creation time.
- Responses getting cut off? Increase max_tokens. The default may be too low for your use case. Also check that your input is not exceeding the model's context window. If streaming, ensure your client is reading all chunks before closing the connection.
- Model not calling your tools? Confirm tool_choice is set to "auto" or "any". Verify your tool schema has valid JSON with type, description, and parameters fields. Some models handle tools more reliably than others: mistral-large-latest is the most reliable for function calling.
- Output not parsing as JSON? Setting response_format={"type": "json_object"} also helps enforce clean JSON output.
- Not sure which model to start with? mistral-small-latest for cost-sensitive prototyping. mistral-large-latest for production quality. codestral-latest for code-specific tasks. ministral-8b-latest for edge deployment where you need to self-host.
Next Step
Build a function-calling agent. Pick a real API your team uses (Jira, Slack, your internal database), define it as a tool, and wire it into a Mistral chat loop. This exercise demonstrates the full integration pattern and helps you evaluate whether Mistral's function calling reliability meets your production requirements.
AI API providers process your inputs on remote servers. Review Mistral's data processing agreement and privacy policy at mistral.ai/terms before sending sensitive data through the API. Enterprise customers should evaluate data residency options (Mistral offers EU-hosted infrastructure).
Open-weight models deployed on your own infrastructure keep all data local, which may be required for regulated industries.
See the NIST AI Risk Management Framework for organizational AI governance guidance.
Under GDPR and CCPA, you have rights to access, correct, and delete your data. Mistral AI is headquartered in Paris, France and operates under EU data protection regulations.
TechJack Solutions maintains editorial independence. This article was not sponsored or reviewed by Mistral AI. For AI regulation context, see our EU AI Act overview.