
Mistral API & Developer Guide: From Setup to Production (2026)

Last verified: May 7, 2026  ·  Format: Guide  ·  Est. time: 20-25 min

Mistral's La Plateforme gives you API access to a growing family of open-weight and premier models, from the compact Ministral 3B to the flagship Mistral Large 3. If you have shipped code against the OpenAI or Anthropic APIs, the Mistral SDK will feel familiar: same chat-completions pattern, same Bearer token auth, similar function-calling conventions. The differences that matter are the open-weight licensing (Apache 2.0 for most models), the fill-in-the-middle endpoint for code completion, and the aggressive price-to-performance positioning that makes Mistral worth evaluating for latency-sensitive and cost-constrained workloads.

This guide walks through every step from API key to production-ready integration: SDK installation, your first chat completion, function calling, structured output, embeddings, and the model selection decisions that determine whether your application runs efficiently or burns through your budget.

Key numbers at a glance:

  • 12+ models spanning generalist, reasoning, code, vision, audio, and moderation (Source: Mistral AI Models Documentation, May 2026)
  • Apache 2.0 open-weight license on flagship models for self-hosting (Source: Mistral AI Models Documentation, May 2026)
  • 11 API endpoints: chat, FIM, embeddings, batch, OCR, audio, moderation (Source: Mistral API Reference, May 2026)
  • 5 minutes from API key to first chat completion response (Source: Mistral Developer Quickstart, May 2026)

What You Need Before Starting

Mistral's API is accessed through La Plateforme, their developer console. You will need an account, an API key, and one of the official SDKs installed. The API follows standard REST conventions with Bearer token authentication at the base URL https://api.mistral.ai/v1.

Prerequisites Checklist
A Mistral account at console.mistral.ai with an API key generated
Python 3.9+ or Node.js 18+ installed on your machine
A text editor or IDE (VS Code recommended for Codestral integration)
A terminal with environment variable support
Basic familiarity with REST APIs and JSON
Optional: billing method attached for premier model access
Guide Roadmap
  • Step 1: Get API Key & Install SDK
  • Step 2: First Chat Completion
  • Step 3: Choose the Right Model
  • Step 4: Function Calling
  • Step 5: Structured JSON Output
  • Step 6: Embeddings for RAG
  • Step 7: Code Completion
  • Step 8: Deploy to Production

Step 1: Get Your API Key and Install the SDK

Start at console.mistral.ai. Create an account if you do not have one, then navigate to API Keys in the left sidebar. Click "Create new key," give it a descriptive name (e.g., "dev-local"), and copy the key immediately. You will not see it again after leaving the page.

Set the environment variable

macOS / Linux:

export MISTRAL_API_KEY="your_api_key_here"

Windows (PowerShell):

$env:MISTRAL_API_KEY = "your_api_key_here"

For persistence, add the export to your shell profile (.bashrc, .zshrc) or use a .env file with a library like python-dotenv.
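
For example, with python-dotenv (a minimal sketch, assuming a .env file in your project root containing MISTRAL_API_KEY=...):

# pip install python-dotenv
import os

from dotenv import load_dotenv

load_dotenv()  # reads variables from .env into the process environment
api_key = os.environ["MISTRAL_API_KEY"]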

Install the SDK

Python:

pip install mistralai

TypeScript / JavaScript:

npm install @mistralai/mistralai

Both SDKs wrap the REST API at https://api.mistral.ai/v1 and handle authentication, retries, and streaming automatically.

Verification: Run python -c "import mistralai; print(mistralai.__version__)" to confirm the Python SDK installed correctly. If you get an ImportError, check that you are using the correct Python environment.

Step 2: Send Your First Chat Completion

The chat completions endpoint (POST /v1/chat/completions) accepts an array of messages with roles: system, user, assistant, and tool.

Python example

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {
            "role": "system",
            "content": "You are a senior DevOps engineer. "
                       "Answer concisely with code examples."
        },
        {
            "role": "user",
            "content": "Write a Dockerfile for a Python "
                       "FastAPI app with multi-stage builds."
        }
    ],
    temperature=0.3,
    max_tokens=1024
)

print(response.choices[0].message.content)

TypeScript example

import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({
  apiKey: process.env.MISTRAL_API_KEY
});

const response = await client.chat.complete({
  model: "mistral-large-latest",
  messages: [
    {
      role: "user",
      content: "Explain Kubernetes pod scheduling "
               + "in 3 sentences."
    }
  ]
});

console.log(response.choices[0].message.content);

Streaming responses

For real-time output (chat UIs, long-form generation), enable streaming:

stream = client.chat.stream(
    model="mistral-small-latest",
    messages=[
        {"role": "user",
         "content": "Write a CI/CD pipeline "
                    "for GitHub Actions."}
    ]
)

for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Verification: You should see a complete response printed to your terminal. If you get a 401 error, double-check your API key environment variable. A 429 error means you have hit rate limits: wait a few seconds and retry.

Step 3: Choose the Right Model

Mistral's model lineup splits into frontier generalist models and specialist models. Your choice directly impacts latency, cost, and output quality.

Frontier Generalist Models

| Model | API ID | Type | Best For |
| --- | --- | --- | --- |
| Mistral Large 3 | mistral-large-latest | Open | Complex reasoning, multimodal, agents |
| Mistral Medium 3.5 | mistral-medium-latest | Open | Agentic workflows, coding, balanced cost |
| Mistral Small 4 | mistral-small-latest | Open | Fast inference, hybrid instruct/reasoning |
| Mistral Medium 3.1 | mistral-medium-2501 | Premier | Production workloads, enterprise |
| Magistral Medium 1.2 | magistral-medium-latest | Premier | Extended reasoning, chain-of-thought |
| Ministral 3 14B | ministral-14b-latest | Open | Edge deployment, local inference |
| Ministral 3 8B | ministral-8b-latest | Open | Resource-constrained environments |
| Ministral 3 3B | ministral-3b-latest | Open | Mobile, IoT, ultra-low latency |

Specialist Models

| Model | API ID | Type | Purpose |
| --- | --- | --- | --- |
| Codestral | codestral-latest | Premier | Code completion, FIM |
| Devstral 2 | devstral-latest | Open | Software engineering agents |
| OCR 3 | mistral-ocr-latest | Premier | Document OCR |
| Voxtral TTS | voxtral-tts-latest | Open | Text-to-speech, voice cloning |
| Voxtral Transcribe | voxtral-transcribe-latest | Premier | Audio transcription |
| Mistral Moderation 2 | mistral-moderation-latest | Premier | Content safety classification |

Open vs. Premier: Open-weight models are available under Apache 2.0 for self-hosting on your own GPU infrastructure via vLLM, TGI, or Ollama. Premier models are API-only through La Plateforme. For production workloads where you need to control the inference stack, open-weight models let you deploy on your own terms.
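
To illustrate the self-hosting path, here is a minimal offline-inference sketch using vLLM's Python API. The Hugging Face repo id below is a placeholder: substitute the checkpoint name for whichever open-weight model you choose.

# pip install vllm  (requires a CUDA-capable GPU)
from vllm import LLM, SamplingParams

# Placeholder repo id; point this at the open-weight checkpoint you selected
llm = LLM(model="mistralai/<open-weight-model>")

outputs = llm.generate(
    ["Explain blue-green deployments in two sentences."],
    SamplingParams(temperature=0.3, max_tokens=128),
)
print(outputs[0].outputs[0].text)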

Decision heuristic: Start with mistral-small-latest for prototyping (fast, cost-effective). Upgrade to mistral-large-latest when you need stronger reasoning or multimodal input. Use codestral-latest for code-specific tasks. Drop to ministral-8b-latest or ministral-3b-latest for edge or latency-critical paths.

Verification: Call the Models endpoint (GET /v1/models) to see which models are available on your account. Some premier models require a billing method to be attached before they appear.
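
With the Python SDK, the same check is one call:

# Prints every model id your API key can access
for model in client.models.list().data:
    print(model.id)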

Step 4: Add Function Calling (Tool Use)

Function calling lets your application give Mistral models access to external tools: databases, APIs, calculators, or any function you define. The model decides which tools to call and generates the arguments; your code executes the function and passes results back.

Supported models for function calling

According to Mistral AI, function calling works on Mistral Large 3, Mistral Medium 3.1, Mistral Small 3.2, Devstral 2.0, Magistral Medium 1.2, Codestral, and Ministral 3 (all parameter sizes).

Define your tools

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price "
                           "for a given ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker "
                            "(e.g., AAPL, MSFT)"
                    }
                },
                "required": ["ticker"]
            }
        }
    }
]

Make the tool-enabled request

import json

def get_stock_price(ticker: str) -> dict:
    # Stub for the example; replace with a real market-data lookup
    return {"ticker": ticker, "price": 189.84}

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user",
         "content": "What is the current "
                    "price of AAPL stock?"}
    ],
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message

if message.tool_calls:
    for tool_call in message.tool_calls:
        name = tool_call.function.name
        args = json.loads(
            tool_call.function.arguments
        )

        # Execute your function
        result = get_stock_price(args["ticker"])

        # Send result back to the model
        follow_up = client.chat.complete(
            model="mistral-large-latest",
            messages=[
                {"role": "user",
                 "content": "What is the current "
                            "price of AAPL stock?"},
                message,
                {
                    "role": "tool",
                    "name": name,
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                }
            ],
            tools=tools
        )
        print(follow_up.choices[0].message.content)

Control parameters

  • tool_choice="auto": Model decides whether to call a tool (default)
  • tool_choice="any": Forces the model to use at least one tool
  • tool_choice="none": Prevents tool use entirely
  • parallel_tool_calls=True: Allows multiple simultaneous tool calls (default)

Verification: The model should generate a tool call with name: "get_stock_price" and arguments: {"ticker": "AAPL"}. If the model responds with text instead, check that your tool definition schema is valid JSON and tool_choice is set to "auto" or "any".

Step 5: Request Structured JSON Output

For applications that need machine-parseable responses (APIs, data pipelines, form generation), use the response_format parameter to force JSON output.

JSON mode

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {
            "role": "system",
            "content": "You are an API that returns "
                       "JSON. Always respond with "
                       "valid JSON only."
        },
        {
            "role": "user",
            "content": "List the top 3 Python web "
                       "frameworks with name, "
                       "github_stars (approximate), "
                       "and primary_use_case."
        }
    ],
    response_format={"type": "json_object"}
)

import json
data = json.loads(
    response.choices[0].message.content
)
print(json.dumps(data, indent=2))

JSON Schema mode (stricter)

For guaranteed schema compliance, pass a JSON Schema definition:

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user",
         "content": "Generate a user profile "
                    "for testing."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_profile",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {
                        "type": "string",
                        "format": "email"
                    },
                    "role": {
                        "type": "string",
                        "enum": ["admin",
                                 "editor",
                                 "viewer"]
                    },
                    "active": {"type": "boolean"}
                },
                "required": ["name", "email",
                             "role", "active"]
            }
        }
    }
)

Verification: Parse the response with json.loads(). If it throws a JSONDecodeError, the model may have included markdown formatting. Add "Do not wrap the response in markdown code fences" to your system prompt.
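
If you prefer defining schemas as Pydantic models, recent versions of the Python SDK expose a parse helper that converts the model class to a JSON Schema for you. A sketch, assuming a mistralai version that includes client.chat.parse:

from pydantic import BaseModel

class UserProfile(BaseModel):
    name: str
    email: str
    role: str
    active: bool

result = client.chat.parse(
    model="mistral-large-latest",
    messages=[
        {"role": "user",
         "content": "Generate a user profile for testing."}
    ],
    response_format=UserProfile
)

profile = result.choices[0].message.parsed  # a UserProfile instance
print(profile.email)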

Step 6: Generate Embeddings for RAG

Mistral provides text and code embedding models for retrieval-augmented generation (RAG), semantic search, clustering, and classification.

Text embeddings

response = client.embeddings.create(
    model="mistral-embed",
    inputs=[
        "Kubernetes autoscaling strategies",
        "Docker container networking"
    ]
)

for i, embedding in enumerate(response.data):
    print(f"Input {i}: "
          f"{len(embedding.embedding)} dimensions")

Building a basic RAG pipeline

  1. Chunk your documents into passages (500-1000 tokens each)
  2. Generate embeddings for each chunk using mistral-embed
  3. Store embeddings in a vector database (Pinecone, Weaviate, pgvector, Qdrant); steps 1-3 are sketched below
  4. At query time, embed the user question and retrieve the top-k most similar chunks
  5. Pass retrieved chunks as context in the system prompt to mistral-large-latest
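A sketch of steps 1-3, using a naive character-based chunker (a real pipeline would split on token counts or semantic boundaries):

def chunk_text(text: str, size: int = 2000,
               overlap: int = 200) -> list[str]:
    # Fixed-size character chunks; tune size/overlap for your corpus
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# "runbook.md" stands in for your own source document
chunks = chunk_text(open("runbook.md").read())

emb = client.embeddings.create(
    model="mistral-embed",
    inputs=chunks
)
vectors = [item.embedding for item in emb.data]
# Insert (chunk, vector) pairs into your vector database here

At query time (steps 4-5), retrieve the most similar chunks and build the prompt: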
# Simplified RAG query
context_chunks = vector_db.search(
    query_embedding, top_k=5
)
context = "\n\n".join(
    [chunk.text for chunk in context_chunks]
)

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {
            "role": "system",
            "content": "Answer based on the "
                "following context. If the "
                "context does not contain the "
                "answer, say so.\n\n"
                f"Context:\n{context}"
        },
        {"role": "user",
         "content": user_question}
    ]
)

Verification: Check that your embeddings return a consistent number of dimensions. Store a few test embeddings and verify that semantically similar inputs produce higher cosine similarity scores than unrelated inputs.
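
A quick way to run that check with numpy:

import numpy as np

def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

resp = client.embeddings.create(
    model="mistral-embed",
    inputs=["Kubernetes autoscaling strategies",
            "Horizontal pod autoscaler tuning",
            "French pastry recipes"]
)
e = [d.embedding for d in resp.data]

# The two DevOps inputs should score higher than the unrelated pair
print(cosine_similarity(e[0], e[1]))
print(cosine_similarity(e[0], e[2]))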

Step 7: Use Codestral for Code Completion

Codestral is Mistral's specialized code model with fill-in-the-middle (FIM) capability, meaning it can complete code given both the prefix (code before the cursor) and suffix (code after the cursor). This is what powers IDE integrations.

FIM request

response = client.fim.complete(
    model="codestral-latest",
    prompt="def calculate_area(shape, dims):\n"
           "    if shape == 'circle':\n"
           "        ",
    suffix="\n    elif shape == 'rectangle':\n"
           "        return dims['width'] "
           "* dims['height']"
)

print(response.choices[0].message.content)
# Expected: return 3.14159
#   * dims['radius'] ** 2

IDE integration

Codestral integrates with VS Code via the Continue extension and with JetBrains IDEs. Set up your API key in the extension settings and point it to codestral-latest for inline completions.

When to use Codestral vs. Devstral

  • Codestral (codestral-latest): Best for real-time code completion, FIM, and inline suggestions. Optimized for speed.
  • Devstral 2 (devstral-latest): Best for agentic coding tasks, multi-file refactoring, and software engineering workflows that require planning and reasoning. According to Mistral AI, Devstral 2 is a "frontier code agents model for solving software engineering tasks."

Verification: Send a FIM request with a known code pattern and verify the completion is syntactically valid. Test with multiple languages (Python, JavaScript, Rust) to confirm polyglot support.
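
For Python completions, you can automate the syntax check with the standard library's ast module (a sketch reusing the response from the FIM example above):

import ast

completion = response.choices[0].message.content
candidate = (
    "def calculate_area(shape, dims):\n"
    "    if shape == 'circle':\n"
    "        " + completion +
    "\n    elif shape == 'rectangle':\n"
    "        return dims['width'] * dims['height']"
)
try:
    ast.parse(candidate)
    print("Completion is syntactically valid")
except SyntaxError as e:
    print("Invalid completion:", e)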

Step 8: Deploy to Production

Moving from prototype to production requires attention to error handling, rate limiting, cost control, and observability.

Error handling

import os
import time
import logging

from mistralai import Mistral

log = logging.getLogger(__name__)

client = Mistral(
    api_key=os.environ["MISTRAL_API_KEY"]
)

def complete_with_retry(prompt: str,
                        max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.complete(
                model="mistral-large-latest",
                messages=[
                    {"role": "user",
                     "content": prompt}
                ],
                max_tokens=2048
            )
            return response.choices[0].message.content
        except Exception as e:
            if "429" in str(e):
                # Rate limited: exponential backoff before retrying
                time.sleep(2 ** attempt)
            elif "401" in str(e):
                # Authentication failure: retrying will not help
                log.critical("Invalid API key")
                raise
            else:
                raise
    raise RuntimeError(
        f"Still rate limited after {max_retries} retries"
    )

Batch processing

For high-volume workloads (bulk classification, dataset enrichment), the Batch API processes requests asynchronously. You write one request per line to a JSONL file, upload it, then create a batch job that references the uploaded file:

import json

# Write one request per line to a JSONL file
with open("batch_input.jsonl", "w") as f:
    for i, text in enumerate(texts):
        f.write(json.dumps({
            "custom_id": f"req-{i}",
            "body": {
                "messages": [
                    {"role": "user",
                     "content": text}
                ]
            }
        }) + "\n")

# Upload the file, then create a batch job that references it
uploaded = client.files.upload(
    file={
        "file_name": "batch_input.jsonl",
        "content": open("batch_input.jsonl", "rb"),
    },
    purpose="batch"
)

job = client.batch.jobs.create(
    input_files=[uploaded.id],
    model="mistral-small-latest",
    endpoint="/v1/chat/completions"
)
print(job.id, job.status)

Cost optimization checklist

  • Use mistral-small-latest for classification, extraction, and simple generation
  • Reserve mistral-large-latest for complex reasoning and multimodal tasks
  • Enable response caching for repeated queries
  • Set max_tokens to the minimum needed (do not leave it at defaults)
  • Use batch processing for non-real-time workloads
  • Monitor usage via the La Plateforme console dashboard and the usage field on each response (see the sketch below)
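
Every completion response reports token counts in its usage field, which you can log to track spend per request (a minimal sketch):

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=8
)
u = response.usage
print(f"prompt={u.prompt_tokens} "
      f"completion={u.completion_tokens} "
      f"total={u.total_tokens}")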

Content moderation

Use mistral-moderation-latest to pre-screen user inputs or post-screen model outputs:

moderation = client.classifiers.moderate_chat(
    model="mistral-moderation-latest",
    inputs=[
        {"role": "user",
         "content": user_input}
    ]
)

# Each result carries per-category boolean flags
result = moderation.results[0]
if any(result.categories.values()):
    flagged = [c for c, v in result.categories.items() if v]
    print("Input flagged for:", flagged)

Verification: Deploy to a staging environment and run 100 test requests across different models. Monitor response times, error rates, and token consumption. Set up alerts for 429 (rate limit) and 500 (server error) responses.


Troubleshooting and FAQ

Common Questions
I get a 401 Unauthorized error. What is wrong?
Your API key is invalid or not set. Verify with echo $MISTRAL_API_KEY (macOS/Linux) or $env:MISTRAL_API_KEY (PowerShell). Generate a new key at console.mistral.ai if needed. Keys are shown only once at creation time.
The model returns empty or truncated responses. What happened?
Increase max_tokens. The default may be too low for your use case. Also check that your input is not exceeding the model's context window. If streaming, ensure your client is reading all chunks before closing the connection.
Function calling is not triggering. How do I fix it?
Ensure tool_choice is set to "auto" or "any". Verify your tool schema has valid JSON with type, description, and parameters fields. Some models handle tools more reliably than others: mistral-large-latest is the most reliable for function calling.
JSON output includes markdown formatting. How do I prevent that?
Add explicit instructions to your system prompt: "Respond with raw JSON only. Do not wrap in code fences or add text outside the JSON object." Using response_format={"type": "json_object"} also helps enforce clean JSON output.
Which model should I start with for prototyping?
mistral-small-latest for cost-sensitive prototyping. mistral-large-latest for production quality. codestral-latest for code-specific tasks. ministral-8b-latest for edge deployment where you need to self-host.
Can I self-host Mistral models instead of using the API?
Yes. Open-weight models (Mistral Large 3, Mistral Small 4, Mistral Medium 3.5, Ministral 3, Devstral 2) are available under Apache 2.0. Deploy via vLLM, Hugging Face TGI, or Ollama on your own GPU infrastructure. Premier models (Codestral, OCR 3, Moderation 2) are API-only.

Next Step

Build a function-calling agent. Pick a real API your team uses (Jira, Slack, your internal database), define it as a tool, and wire it into a Mistral chat loop. This exercise demonstrates the full integration pattern and helps you evaluate whether Mistral's function calling reliability meets your production requirements.
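
A minimal loop skeleton to start from (a sketch: the tools list comes from Step 4, and execute_tool is a hypothetical dispatcher you write to call your real functions):

import json

def run_agent(user_prompt: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        response = client.chat.complete(
            model="mistral-large-latest",
            messages=messages,
            tools=tools,          # tool schemas from Step 4
            tool_choice="auto"
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # model answered directly
        messages.append(message)
        for call in message.tool_calls:
            # execute_tool: your dispatcher mapping names to functions
            result = execute_tool(
                call.function.name,
                json.loads(call.function.arguments)
            )
            messages.append({
                "role": "tool",
                "name": call.function.name,
                "tool_call_id": call.id,
                "content": json.dumps(result)
            })
    return "Stopped after max_turns without a final answer."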


Before You Use AI
Your Privacy

AI API providers process your inputs on remote servers. Review Mistral's data processing agreement and privacy policy at mistral.ai/terms before sending sensitive data through the API. Enterprise customers should evaluate data residency options (Mistral offers EU-hosted infrastructure).

Open-weight models deployed on your own infrastructure keep all data local, which may be required for regulated industries.

Mental Health & AI Dependency

AI tools are productivity aids, not replacements for human judgment. If you or someone you know is in crisis:

988 Suicide & Crisis Lifeline: Call or text 988
SAMHSA Helpline: 1-800-662-4357
Crisis Text Line: Text HOME to 741741

See the NIST AI Risk Management Framework for organizational AI governance guidance.

Your Rights & Our Transparency

Under GDPR and CCPA, you have rights to access, correct, and delete your data. Mistral AI is headquartered in Paris, France, and operates under EU data protection regulations.

TechJack Solutions maintains editorial independence. This article was not sponsored or reviewed by Mistral AI. For AI regulation context, see our EU AI Act overview.