How to Use the Google Gemini API
The Gemini API processed 85 billion requests in January 2026 alone. If you haven't integrated it yet, you're leaving a $0.10-per-million-token option on the table (2.5 Flash-Lite; the next-gen 3.1 Flash-Lite runs $0.25/1M input) while paying 10x more elsewhere. After completing this guide, you'll have a working integration making text and multimodal requests through the Google Gen AI SDK. Estimated time: 15-20 minutes.
Quick Start
Ten lines. Install the SDK, set your key, make a request.
```python
# pip install -U google-genai
import os

from google import genai

# Set the GEMINI_API_KEY env var or pass api_key directly to genai.Client()
client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain the CAP theorem in 3 sentences.",
)
print(response.text)
```
Set your API key as an environment variable before running:
```bash
# Linux/macOS
export GEMINI_API_KEY="AIza..."

# Windows PowerShell
$env:GEMINI_API_KEY = "AIza..."

# Windows CMD
set GEMINI_API_KEY=AIza...
```
Get your key from Google AI Studio. No credit card is required for the free tier. The key starts with AIza; copy it immediately, because the UI won't show it again without regenerating. One key per project; multiple projects are allowed.
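As a quick sanity check before making any requests, you can fail fast on a missing or mispasted key. (The AIza prefix check is a heuristic based on the key format described above, not an official validation rule.)

```python
import os

def get_api_key() -> str:
    """Read GEMINI_API_KEY and fail fast if it looks wrong."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it first.")
    if not key.startswith("AIza"):
        # Heuristic: AI Studio keys start with "AIza"
        raise RuntimeError("GEMINI_API_KEY doesn't look like an AI Studio key.")
    return key
```

This catches the most common setup mistake (an unset or truncated key) before it surfaces as an opaque 400 from the API.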
Prerequisites
- **Required:** Google Account. Any Gmail or Workspace account.
- **Required:** Python 3.9+. Run `python --version` to check.
- **Required:** pip 23.0+. Upgrade with `pip install --upgrade pip`.
- **Required:** Terminal access. Bash, zsh, or PowerShell.
- **Recommended:** Text editor or IDE. VS Code, PyCharm, or whatever you write Python in.
- **Recommended:** Google Cloud billing. Only for production beyond the free tier.
SDK Setup and Core Requests
The unified google-genai package replaced the deprecated google-generativeai. It works with both AI Studio and Vertex AI. The SDK supports Python, Node.js, and Go. REST is also available for any language.
pip install -U google-genai
Verification: Run python -c "from google import genai; print(genai.__version__)". If you see a version number (0.9.x or higher as of March 2026), the install succeeded.
Common gotcha: If you get ModuleNotFoundError: No module named 'google.genai', run pip uninstall google-generativeai first. The old and new packages conflict.
Text Request
```python
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the three main types of machine learning?",
)
print(response.text)
```
Verification: You should see a text response listing supervised, unsupervised, and reinforcement learning. If you get a response, the API key, SDK, and network connection all work.
Multimodal Request (Image)
```python
from google import genai
from google.genai import types

client = genai.Client()
image = client.files.upload(file="screenshot.png")
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(file_uri=image.uri, mime_type=image.mime_type),
        "Describe what you see in this image and identify any UI components.",
    ],
)
print(response.text)
```
Supports image/png, image/jpeg, image/webp, audio/mp3, audio/wav, video/mp4, and application/pdf.
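Before uploading, you can check a file's MIME type locally with the standard library. One wrinkle: Python's `mimetypes` reports `.mp3` as `audio/mpeg` and `.wav` as `audio/x-wav` on many systems, so this sketch normalizes those to the identifiers listed above.

```python
import mimetypes

SUPPORTED_MIME_TYPES = {
    "image/png", "image/jpeg", "image/webp",
    "audio/mp3", "audio/wav", "video/mp4", "application/pdf",
}

# mimetypes uses different names for some audio formats; map them back
_NORMALIZE = {
    "audio/mpeg": "audio/mp3",
    "audio/x-wav": "audio/wav",
    "audio/vnd.wave": "audio/wav",
}

def detect_mime(path: str) -> str:
    """Guess a file's MIME type and verify it is one this guide lists as supported."""
    mime, _ = mimetypes.guess_type(path)
    mime = _NORMALIZE.get(mime, mime)
    if mime not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"{path}: unsupported or unknown MIME type {mime!r}")
    return mime
```

This lets you reject an unsupported file before spending an upload on it.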
Structured Output (JSON)
```python
from google import genai
from google.genai import types
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    sentiment: str
    key_themes: list[str]

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Review the movie Inception",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=MovieReview,
    ),
)
print(response.parsed)  # parsed into a MovieReview instance
```
Always pair response_mime_type with response_schema. Without the schema, the model guesses at structure. Structured output docs.
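Even with a schema, it's worth validating what comes back before trusting it downstream. A stdlib-only sketch for the MovieReview shape above (field names mirror the Pydantic model; in practice Pydantic's own validation covers most of this):

```python
import json

def parse_review(raw: str) -> dict:
    """Parse and sanity-check a MovieReview-shaped JSON payload."""
    data = json.loads(raw)
    required = {"title": str, "rating": float, "sentiment": str, "key_themes": list}
    for field, typ in required.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if typ is float and isinstance(data[field], int):
            data[field] = float(data[field])  # JSON integers are valid ratings
        elif not isinstance(data[field], typ):
            raise ValueError(f"{field}: expected {typ.__name__}")
    return data
```

A defensive parse like this turns a silently malformed response into a loud, debuggable error.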
Search Grounding
```python
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What were the major AI announcements this week?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

# Access source citations
if response.candidates[0].grounding_metadata:
    for chunk in response.candidates[0].grounding_metadata.grounding_chunks:
        print(f"Source: {chunk.web.title} -- {chunk.web.uri}")
```
Grounding connects the model to live web data. Free tier: 5,000 grounded prompts/month. Paid: $14 per 1,000 queries. This is where the API pulls ahead for RAG workflows -- built-in search skips the vector database for many retrieval tasks. For production agent workflows, see our guide on Gemini agents and agentic capabilities.
Key Capabilities
Model Selection
Pick your model based on cost tolerance and reasoning requirements. All models share a 1M-token context window and a 64K output-token ceiling.
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Free Tier |
|---|---|---|---|---|
| Gemini 3.1 Pro (preview) | $2.00 (≤200K) / $4.00 (>200K) | $12.00 (≤200K) / $18.00 (>200K) | 1M | None |
| Gemini 2.5 Pro | $1.25 (≤200K) / $2.50 (>200K) | $10.00 (≤200K) / $15.00 (>200K) | 1M | Limited (5 RPM) |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Yes |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M | Yes |
| Gemini 3 Flash (preview) | $0.50 | $3.00 | 1M | Yes |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | Yes |
| Gemini 3.1 Flash Live | $0.75 (text) / $3.00 (audio) | $4.50 (text) / $12.00 (audio) | 1M | -- |
Pin stable models in production. Preview models (3.1 Pro, 3 Flash) can change behavior between updates. Google deprecated Gemini 3 Pro Preview on March 9, 2026 with minimal notice. Use versioned model IDs like gemini-2.5-flash-001 for customer-facing apps.
Decision Framework
- Prototyping and testing? Use Gemini 2.5 Flash on free tier. No cost, 10 RPM.
- Production API with cost sensitivity? Gemini 2.5 Flash-Lite at $0.10/1M input. Handles classification, extraction, and summarization.
- Production API needing reasoning? Gemini 2.5 Pro (stable) or 3.1 Pro (preview, higher capability but subject to breaking changes).
- Batch processing at scale? Any model with the Batch API. 50% savings, 24-hour SLA.
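The framework above can be codified as a small helper. The model IDs are the ones used throughout this guide; the precedence (prototyping, then reasoning, then cost) mirrors the bullet order.

```python
def pick_model(prototyping: bool = False,
               needs_reasoning: bool = False,
               cost_sensitive: bool = False) -> str:
    """Map the decision framework above onto a model ID."""
    if prototyping:
        return "gemini-2.5-flash"       # free tier, 10 RPM
    if needs_reasoning:
        return "gemini-2.5-pro"         # stable reasoning model
    if cost_sensitive:
        return "gemini-2.5-flash-lite"  # $0.10/1M input
    return "gemini-2.5-flash"           # balanced default
```

Centralizing the choice in one function also makes it trivial to swap model IDs when a preview model is deprecated.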
Pricing and Cost Optimization
Free tier works for prototyping. Here's what production actually costs.
| Cost Lever | Detail | Savings |
|---|---|---|
| Batch API | Async processing, 24-hour SLA. All models. | 50% |
| Context Caching | Cache large documents, query repeatedly. $0.025/1M cached tokens + $1.00/1M tokens/hr storage. | Up to 75% |
| Flash-Lite | $0.10 input / $0.40 output. Classification, extraction, summarization. | vs Pro: ~90% |
| Thinking Budget | Set thinking_level to MINIMAL or LOW. Thinking tokens count as output. | Variable |
| Media Resolution | Use media_resolution parameter (3.1) to reduce token costs on image/video-heavy requests. | Variable |
Thinking tokens are an additional cost consideration. When a model uses its "thinking" capability (Gemini 2.5 Pro and 3.1 Pro), the internal reasoning tokens count toward your output token usage. Control this with thinking_level: MINIMAL, LOW, MEDIUM, or HIGH. (Note: thinking_level replaces the older thinking_budget parameter as of March 2026.)
Paid Tier Progression
| Tier | Requirement | Monthly Cap |
|---|---|---|
| Tier 1 | Active billing account | $250 |
| Tier 2 | $100+ spent, 3+ days since first payment | $2,000 |
| Tier 3 | $1,000+ spent, 30+ days since first payment | $20,000+ |
Quick math: a 2,000-token prompt + 1,000-token response on 2.5 Flash costs ~$0.003. Run that 100K times and you're at ~$310/month. The same volume on Flash-Lite is ~$60/month.
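The arithmetic behind those numbers, using the per-million-token rates from the table above:

```python
def monthly_cost(in_tokens: int, out_tokens: int,
                 in_rate: float, out_rate: float, calls: int) -> float:
    """Estimated USD cost; rates are dollars per 1M tokens."""
    per_call = (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    return per_call * calls

# 2,000-token prompt + 1,000-token response, 100K calls/month
flash = monthly_cost(2000, 1000, 0.30, 2.50, 100_000)       # ~$310
flash_lite = monthly_cost(2000, 1000, 0.10, 0.40, 100_000)  # ~$60
```

Note these figures exclude thinking tokens, which bill as output on the reasoning models.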
Note: On the consumer side (not API), Google offers Free (30 prompts/day on Gemini 3 Flash), Plus (128K context, expanded limits), Pro ($19.99/mo), and Ultra ($249.99/mo, often $124.99/mo for the first 3 months) plans. These consumer plans are separate from API pricing.
AI Studio vs Vertex AI
Same models, same SDK, different infrastructure layer.
- Prototyping or personal projects
- Startup without SOC 2 requirements
- API key auth is sufficient
- Free tier access needed
- Data residency or VPC controls required
- IAM-based access and audit logging
- Custom model training or Model Garden
- Enterprise compliance (SOC 2, HIPAA)
Migration is straightforward -- change client initialization and auth method. Prompts and code structure stay the same. Migration guide. For more context on AI platform choices and governance requirements, the platform decision often depends on your organization's compliance posture more than technical capability.
Rate Limits
Limits are per-project, not per-key. Three dimensions are tracked simultaneously: requests per minute (RPM), requests per day (RPD), and tokens per minute (TPM). Exceeding any one of them triggers a 429 error.
Free Tier (No Credit Card)
| Model | RPM | Requests/Day | TPM |
|---|---|---|---|
| 2.5 Pro | 5 | 100 | 250K |
| 2.5 Flash | 10 | 250 | 250K |
| 2.5 Flash-Lite | 15 | 1,000 | 250K |
| 3 Flash | 10 | 250 | 250K |
| 3.1 Flash-Lite | 15 | 1,000 | 250K |
No free tier: Gemini 3.1 Pro requires a paid billing account for API access. Gemini 2.5 Pro has a limited free tier (5 RPM, 100 RPD) but is the most restricted of the free options.
December 2025 quota cut: Google reduced free tier limits by 50-80%. If you're following older tutorials, those numbers are stale. Current limits.
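Rather than only reacting to 429s, you can pace requests client-side to stay under an RPM cap. A minimal stdlib sketch (10 RPM matches the current 2.5 Flash free tier):

```python
import time

class RequestPacer:
    """Sleep just enough between calls to stay under a requests-per-minute cap."""

    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm  # seconds between requests
        self._last = 0.0

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

pacer = RequestPacer(rpm=10)  # call pacer.wait() before each API request
```

This only addresses the RPM dimension; daily request and token-per-minute caps still need the retry pattern below as a backstop.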
Retry Pattern
```python
import time

from google import genai
from google.genai import errors

client = genai.Client()

def call_with_retry(prompt, model="gemini-2.5-flash", max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.models.generate_content(
                model=model,
                contents=prompt,
            )
            return response.text
        except errors.APIError as e:
            # The google-genai SDK raises errors.APIError; code 429 is rate limiting
            if e.code != 429:
                raise
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded")
```
View your current limits in AI Studio's quota dashboard.
Limitations
- Preview instability. Gemini 3 Pro Preview was deprecated March 9, 2026 with minimal migration window. Pin to stable releases for production.
- Free tier data usage. Google may use free tier prompts/responses to improve products. Paid tier excludes this. Terms.
- Hallucination. All LLMs generate plausible-sounding incorrect output. Grounding reduces this but doesn't eliminate it.
- Region availability. Some models and features aren't available in all regions. Check here.
- Output cap. 64K output tokens per response (about 48,000 words). Implement chunking for longer generation.
Troubleshooting
API key rejected
Cause: Key from a disabled project, or the Generative Language API isn't enabled.
Fix: Generate a new key in a fresh project via AI Studio. The API enables automatically. Source: Google AI Quickstart.
429 errors at low request volume
Cause: December 2025 quota reduction. Flash is now 10 RPM / 250 RPD.
Fix: Add delays between calls, use Flash-Lite (15 RPM / 1,000 RPD), or enable billing for Tier 1 ($250/month cap). Current limits.
Different output in AI Studio vs the SDK
Cause: AI Studio defaults to specific temperature and safety settings that differ from SDK defaults.
Fix: Set temperature, top_p, and safety_settings explicitly in GenerateContentConfig. Config docs.
JSON responses with inconsistent structure
Cause: Using response_mime_type without response_schema.
Fix: Always pair both. Pass a Pydantic model or dict as the schema. Structured output docs.
ModuleNotFoundError: No module named 'google.genai'
Cause: Deprecated google-generativeai package installed.
Fix: pip uninstall google-generativeai && pip install -U google-genai. Deprecation notice.
What's Next
Build a function-calling agent. The API's tool use system lets the model call your functions, inspect results, and chain actions. Combined with grounding and structured output, you have the pieces for a production agentic AI workflow.
New in 3.1: Thought Signatures preserve reasoning context across multi-turn conversations, improving coherence in function-calling and agentic workflows. The media_resolution parameter lets you control how multimodal inputs are tokenized. Gemini 3.1 Flash Live is a new low-latency audio-to-audio model for real-time conversational applications.
Google, Gemini, Google AI Studio, and Vertex AI are trademarks of Google LLC. This article is not affiliated with, sponsored by, or endorsed by Google LLC.