
Gemini API

How to Use the Google Gemini API

The Gemini API processed 85 billion requests in January 2026 alone. If you haven't integrated it yet, you're leaving a $0.10-per-million-token option on the table (2.5 Flash-Lite; the next-gen 3.1 Flash-Lite runs $0.25/1M input) while paying 10x more elsewhere. After completing this guide, you'll have a working integration making text and multimodal requests through the Google Gen AI SDK. Estimated time: 15-20 minutes.

15-20 min · Practitioner · Python (SDK)

  • 85B API requests / month
  • 2.4M active API developers
  • $0.10 lowest price per 1M input tokens
  • 1M-token context window
  • 64K max output tokens

Quick Start

Ten lines. Install the SDK, set your key, make a request.

Python
# pip install -U google-genai
from google import genai
import os

# Set GEMINI_API_KEY env var or pass directly
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain the CAP theorem in 3 sentences."
)

print(response.text)

Set your API key as an environment variable before running:

Bash
# Linux/macOS
export GEMINI_API_KEY="AIza..."

# Windows PowerShell
$env:GEMINI_API_KEY = "AIza..."

# Windows CMD
set GEMINI_API_KEY=AIza...

Get your key from Google AI Studio. No credit card required for the free tier. Keys start with AIza -- copy yours immediately; the UI won't show it again without regenerating. One key per project, multiple projects allowed.
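A missing or mistyped key only fails at request time, so a fail-fast check can save a confusing first debugging session. This is an illustrative helper (not part of the SDK), leaning on the fact that keys start with AIza:

```python
import os

def require_api_key() -> str:
    """Return GEMINI_API_KEY, failing fast if it is absent or malformed."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key.startswith("AIza"):
        raise RuntimeError(
            "GEMINI_API_KEY is missing or malformed; "
            "export it before constructing the client."
        )
    return key
```

Call it once at startup, before `genai.Client()`, so configuration errors surface immediately instead of as opaque 400s.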


Prerequisites

Prerequisites Checklist
  • Google Account
    Any Gmail or Workspace account. Check yours
    Required
  • Python 3.9+
    Run python --version. Download
    Required
  • pip 23.0+
    Upgrade: pip install --upgrade pip
    Required
  • Terminal Access
    Bash, zsh, or PowerShell
    Required
  • Text Editor or IDE
    VS Code, PyCharm, or whatever you write Python in
    Recommended
  • Google Cloud Billing
    Only for production beyond free tier. Set up
    Recommended

SDK Setup and Core Requests

The unified google-genai package replaced the deprecated google-generativeai. It works with both AI Studio and Vertex AI. The SDK supports Python, Node.js, and Go. REST is also available for any language.

Bash
pip install -U google-genai

Verification: Run python -c "from google import genai; print(genai.__version__)". If you see a version number (0.9.x or higher as of March 2026), the install succeeded.

Common gotcha: If you get ModuleNotFoundError: No module named 'google.genai', run pip uninstall google-generativeai first. The old and new packages conflict.

Text Request

Python
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the three main types of machine learning?"
)

print(response.text)

Verification: You should see a text response listing supervised, unsupervised, and reinforcement learning. If you get a response, the API key, SDK, and network connection all work.

Multimodal Request (Image)

Python
from google import genai
from google.genai import types

client = genai.Client()
image = client.files.upload(file="screenshot.png")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(
            file_uri=image.uri,
            mime_type=image.mime_type
        ),
        "Describe what you see in this image and identify any UI components."
    ]
)

print(response.text)

Supports image/png, image/jpeg, image/webp, audio/mp3, audio/wav, video/mp4, and application/pdf.

Structured Output (JSON)

Python
from pydantic import BaseModel
from google import genai
from google.genai import types

client = genai.Client()

class MovieReview(BaseModel):
    title: str
    rating: float
    sentiment: str
    key_themes: list[str]

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Review the movie Inception",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=MovieReview
    )
)

Always pair response_mime_type with response_schema. Without the schema, the model guesses at structure. Structured output docs.
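Even with the schema enforced server-side, it is cheap to sanity-check the payload before trusting it downstream. A minimal stdlib check (a hypothetical helper, mirroring the MovieReview fields above):

```python
import json

# Expected fields of the MovieReview schema; rating may arrive as int or float
REQUIRED_FIELDS = {
    "title": str,
    "rating": (int, float),
    "sentiment": str,
    "key_themes": list,
}

def validate_review(raw: str) -> dict:
    """Parse response JSON and verify the MovieReview fields exist with the right types."""
    data = json.loads(raw)
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"Bad or missing field: {field}")
    return data
```

Feed it `response.text`; anything that fails here should be retried rather than passed along.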

Search Grounding

Python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What were the major AI announcements this week?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    )
)

# Access source citations
if response.candidates[0].grounding_metadata:
    for chunk in response.candidates[0].grounding_metadata.grounding_chunks:
        print(f"Source: {chunk.web.title} -- {chunk.web.uri}")

Grounding connects the model to live web data. Free tier: 5,000 grounded prompts/month. Paid: $14 per 1,000 queries. This is where the API pulls ahead for RAG workflows -- built-in search skips the vector database for many retrieval tasks. For production agent workflows, see our guide on Gemini agents and agentic capabilities.


Key Capabilities

  • Multimodal Input -- Text, images, audio, video, and PDFs in a single request. All models.
  • Search Grounding -- Live Google Search results injected into model responses with source citations.
  • Function Calling -- Model calls your functions, inspects results, and chains multi-step actions.
  • Structured Output -- Enforce JSON schemas on responses via Pydantic models. No post-processing.
  • Code Execution -- Model generates and runs Python code in a sandboxed environment, returns output.
  • Thinking Mode -- Extended reasoning with controllable depth: MINIMAL, LOW, MEDIUM, HIGH via thinking_level.
  • Thought Signatures -- New in 3.1: preserves reasoning context across multi-turn conversations for agentic workflows.
  • Media Resolution Control -- New in 3.1: media_resolution parameter controls how multimodal inputs are tokenized to reduce costs.

Model Selection

Pick your model based on cost tolerance and reasoning requirements. All models share a 1M-token context window and a 64K output-token ceiling.

Gemini 3.1 Pro (Pro) -- Complex reasoning, agentic tasks. Preview.
  • Input: $2.00 (≤200K) / $4.00 (>200K)
  • Output: $12.00 (≤200K) / $18.00 (>200K)
  • Context: 1M
  • Free tier: none

Gemini 2.5 Pro (Pro) -- Production reasoning, deep analysis. Stable.
  • Input: $1.25 (≤200K) / $2.50 (>200K)
  • Output: $10.00 (≤200K) / $15.00 (>200K)
  • Context: 1M

Gemini 2.5 Flash-Lite (Lite) -- High-volume, budget workloads. Stable.
  • Input: $0.10
  • Output: $0.40
  • Context: 1M

Gemini 3 Flash (Flash) -- High-performance mid-range. Preview.
  • Input: $0.50
  • Output: $3.00
  • Context: 1M

Gemini 3.1 Flash-Lite (Lite) -- Next-gen budget option. Preview.
  • Input: $0.25
  • Output: $1.50
  • Context: 1M

Gemini 3.1 Flash Live (Flash) -- Low-latency audio-to-audio. Preview.
  • Input (text): $0.75
  • Input (audio): $3.00
  • Output (text): $4.50
  • Output (audio): $12.00

Pin stable models in production. Preview models (3.1 Pro, 3 Flash) can change behavior between updates. Google deprecated Gemini 3 Pro Preview on March 9, 2026 with minimal notice. Use versioned model IDs like gemini-2.5-flash-001 for customer-facing apps.

Decision Framework

  • Prototyping and testing? Use Gemini 2.5 Flash on free tier. No cost, 10 RPM.
  • Production API with cost sensitivity? Gemini 2.5 Flash-Lite at $0.10/1M input. Handles classification, extraction, and summarization.
  • Production API needing reasoning? Gemini 2.5 Pro (stable) or 3.1 Pro (preview, higher capability but subject to breaking changes).
  • Batch processing at scale? Any model with the Batch API. 50% savings, 24-hour SLA.
Prices per 1M tokens. Checked March 26, 2026. Source: Google AI Pricing
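The framework above collapses naturally into a lookup. choose_model is an illustrative helper, not an SDK function; the model IDs are the ones listed in this article:

```python
def choose_model(workload: str, needs_reasoning: bool = False) -> str:
    """Map a workload profile to a Gemini model ID, per the decision framework."""
    if workload == "prototype":
        return "gemini-2.5-flash"        # free tier, 10 RPM
    if workload == "production":
        if needs_reasoning:
            return "gemini-2.5-pro"      # stable reasoning model
        return "gemini-2.5-flash-lite"   # $0.10/1M input budget option
    raise ValueError(f"Unknown workload: {workload}")
```

Centralizing the choice in one function also makes it trivial to swap in a preview model behind a feature flag later.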

Pricing and Cost Optimization

Free tier works for prototyping. Here's what production actually costs.

  • Batch API -- Async processing, 24-hour SLA. All models. Savings: 50%
  • Context Caching -- Cache large documents, query repeatedly. $0.025/1M cached tokens + $1.00/1M tokens/hr storage. Savings: up to 75%
  • Flash-Lite -- $0.10 input / $0.40 output. Classification, extraction, summarization. Savings vs Pro: ~90%
  • Thinking Budget -- Set thinking_level to MINIMAL or LOW. Thinking tokens count as output. Savings: variable
  • Media Resolution -- Use the media_resolution parameter (3.1) to reduce token costs on image/video-heavy requests. Savings: variable

Thinking tokens are an additional cost consideration. When a model uses its "thinking" capability (Gemini 2.5 Pro and 3.1 Pro), the internal reasoning tokens count toward your output token usage. Control this with thinking_level: MINIMAL, LOW, MEDIUM, or HIGH. (Note: thinking_level replaces the older thinking_budget parameter as of March 2026.)
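Since thinking tokens bill at the output rate, the effective charge for a response is (visible + thinking) tokens times the output price. A quick sketch of that arithmetic (prices per 1M tokens; the token counts are made-up examples):

```python
def output_cost(visible_tokens: int, thinking_tokens: int, price_per_m: float) -> float:
    """Dollar cost of one response: thinking tokens bill at the same rate as visible output."""
    return (visible_tokens + thinking_tokens) * price_per_m / 1_000_000

# e.g. 1,000 visible + 4,000 thinking tokens at the $10.00/1M 2.5 Pro output rate
# -> the hidden reasoning is 4x the cost of the text you actually see
```

This is why capping thinking_level matters: on reasoning-heavy prompts, the invisible tokens can dominate the bill.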

Paid Tier Progression

  • Tier 1 -- Requirement: active billing account. Monthly cap: $250
  • Tier 2 -- Requirement: $100+ spent, 3+ days since first payment. Monthly cap: $2,000
  • Tier 3 -- Requirement: $1,000+ spent, 30+ days since first payment. Monthly cap: $20,000+

Quick math: A 2,000-token prompt + 1,000-token response on 2.5 Flash costs ~$0.003. Run that 100K times = $310/month. Same volume on Flash-Lite = $60/month.
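The quick math above, recomputed from the Flash-Lite rates in the model table ($0.10 input / $0.40 output per 1M tokens):

```python
def monthly_cost(input_tokens: int, output_tokens: int, calls: int,
                 in_price: float, out_price: float) -> float:
    """Monthly spend in dollars for a fixed per-call token profile; prices are per 1M tokens."""
    per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_call * calls

# 2,000-token prompt + 1,000-token response, 100K calls/month on Flash-Lite
flash_lite = monthly_cost(2_000, 1_000, 100_000, 0.10, 0.40)  # $60.00
```

Plug your own token profile in before committing to a model; output tokens usually dominate, so the output price matters more than the headline input price.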

Note: On the consumer side (not API), Google offers Free (30 prompts/day on Gemini 3 Flash), Plus (128K context, expanded limits), Pro ($19.99/mo), and Ultra ($249.99/mo, often $124.99/mo for the first 3 months) plans. These consumer plans are separate from API pricing.

AI Studio vs Vertex AI

Same models, same SDK, different infrastructure layer.

AI Studio
Use this when...
  • Prototyping or personal projects
  • Startup without SOC 2 requirements
  • API key auth is sufficient
  • Free tier access needed
Open AI Studio
Vertex AI
Use this when...
  • Data residency or VPC controls required
  • IAM-based access and audit logging
  • Custom model training or Model Garden
  • Enterprise compliance (SOC 2, HIPAA)
Vertex AI docs

Migration is straightforward -- change client initialization and auth method. Prompts and code structure stay the same. Migration guide. For more context on AI platform choices and governance requirements, the platform decision often depends on your organization's compliance posture more than technical capability.


Rate Limits

Limits are per-project, not per-key. Three dimensions tracked simultaneously -- exceeding any one triggers a 429 error.

Free Tier (No Credit Card)

  • 2.5 Pro -- 5 RPM, 100 requests/day, 250K TPM
  • 2.5 Flash -- 10 RPM, 250 requests/day, 250K TPM
  • 2.5 Flash-Lite -- 15 RPM, 1,000 requests/day, 250K TPM
  • 3 Flash -- 10 RPM, 250 requests/day, 250K TPM
  • 3.1 Flash-Lite -- 15 RPM, 1,000 requests/day, 250K TPM

No free tier: Gemini 3.1 Pro requires a paid billing account for API access. Gemini 2.5 Pro has a limited free tier (5 RPM, 100 RPD) but is the most restricted of the free options.

December 2025 quota cut: Google reduced free tier limits by 50-80%. If you're following older tutorials, those numbers are stale. Current limits.

Source: Google AI Rate Limits, verified March 2026
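To stay under an RPM cap proactively instead of reacting to 429s, a client-side pacer can enforce a minimum gap of 60/RPM seconds between calls. This is an illustrative sketch, not part of the SDK; the clock and sleep functions are injectable so it can be tested without real waiting:

```python
import time

class Pacer:
    """Enforces a minimum interval of 60/rpm seconds between successive calls."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._last = None  # timestamp of the previous call, if any

    def wait(self):
        """Block just long enough to respect the interval, then record the call time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Usage: `pacer = Pacer(rpm=10)` then `pacer.wait()` before each `generate_content` call. This paces a single process; it does not coordinate across workers sharing one project quota.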

Retry Pattern

Python
import time
from google import genai
from google.genai import errors

client = genai.Client()

def call_with_retry(prompt, model="gemini-2.5-flash", max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.models.generate_content(
                model=model, contents=prompt
            )
            return response.text
        except errors.APIError as e:
            if e.code != 429:  # only retry rate-limit errors
                raise
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded")
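The same pattern generalizes to any flaky call. A sketch of a reusable wrapper with full jitter (a common variation, not SDK-specific), with sleep and randomness injected so the backoff schedule is testable:

```python
import random
import time

def retry_with_backoff(fn, max_retries=3, base=1.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn(); on failure wait base * 2**attempt * jitter, re-raising after the last attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            sleep(base * (2 ** attempt) * rng())
```

Jitter desynchronizes concurrent clients so they do not all retry at the same instant after a shared quota reset.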

View your current limits in AI Studio's quota dashboard.


Limitations

  • Preview instability. Gemini 3 Pro Preview was deprecated March 9, 2026 with minimal migration window. Pin to stable releases for production.
  • Free tier data usage. Google may use free tier prompts/responses to improve products. Paid tier excludes this. Terms.
  • Hallucination. All LLMs generate plausible-sounding incorrect output. Grounding reduces this but doesn't eliminate it.
  • Region availability. Some models and features aren't available in all regions. Check here.
  • Output cap. 64K output tokens per response (about 48,000 words). Implement chunking for longer generation.
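For the output cap, long generations can be split on the input side. A rough client-side chunker, assuming the ~0.75 words-per-token ratio implied above (48,000 words in 64K tokens); the ratio varies by language and content, so treat it as an estimate:

```python
def chunk_words(words: list[str], max_tokens: int = 64_000,
                words_per_token: float = 0.75) -> list[list[str]]:
    """Split a word list into chunks whose estimated token count stays under max_tokens."""
    max_words = int(max_tokens * words_per_token)
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]
```

Leave generous headroom below the cap in practice, since word-based estimates undercount tokens for code and non-English text.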

Troubleshooting

API key rejected

Cause: Key from a disabled project, or the Generative Language API isn't enabled.

Fix: Generate a new key in a fresh project via AI Studio; the API is enabled automatically. Source: Google AI Quickstart.

429 errors at low volume

Cause: December 2025 quota reduction. Flash is now 10 RPM / 250 RPD.

Fix: Add delays between calls, use Flash-Lite (15 RPM / 1,000 RPD), or enable billing for Tier 1 ($250/month cap). Current limits.

SDK responses differ from AI Studio

Cause: AI Studio defaults to specific temperature and safety settings that differ from SDK defaults.

Fix: Set temperature, top_p, and safety_settings explicitly in GenerateContentConfig. Config docs.

Malformed JSON from structured output

Cause: Using response_mime_type without response_schema.

Fix: Always pair both. Pass a Pydantic model or dict as the schema. Structured output docs.

ModuleNotFoundError: No module named 'google.genai'

Cause: Deprecated google-generativeai package installed.

Fix: pip uninstall google-generativeai && pip install -U google-genai. Deprecation notice.


What's Next

Build a function-calling agent. The API's tool use system lets the model call your functions, inspect results, and chain actions. Combined with grounding and structured output, you have the pieces for a production agentic AI workflow.

New in 3.1: Thought Signatures preserve reasoning context across multi-turn conversations, improving coherence in function-calling and agentic workflows. The media_resolution parameter lets you control how multimodal inputs are tokenized. Gemini 3.1 Flash Live is a new low-latency audio-to-audio model for real-time conversational applications.

Data verified: March 26, 2026 | Source

Google, Gemini, Google AI Studio, and Vertex AI are trademarks of Google LLC. This article is not affiliated with, sponsored by, or endorsed by Google LLC.
