How to Use the Google Gemini API
The Gemini API processed 85 billion requests in January 2026 alone. If you haven't integrated it yet, you're leaving a $0.10-per-million-token option on the table (2.5 Flash-Lite; the next-gen 3.1 Flash-Lite runs $0.25/1M input) while paying 10x more elsewhere. After completing this guide, you'll have a working integration making text and multimodal requests through the Google Gen AI SDK. Estimated time: 15-20 minutes.
Quick Start
Ten lines. Install the SDK, set your key, make a request.
```python
# pip install -U google-genai
import os

from google import genai

# Set the GEMINI_API_KEY env var or pass api_key directly to genai.Client()
client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain the CAP theorem in 3 sentences.",
)
print(response.text)
```
Set your API key as an environment variable before running:
```bash
# Linux/macOS
export GEMINI_API_KEY="AIza..."

# Windows PowerShell
$env:GEMINI_API_KEY = "AIza..."

# Windows CMD
set GEMINI_API_KEY=AIza...
```
Get your key from Google AI Studio. No credit card is required for the free tier. The key starts with AIza; copy it immediately, because the UI won't show it again without regenerating. One key per project; multiple projects are allowed.
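As a quick sanity check before making any requests, you can fail fast on a missing or mispasted key. (The AIza prefix check is a heuristic based on the key format described above, not an official validation rule.)

```python
import os

def get_api_key() -> str:
    """Read GEMINI_API_KEY and fail fast if it looks wrong."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it first.")
    if not key.startswith("AIza"):
        # Heuristic: AI Studio keys start with "AIza"
        raise RuntimeError("GEMINI_API_KEY doesn't look like an AI Studio key.")
    return key
```

This catches the most common setup mistake (an unset or truncated key) before it surfaces as an opaque 400 from the API.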
Prerequisites
- **Required:** Google Account. Any Gmail or Workspace account.
- **Required:** Python 3.9+. Run `python --version` to check.
- **Required:** pip 23.0+. Upgrade with `pip install --upgrade pip`.
- **Required:** Terminal access. Bash, zsh, or PowerShell.
- **Recommended:** Text editor or IDE. VS Code, PyCharm, or whatever you write Python in.
- **Recommended:** Google Cloud billing. Only for production beyond the free tier.
SDK Setup and Core Requests
The unified google-genai package replaced the deprecated google-generativeai. It works with both AI Studio and Vertex AI. The SDK supports Python, Node.js, and Go. REST is also available for any language.
pip install -U google-genai
Verification: Run python -c "from google import genai; print(genai.__version__)". If you see a version number (0.9.x or higher as of March 2026), the install succeeded.
Common gotcha: If you get ModuleNotFoundError: No module named 'google.genai', run pip uninstall google-generativeai first. The old and new packages conflict.
Text Request
```python
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the three main types of machine learning?",
)
print(response.text)
```
Verification: You should see a text response listing supervised, unsupervised, and reinforcement learning. If you get a response, the API key, SDK, and network connection all work.
Multimodal Request (Image)
```python
from google import genai
from google.genai import types

client = genai.Client()
image = client.files.upload(file="screenshot.png")
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(file_uri=image.uri, mime_type=image.mime_type),
        "Describe what you see in this image and identify any UI components.",
    ],
)
print(response.text)
```
Supports image/png, image/jpeg, image/webp, audio/mp3, audio/wav, video/mp4, and application/pdf.
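Before uploading, you can check a file's MIME type locally with the standard library. One wrinkle: Python's `mimetypes` reports `.mp3` as `audio/mpeg` and `.wav` as `audio/x-wav` on many systems, so this sketch normalizes those to the identifiers listed above.

```python
import mimetypes

SUPPORTED_MIME_TYPES = {
    "image/png", "image/jpeg", "image/webp",
    "audio/mp3", "audio/wav", "video/mp4", "application/pdf",
}

# mimetypes uses different names for some audio formats; map them back
_NORMALIZE = {
    "audio/mpeg": "audio/mp3",
    "audio/x-wav": "audio/wav",
    "audio/vnd.wave": "audio/wav",
}

def detect_mime(path: str) -> str:
    """Guess a file's MIME type and verify it is one this guide lists as supported."""
    mime, _ = mimetypes.guess_type(path)
    mime = _NORMALIZE.get(mime, mime)
    if mime not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"{path}: unsupported or unknown MIME type {mime!r}")
    return mime
```

This lets you reject an unsupported file before spending an upload on it.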
Structured Output (JSON)
```python
from google import genai
from google.genai import types
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    sentiment: str
    key_themes: list[str]

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Review the movie Inception",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=MovieReview,
    ),
)
print(response.parsed)  # parsed into a MovieReview instance
```
Always pair response_mime_type with response_schema. Without the schema, the model guesses at structure. Structured output docs.
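Even with a schema, it's worth validating what comes back before trusting it downstream. A stdlib-only sketch for the MovieReview shape above (field names mirror the Pydantic model; in practice Pydantic's own validation covers most of this):

```python
import json

def parse_review(raw: str) -> dict:
    """Parse and sanity-check a MovieReview-shaped JSON payload."""
    data = json.loads(raw)
    required = {"title": str, "rating": float, "sentiment": str, "key_themes": list}
    for field, typ in required.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if typ is float and isinstance(data[field], int):
            data[field] = float(data[field])  # JSON integers are valid ratings
        elif not isinstance(data[field], typ):
            raise ValueError(f"{field}: expected {typ.__name__}")
    return data
```

A defensive parse like this turns a silently malformed response into a loud, debuggable error.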
Search Grounding
```python
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What were the major AI announcements this week?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

# Access source citations
if response.candidates[0].grounding_metadata:
    for chunk in response.candidates[0].grounding_metadata.grounding_chunks:
        print(f"Source: {chunk.web.title} -- {chunk.web.uri}")
```
Grounding connects the model to live web data. Free tier: 5,000 grounded prompts/month. Paid: $14 per 1,000 queries. This is where the API pulls ahead for RAG workflows -- built-in search skips the vector database for many retrieval tasks. For production agent workflows, see our guide on Gemini agents and agentic capabilities.
Key Capabilities
Model Selection
Pick your model based on cost tolerance and reasoning requirements. All models share a 1M-token context window and a 64K output-token ceiling.
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Free Tier |
|---|---|---|---|---|
| Gemini 3.1 Pro (preview) | $2.00 (≤200K) / $4.00 (>200K) | $12.00 (≤200K) / $18.00 (>200K) | 1M | None |
| Gemini 2.5 Pro | $1.25 (≤200K) / $2.50 (>200K) | $10.00 (≤200K) / $15.00 (>200K) | 1M | Limited (5 RPM) |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Yes |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M | Yes |
| Gemini 3 Flash (preview) | $0.50 | $3.00 | 1M | Yes |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | Yes |
| Gemini 3.1 Flash Live | $0.75 (text) / $3.00 (audio) | $4.50 (text) / $12.00 (audio) | 1M | -- |
Pin stable models in production. Preview models (3.1 Pro, 3 Flash) can change behavior between updates. Google deprecated Gemini 3 Pro Preview on March 9, 2026 with minimal notice. Use versioned model IDs like gemini-2.5-flash-001 for customer-facing apps.
Decision Framework
- Prototyping and testing? Use Gemini 2.5 Flash on free tier. No cost, 10 RPM.
- Production API with cost sensitivity? Gemini 2.5 Flash-Lite at $0.10/1M input. Handles classification, extraction, and summarization.
- Production API needing reasoning? Gemini 2.5 Pro (stable) or 3.1 Pro (preview, higher capability but subject to breaking changes).
- Batch processing at scale? Any model with the Batch API. 50% savings, 24-hour SLA.
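The framework above can be codified as a small helper. The model IDs are the ones used throughout this guide; the precedence (prototyping, then reasoning, then cost) mirrors the bullet order.

```python
def pick_model(prototyping: bool = False,
               needs_reasoning: bool = False,
               cost_sensitive: bool = False) -> str:
    """Map the decision framework above onto a model ID."""
    if prototyping:
        return "gemini-2.5-flash"       # free tier, 10 RPM
    if needs_reasoning:
        return "gemini-2.5-pro"         # stable reasoning model
    if cost_sensitive:
        return "gemini-2.5-flash-lite"  # $0.10/1M input
    return "gemini-2.5-flash"           # balanced default
```

Centralizing the choice in one function also makes it trivial to swap model IDs when a preview model is deprecated.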
Pricing and Cost Optimization
Free tier works for prototyping. Here's what production actually costs.
| Cost Lever | Detail | Savings |
|---|---|---|
| Batch API | Async processing, 24-hour SLA. All models. | 50% |
| Context Caching | Cache large documents, query repeatedly. $0.025/1M cached tokens + $1.00/1M tokens/hr storage. | Up to 75% |
| Flash-Lite | $0.10 input / $0.40 output. Classification, extraction, summarization. | vs Pro: ~90% |
| Thinking Budget | Set thinking_level to MINIMAL or LOW. Thinking tokens count as output. | Variable |
| Media Resolution | Use media_resolution parameter (3.1) to reduce token costs on image/video-heavy requests. | Variable |
Thinking tokens are an additional cost consideration. When a model uses its "thinking" capability (Gemini 2.5 Pro and 3.1 Pro), the internal reasoning tokens count toward your output token usage. Control this with thinking_level: MINIMAL, LOW, MEDIUM, or HIGH. (Note: thinking_level replaces the older thinking_budget parameter as of March 2026.)
Paid Tier Progression
| Tier | Requirement | Monthly Cap |
|---|---|---|
| Tier 1 | Active billing account | $250 |
| Tier 2 | $100+ spent, 3+ days since first payment | $2,000 |
| Tier 3 | $1,000+ spent, 30+ days since first payment | $20,000+ |
Quick math: a 2,000-token prompt + 1,000-token response on 2.5 Flash costs ~$0.003. Run that 100K times and you're at ~$310/month. The same volume on Flash-Lite is ~$60/month.
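The arithmetic behind those numbers, using the per-million-token rates from the table above:

```python
def monthly_cost(in_tokens: int, out_tokens: int,
                 in_rate: float, out_rate: float, calls: int) -> float:
    """Estimated USD cost; rates are dollars per 1M tokens."""
    per_call = (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    return per_call * calls

# 2,000-token prompt + 1,000-token response, 100K calls/month
flash = monthly_cost(2000, 1000, 0.30, 2.50, 100_000)       # ~$310
flash_lite = monthly_cost(2000, 1000, 0.10, 0.40, 100_000)  # ~$60
```

Note these figures exclude thinking tokens, which bill as output on the reasoning models.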
Note: On the consumer side (not API), Google offers Free (30 prompts/day on Gemini 3 Flash), Plus (128K context, expanded limits), Pro ($19.99/mo), and Ultra ($249.99/mo, often $124.99/mo for the first 3 months) plans. These consumer plans are separate from API pricing.
AI Studio vs Vertex AI
Same models, same SDK, different infrastructure layer.
- Prototyping or personal projects
- Startup without SOC 2 requirements
- API key auth is sufficient
- Free tier access needed
- Data residency or VPC controls required
- IAM-based access and audit logging
- Custom model training or Model Garden
- Enterprise compliance (SOC 2, HIPAA)
Migration is straightforward -- change client initialization and auth method. Prompts and code structure stay the same. Migration guide. For more context on AI platform choices and governance requirements, the platform decision often depends on your organization's compliance posture more than technical capability.
Rate Limits
Limits are per-project, not per-key. Three dimensions are tracked simultaneously: requests per minute (RPM), requests per day (RPD), and tokens per minute (TPM). Exceeding any one of them triggers a 429 error.
Free Tier (No Credit Card)
| Model | RPM | Requests/Day | TPM |
|---|---|---|---|
| 2.5 Pro | 5 | 100 | 250K |
| 2.5 Flash | 10 | 250 | 250K |
| 2.5 Flash-Lite | 15 | 1,000 | 250K |
| 3 Flash | 10 | 250 | 250K |
| 3.1 Flash-Lite | 15 | 1,000 | 250K |
No free tier: Gemini 3.1 Pro requires a paid billing account for API access. Gemini 2.5 Pro has a limited free tier (5 RPM, 100 RPD) but is the most restricted of the free options.
December 2025 quota cut: Google reduced free tier limits by 50-80%. If you're following older tutorials, those numbers are stale. Current limits.
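Rather than only reacting to 429s, you can pace requests client-side to stay under an RPM cap. A minimal stdlib sketch (10 RPM matches the current 2.5 Flash free tier):

```python
import time

class RequestPacer:
    """Sleep just enough between calls to stay under a requests-per-minute cap."""

    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm  # seconds between requests
        self._last = 0.0

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

pacer = RequestPacer(rpm=10)  # call pacer.wait() before each API request
```

This only addresses the RPM dimension; daily request and token-per-minute caps still need the retry pattern below as a backstop.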
Retry Pattern
```python
import time

from google import genai
from google.genai import errors

client = genai.Client()

def call_with_retry(prompt, model="gemini-2.5-flash", max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.models.generate_content(
                model=model,
                contents=prompt,
            )
            return response.text
        except errors.APIError as e:
            # The google-genai SDK raises errors.APIError; code 429 is rate limiting
            if e.code != 429:
                raise
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded")
```
View your current limits in AI Studio's quota dashboard.
Limitations
- Preview instability. Gemini 3 Pro Preview was deprecated March 9, 2026 with minimal migration window. Pin to stable releases for production.
- Free tier data usage. Google may use free tier prompts/responses to improve products. Paid tier excludes this. Terms.
- Hallucination. All LLMs generate plausible-sounding incorrect output. Grounding reduces this but doesn't eliminate it.
- Region availability. Some models and features aren't available in all regions. Check here.
- Output cap. 64K output tokens per response (about 48,000 words). Implement chunking for longer generation.
Troubleshooting
API key rejected
Cause: Key from a disabled project, or the Generative Language API isn't enabled.
Fix: Generate a new key in a fresh project via AI Studio. The API enables automatically. Source: Google AI Quickstart.
429 errors at low request volume
Cause: December 2025 quota reduction. Flash is now 10 RPM / 250 RPD.
Fix: Add delays between calls, use Flash-Lite (15 RPM / 1,000 RPD), or enable billing for Tier 1 ($250/month cap). Current limits.
Different output in AI Studio vs the SDK
Cause: AI Studio defaults to specific temperature and safety settings that differ from SDK defaults.
Fix: Set temperature, top_p, and safety_settings explicitly in GenerateContentConfig. Config docs.
JSON responses with inconsistent structure
Cause: Using response_mime_type without response_schema.
Fix: Always pair both. Pass a Pydantic model or dict as the schema. Structured output docs.
ModuleNotFoundError: No module named 'google.genai'
Cause: Deprecated google-generativeai package installed.
Fix: pip uninstall google-generativeai && pip install -U google-genai. Deprecation notice.
What's Next
Build a function-calling agent. The API's tool use system lets the model call your functions, inspect results, and chain actions. Combined with grounding and structured output, you have the pieces for a production agentic AI workflow.
New in 3.1: Thought Signatures preserve reasoning context across multi-turn conversations, improving coherence in function-calling and agentic workflows. The media_resolution parameter lets you control how multimodal inputs are tokenized. Gemini 3.1 Flash Live is a new low-latency audio-to-audio model for real-time conversational applications.
Google, Gemini, Google AI Studio, and Vertex AI are trademarks of Google LLC. This article is not affiliated with, sponsored by, or endorsed by Google LLC.