How much does Google Gemma cost?

Gemma 4 is free to download and self-host under the Apache 2.0 license. There are no licensing fees, no revenue caps, and no user thresholds. You pay only for your own hardware and electricity. If you prefer API access through third-party providers like OpenRouter, Gemma 3 4B costs approximately $0.02 per million input tokens and $0.04 per million output tokens.

What license does Gemma use?

Gemma 4 uses the Apache 2.0 license, which is fully permissive. You can use it commercially, redistribute it, and fine-tune it without restrictions. Earlier Gemma generations (1 through 3) used a custom 'Gemma Terms of Use' license that was more restrictive.

What Gemma model sizes are available?

Gemma 4 comes in four sizes: E2B (2.3B effective parameters, ~3 GB VRAM), E4B (4.5B effective parameters, ~6 GB VRAM), 27B MoE (128 experts with 3.8B active, ~14-16 GB VRAM at 4-bit), and 31B Dense (30.7B parameters, ~16-22 GB VRAM at 4-bit). Edge models support 128K context; large models support 256K context.

How does Gemma licensing compare to Llama?

Gemma 4 uses Apache 2.0 with zero commercial restrictions. Meta Llama requires a separate license if your product or service has more than 700 million monthly active users. For the vast majority of organizations this threshold is irrelevant, but Gemma's Apache 2.0 license has no such clause at all.

Google Gemma

Gemma Pricing & Models: Sizes, Licensing & API Costs

Bottom line: Gemma 4 costs nothing to self-host. The Apache 2.0 license has no revenue caps, no user thresholds, and no redistribution restrictions. You pay only for hardware and electricity. Third-party API access starts at roughly $0.02 per million input tokens.

License Cost

Apache 2.0

License Type

Google AI

Model Sizes

Gemma Docs

256K

Max Context

Model Card

What Does Gemma Cost?

The short answer: nothing, if you run it yourself. Gemma 4 ships under the Apache 2.0 license. You download the model weights from Kaggle, Hugging Face, or Google AI Studio, load them onto your own GPU, and start generating. No license fees. No API keys. No monthly subscription.

The practical cost is hardware. A laptop with an RTX 4060 (8 GB VRAM) can run the E4B model at usable speeds. The 27B MoE variant needs a workstation-class GPU with 16 GB or more. If you already own the hardware, the incremental cost is electricity.

10x

Gemma 3 4B costs roughly one-tenth the per-token price of Llama 3.1 70B through third-party API providers like OpenRouter.

If you do not want to manage infrastructure, third-party providers serve Gemma models via API. Gemma 3 4B runs at approximately $0.02 per million input tokens and $0.04 per million output tokens through OpenRouter. That pricing puts Gemma among the cheapest API-accessible models available in 2026.

Google does not sell direct Gemma API access the way it sells Gemini. Gemma is a weights release, not a hosted service. You either self-host or use a third-party inference provider.

Licensing Deep Dive

Gemma's licensing history has two chapters. Earlier generations (Gemma 1, 2, and 3) shipped under a custom "Gemma Terms of Use" license. That license was source-available rather than truly open source. It allowed inspection and modification of the weights but imposed conditions on commercial redistribution that made corporate legal teams uncomfortable.

Gemma 4 changed the game. Google released the entire Gemma 4 family under Apache 2.0, which is one of the most permissive licenses in the software world. Here is what Apache 2.0 gives you:

Commercial use with zero restrictions. Sell products, charge for services, embed Gemma in paid applications. No royalties.
Redistribution of original or modified weights. Package Gemma inside your product and ship it to customers.
Fine-tuning with no obligations beyond preserving the license notice. Your fine-tuned model stays yours.
No user thresholds. Unlike some open-weight licenses, there is no cap on monthly active users or annual revenue.
Patent grant. Apache 2.0 includes an explicit patent license from contributors, reducing legal risk for enterprise adoption.

Practical takeaway: If your legal team blocked an earlier Gemma deployment because of the Terms of Use, revisit the conversation. Apache 2.0 is pre-approved by most enterprise open-source review boards.

Model Size Guide

Gemma 4 ships in four sizes. The naming convention uses the effective parameter count (the number of parameters active during inference), not the total parameter count including embeddings. This distinction matters for memory planning.

Edge

Gemma 4 E2B

Phone and IoT deployments

Effective 2.3B

Total 5.1B

VRAM ~3 GB

Context 128K

Sweet Spot

Gemma 4 E4B

Laptop and desktop inference

Effective 4.5B

Total 8B

VRAM ~6 GB

Context 128K

MoE

Gemma 4 27B MoE

128 experts, 3.8B active per token

Total 26B

Active 3.8B

VRAM (4-bit) ~14-16 GB

Context 256K

Dense

Gemma 4 31B Dense

Maximum quality, highest VRAM

Effective 30.7B

Total 31B

VRAM (4-bit) ~16-22 GB

Context 256K

The MoE (Mixture of Experts) architecture in the 27B variant deserves explanation. While the model contains 26 billion total parameters across 128 expert networks, only 3.8 billion activate for any given token. This means it runs faster and uses less memory than a 26B dense model would, while still benefiting from the collective knowledge stored across all experts.

FREE TEMPLATE

AI Governance Charter

Establish your organization's AI principles in one document

Download Free →

API Pricing by Provider

Google does not operate a dedicated Gemma API. Instead, third-party inference providers host Gemma models and charge per token. Pricing varies by provider, model size, and whether you commit to reserved capacity or use on-demand pricing.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Provider
Gemma 3 4B	~$0.02	~$0.04	OpenRouter
Llama 3.1 70B (comparison)	~$0.20	~$0.20	OpenRouter

Pricing verified May 28, 2026 via OpenRouter. Rates fluctuate. Check provider dashboards for current pricing.

The cost gap is significant. Gemma 3 4B delivers roughly 10x cheaper inference than Llama 3.1 70B. The tradeoff is capability: a 4B parameter model cannot match a 70B model on complex reasoning tasks. But for classification, summarization, extraction, and straightforward generation, the smaller model often performs well enough that the cost savings justify the quality difference.

Self-Hosting Cost Calculator

Self-hosting eliminates per-token costs entirely. Your expenses are hardware (one-time or amortized) and electricity (ongoing). Here is a realistic breakdown for each model tier.

Model	Minimum GPU	VRAM Needed	GPU Cost (New)	Electricity/Month
E2B	RTX 3060 / Apple M1	~3 GB	$300-$400	$3-$8
E4B	RTX 4060 / Apple M2	~6 GB	$300-$500	$5-$12
27B MoE (4-bit)	RTX 4090 / A6000	~14-16 GB	$1,600-$4,500	$15-$30
31B Dense (4-bit)	RTX 4090 / A6000	~16-22 GB	$1,600-$4,500	$15-$30

GPU prices are approximate US retail as of May 2026. Electricity assumes $0.12/kWh and moderate daily usage.

For the E2B and E4B models, most developers already own capable hardware. The marginal cost of running Gemma locally is effectively just electricity. For the larger models, the RTX 4090 remains the best consumer-grade option. If you already own one for gaming or creative work, Gemma 4 is a free addition to your toolkit.

Cloud GPU alternative: Renting cloud GPU instances (Lambda Labs, RunPod, vast.ai) typically costs $0.50 to $2.00 per hour depending on the GPU tier. For intermittent use, this can be cheaper than buying hardware. For sustained workloads, owned hardware wins within 3 to 6 months.

Gemma vs Llama Licensing

This comparison matters because Gemma and Llama are the two most-deployed open-weight model families. Their licenses differ in one critical area.

Clause	Gemma 4 (Apache 2.0)	Llama 3.x (Meta License)
Commercial use	Unrestricted	Allowed
User threshold	None	700M MAU cap
Revenue cap	None	None stated
Redistribution	Allowed (preserve notice)	Allowed with restrictions
Fine-tuning	Unrestricted	Allowed
Patent grant	Explicit (Apache 2.0 Sec. 3)	Limited
OSI approved	Yes	No

The 700 million monthly active user threshold in Meta's Llama license is irrelevant for most organizations. But if you are building infrastructure for a platform that could reach that scale, Gemma's Apache 2.0 removes a potential future licensing conversation entirely. More importantly, many enterprise procurement teams have pre-approved Apache 2.0 for internal use. Meta's custom license typically requires separate legal review.

Fine-Tuning Costs

Fine-tuning Gemma is where the Apache 2.0 license pays for itself. There are no API-based fine-tuning fees because there is no first-party API. You fine-tune locally using open-source tooling.

Software cost: $0. Use Hugging Face Transformers, PEFT, or Unsloth. All open-source.
Hardware for E2B/E4B: A single consumer GPU (RTX 3060 or better). QLoRA reduces memory requirements to a fraction of full fine-tuning.
Hardware for 27B/31B: A single RTX 4090 can handle QLoRA fine-tuning. Full fine-tuning requires multi-GPU setups or cloud GPU rentals.
Time: A typical QLoRA fine-tune on a domain-specific dataset (10K to 50K examples) takes 2 to 8 hours on an RTX 4090 for the E4B model.
Electricity: At 450W GPU power draw and $0.12/kWh, an 8-hour fine-tuning run costs approximately $0.43 in electricity.

Compare this to fine-tuning a proprietary model through an API provider, which can cost hundreds to thousands of dollars per run depending on dataset size and model tier. The Gemma approach is orders of magnitude cheaper if you have the hardware.

When to Pay for Managed APIs

Self-hosting is not always the right call. Here are situations where paying for API-based Gemma access makes more sense than running it yourself.

If your team works on CPU-only machines or laptops without discrete GPUs, API access is the only practical option. Buying a dedicated GPU for occasional inference rarely makes financial sense.

Applications with unpredictable traffic spikes benefit from elastic API infrastructure. A local GPU has fixed throughput. An API provider auto-scales to handle request surges without queuing.

When testing whether Gemma is the right model for your use case, paying fractions of a cent per request through an API is faster and cheaper than setting up local infrastructure. Commit to self-hosting after you validate the approach.

For steady-state workloads where you process thousands of requests daily, the math almost always favors self-hosting. The crossover point depends on your volume: at Gemma 3 4B API pricing ($0.02/$0.04 per million tokens), you would need to process hundreds of millions of tokens monthly before an RTX 4090 pays for itself faster than the API route. For the larger models at higher per-token costs, the crossover arrives sooner.

Specialized Gemma Variants

Beyond the core language models, Google has released domain-specific Gemma variants. Each inherits the same licensing terms as the base model generation it was built on.

Variant	Domain	Use Case
PaliGemma	Vision-language	Image captioning, visual question answering, document understanding
CodeGemma	Code	Code generation, completion, explanation, and refactoring
MedGemma	Medical	Clinical text analysis, medical Q&A (not for diagnosis)
ShieldGemma	Safety	Content filtering, toxicity detection, safety classification

These variants cost the same as the base models to run: nothing for the software, your hardware for inference. They are particularly valuable because equivalent proprietary solutions (GPT-4 Vision, Claude with image inputs) charge premium per-token rates for multimodal capabilities.

Frequently Asked Questions

Is Gemma really free?

Yes. The Gemma 4 model weights are released under Apache 2.0. There is no licensing fee, no subscription, and no per-use charge from Google. Your costs are hardware (one-time) and electricity (ongoing). Third-party API providers charge for hosting, but that is their service fee, not a Google charge.

Can I use Gemma in a commercial product?

Yes, without restrictions under Gemma 4's Apache 2.0 license. You can embed Gemma in paid products, offer it as a service, or redistribute modified weights. The only requirement is preserving the Apache 2.0 license notice.

What changed from Gemma 3 to Gemma 4 licensing?

Gemma 1 through 3 used a custom "Gemma Terms of Use" license that was source-available but not OSI-approved open source. Gemma 4 switched to Apache 2.0, removing all commercial restrictions and making it compatible with enterprise open-source policies.

Which Gemma model should I start with?

Start with E4B. It runs on most modern laptops with a discrete GPU (6 GB VRAM), supports 128K context, and offers the best balance between capability and hardware requirements. Move to 27B MoE only if E4B's quality is insufficient for your task.

How does Gemma compare to GPT-4 on cost?

Self-hosted Gemma has zero marginal cost per token. GPT-4 charges $30 per million input tokens and $60 per million output tokens (as of early 2026). The quality gap exists, but for many production tasks like classification, extraction, and summarization, Gemma's smaller models deliver adequate results at a fraction of the cost.

Gemma 4: Everything You Need to Know

YouTube Search

Official walkthrough of the Gemma 4 model family, architecture decisions, and benchmark results.

Running Gemma Locally with Ollama

YouTube Search

Practical setup guide for self-hosting Gemma on consumer hardware with Ollama.

Fine-Tuning Gemma with QLoRA

YouTube Search

Step-by-step QLoRA fine-tuning on a single GPU, including dataset preparation and evaluation.

Hub

Google Gemma Hub

All Gemma articles, guides, and comparisons in one place.

Hub

AI Tools Hub

Compare 70+ AI tools across vendors with independent, practitioner-focused analysis.

Breakdown

Gemini Pricing Breakdown

How Google Gemini's API and subscription pricing compares to competitors.

Go Deeper

Resources from across Tech Jacks Solutions

FREEAI Governance Charter

Establish your organization's AI principles in one document

AI Career Paths

Explore roles that work with these tools daily

EU AI Act Guide

Check your compliance obligations under the EU AI Act

FREEAI Risk Management Template

Identify, assess, and mitigate AI deployment risks

FREEAI Bias Assessment

Evaluate bias risks before deploying any AI system

Verified May 2026

Gemma is a trademark of Google LLC. Llama is a trademark of Meta Platforms, Inc. GPT is a trademark of OpenAI. Claude is a trademark of Anthropic. All other trademarks belong to their respective owners. Tech Jacks Solutions is not affiliated with Google, Meta, or any model provider mentioned in this article.

Gallery

Contacts

Gemma Pricing & Models: Sizes, Licensing & API Costs

What Does Gemma Cost?

Licensing Deep Dive

Model Size Guide

API Pricing by Provider

Self-Hosting Cost Calculator

Gemma vs Llama Licensing

Fine-Tuning Costs

When to Pay for Managed APIs

Specialized Gemma Variants

Frequently Asked Questions

Is Gemma really free?

Can I use Gemma in a commercial product?

What changed from Gemma 3 to Gemma 4 licensing?

Which Gemma model should I start with?

How does Gemma compare to GPT-4 on cost?

Go Deeper

Services

Learn

Company