Meta Llama
The open-weight model family powering Meta AI for 3.27 billion users -- and running locally on consumer hardware.
What's New
The Llama timeline, most recent first
Llama 4 launched: Scout (10M context), Maverick (400B MoE), Behemoth in training -- Meta's shift to Mixture of Experts and native multimodality.
Llama 3.2: vision-language models (11B/90B) plus 1B/3B on-device edge models capable of running offline on iPhone and Android.
Llama 3.1 405B -- the first open-weight model to rival GPT-4 and Claude 3.5 Sonnet frontier performance, with 128K context and 405 billion parameters.
Llama 2: commercial license plus Microsoft partnership made Llama enterprise-deployable for the first time, opening the door to production use.
What Is Meta Llama?
From leaked research weights to a frontier model family
Meta Llama is a family of open-weight foundation models developed by Meta AI. Unlike proprietary models accessed exclusively through a vendor API, Llama weights are freely downloadable -- allowing organizations to run, fine-tune, and deploy the models on their own hardware without per-call licensing fees. The term "open-weight" is precise and intentional: the model parameters are publicly available, but the full training data and methodology are not disclosed, which means Llama does not meet the Open Source Initiative's formal definition of open-source software. That distinction matters when evaluating what you can and cannot audit.
Llama's public story started on February 24, 2023 with Llama 1 -- a research-only release in four sizes (7B to 65B parameters) pre-trained on up to 1.4 trillion tokens. A week later, on March 3, 2023, the weights leaked publicly on 4chan, accelerating community adoption far beyond what Meta had planned. The moment was a turning point: it demonstrated both the demand for open-weight AI and the governance challenges of releasing large model weights. Llama 2 in July 2023 answered those challenges with a formal commercial license and a high-profile Microsoft partnership, enabling legitimate production deployments. Llama 3 (April 2024) and Llama 3.1 (July 2024) pushed quality further, with the 405B variant becoming the first open-weight model to benchmark competitively against GPT-4 and Claude 3.5 Sonnet on major evaluations.
Llama 4, released April 5, 2025, represents a fundamental architectural shift. The Scout and Maverick models use Mixture of Experts (MoE) design -- activating only a fraction of total parameters per forward pass, making them dramatically more compute-efficient than dense predecessors. Both models include native multimodality from the ground up via early-fusion architecture, meaning vision and language are fused at the token level rather than bolted on. Scout's 10 million token context window is the largest of any open-weight model at launch. The forthcoming Behemoth, at approximately 2 trillion total parameters, is primarily a teacher model for distillation -- it was still in training at Llama 4's release date and remains unreleased as of April 2026. Meta's rationale for open-weighting these models, articulated by Mark Zuckerberg, centers on the belief that open AI development strengthens American competitiveness and gives individuals and businesses genuine ownership of the AI they depend on.
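The MoE efficiency claim above can be made concrete with a toy sketch. This is an illustration of top-k expert routing in general, not Meta's implementation; the expert count and scores below are made up. A small router scores every expert per token, and only the top-k experts actually run, which is why a model's active parameter count per forward pass can sit far below its total parameter count (17B active out of 400B total for Maverick).

```python
# Toy sketch of Mixture of Experts (MoE) top-k routing -- illustrative
# only, not Meta's implementation. Expert count and scores are invented.
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, k=1):
    """Pick the top-k experts for one token; renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in top)
    return [(expert, probs[expert] / weight_sum) for expert in top]

# 16 experts, but each token activates only one of them (plus any
# shared layers) -- so compute per token scales with active, not
# total, parameters.
scores = [0.1, 2.3, -0.5, 1.1] + [0.0] * 12
print(route_token(scores, k=1))  # expert 1 wins with weight 1.0
```

With k=1 the chosen expert's renormalized weight is always 1.0; production MoE models typically route to more than one expert and blend their outputs by these weights.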
Key Numbers
Verified figures -- sources noted
Editorial Note: Open-Weight vs. Open-Source
Meta Llama Is Open-Weight, Not Fully Open-Source
Meta Llama models are "open-weight" -- the model weights are freely downloadable and you can run, fine-tune, and deploy them commercially (within license limits). However, the training data and full methodology are not publicly disclosed. This differs from the OSI definition of open-source software, which requires full transparency into how a system was built. You can operate Llama with full data sovereignty, but you cannot completely audit its construction. Coverage across this hub uses the term "open-weight" rather than "open-source" -- that word choice is intentional and editorially accurate.
Who Is Meta Llama For?
Five reader personas this hub serves
On-premises or air-gapped deployments where data cannot leave the firewall. No cloud API dependency.
Organizations processing millions of requests. Self-hosting eliminates per-call API fees at scale.
Access to full weights for fine-tuning, mechanistic interpretability, and novel training experiments.
Llama 3.2 1B/3B models run on iPhone and Android offline -- no network call, full privacy.
WhatsApp, Instagram, Messenger, and Facebook integrations using Meta AI's native Llama backbone.
The Llama Model Family
Key releases from Llama 1 (2023) through Llama 4 Behemoth
| Model | Released | Architecture | Key Spec | Context | Status |
|---|---|---|---|---|---|
| Llama 1 | Feb 24, 2023 | Dense Transformer | 7B / 13B / 30B / 65B params; 1.0T–1.4T token pre-train | 2K tokens | Research Only |
| Llama 2 | Jul 18, 2023 | Dense Transformer | 7B / 13B / 70B params; Meta + Microsoft partnership | 4K tokens | Generally Available |
| Llama 3 | Apr 18, 2024 | Dense Transformer | 8B / 70B params; 15T+ token pre-train | 8K tokens | Generally Available |
| Llama 3.1 | Jul 23, 2024 | Dense Transformer | 8B / 70B / 405B params; first open-weight frontier-competitive | 128K tokens | Generally Available |
| Llama 3.2 | Sep 25, 2024 | Dense Transformer + Vision | 1B / 3B (text) + 11B / 90B (vision-language) | 128K tokens | Generally Available |
| Llama 4 Scout | Apr 5, 2025 | MoE + Early-Fusion Multimodal | 109B total / 17B active params | 10M tokens | Generally Available |
| Llama 4 Maverick | Apr 5, 2025 | MoE + Early-Fusion Multimodal | 400B total / 17B active params | 1M tokens | Generally Available |
| Llama 4 Behemoth | In training (Apr 2025) | MoE -- Teacher Model | ~2T total params (active count unconfirmed); distillation source for Scout/Maverick | TBD | Not Released |
Meta Llama Timeline
Every major milestone from first release through Llama 4
Meta releases Llama 1 in four sizes (7B, 13B, 30B, 65B) trained on up to 1.4 trillion tokens. Distribution limited to approved researchers via form submission.
The full Llama 1 model weights are posted publicly on 4chan, accelerating community adoption and fine-tuning far beyond Meta's planned rollout. The moment crystallizes both the demand for open-weight AI and the governance challenges of distributing large model weights.
Meta and Microsoft jointly release Llama 2 with three sizes (7B, 13B, 70B) under a commercial license. The partnership means Llama 2 ships on Azure AI and is immediately available through Microsoft's distribution channels.
Llama 3 launches in 8B and 70B sizes with 8K context windows and 15T+ token pre-training. Strong performance on MMLU and coding benchmarks positions it as a credible alternative to proprietary models for many use cases.
Meta releases Llama 3.1 with a 405 billion parameter variant that benchmarks competitively with GPT-4 and Claude 3.5 Sonnet -- the first open-weight model to reach that tier. The 128K context window and commercial license availability mark a milestone for the open-weight AI ecosystem.
Llama 3.2 introduces vision-language models (11B and 90B) for image understanding tasks, and lightweight 1B and 3B models designed to run fully on-device on iPhone and Android hardware without an internet connection.
Meta ships Llama 4 Scout and Maverick, both built on Mixture of Experts architecture with early-fusion native multimodality. Scout's 10 million token context window is the largest of any open-weight model at launch. Behemoth (~2T parameters) is announced as a teacher model and remains in training.
As of April 2026, Llama 4 Behemoth remains unreleased. Scout and Maverick are the current production frontier models in the Llama 4 family.
Deployment Options
Four ways to run Meta Llama models in production
Local and Self-Hosted
Download model weights from Meta or Hugging Face and run on your own hardware -- from a consumer GPU through a multi-server cluster. No data leaves your infrastructure. Cost is hardware and energy only, with no per-call fees.
Cloud Provider APIs
All major cloud providers offer managed Llama endpoints. No GPU procurement required -- pay per token or per hour of provisioned inference. Useful for teams that need immediate scale without infrastructure management.
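The per-token pricing model above is easy to reason about with a quick estimate. The sketch below uses a hypothetical rate, not any real provider's pricing; check your provider's current price sheet. It shows why high-volume workloads (the "millions of requests" case noted earlier) push teams toward self-hosting.

```python
# Back-of-envelope managed-API cost estimate. The $/million-token rate
# below is a hypothetical placeholder, not a real provider price.
def monthly_api_cost(requests_per_month, tokens_per_request,
                     usd_per_million_tokens):
    """Estimated monthly spend for a pay-per-token managed endpoint."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000_000 * usd_per_million_tokens

# 5M requests/month x 1,000 tokens each, at a hypothetical $0.50 per
# million tokens:
print(monthly_api_cost(5_000_000, 1_000, 0.50))  # 2500.0
```

At low volume the managed endpoint is almost always cheaper than procuring GPUs; the crossover point depends on your provider's rates and your hardware amortization.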
On-Device (Mobile and Edge)
Llama 3.2 1B and 3B models run fully on-device on modern iPhone and Android hardware. Inference happens locally with no network call required -- maximum privacy, zero latency variance from API calls, offline capable.
Meta Apps (Built-In Access)
Meta AI -- powered by Llama -- is built directly into WhatsApp, Instagram, Messenger, Facebook, and Ray-Ban Meta glasses. No developer integration required for end users. 3.27 billion people already have access without downloading anything extra.
Our Coverage
In-depth editorial on Meta Llama -- Tier 1
Licensing Quick Reference
Llama 4 Community License -- key terms at a glance
| Use Case | License Required | Notes |
|---|---|---|
| Research and personal projects | Free -- Community License | No application required |
| Commercial products under 700M MAU | Free -- Community License | Applies to most businesses; no revenue ceiling |
| Commercial products over 700M MAU | Commercial License -- Meta Approval | Applies to Google, Apple, and similar scale platforms |
| Training competing frontier models | Prohibited | Cannot use Llama outputs to train competing AI models |
| Fine-tuning on your own data | Free -- Community License | Permitted for both research and commercial use within MAU threshold |
| On-premises and air-gapped deployment | Free -- Community License | Data remains entirely within your infrastructure |
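The table above reduces to a simple decision rule, sketched below for illustration only; this is a reading of the quick-reference table, not legal advice, and the actual license text governs.

```python
# Illustrative encoding of the licensing quick-reference table above --
# a sketch, not legal advice. Consult the actual Llama license text.
MAU_THRESHOLD = 700_000_000  # the 700M monthly-active-user cutoff

def license_tier(monthly_active_users, trains_competing_model=False):
    """Return the license tier implied by the quick-reference table."""
    if trains_competing_model:
        return "prohibited"
    if monthly_active_users > MAU_THRESHOLD:
        return "commercial-license-required"
    return "community-license"

print(license_tier(50_000))         # community-license
print(license_tier(1_200_000_000))  # commercial-license-required
```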
Before You Use AI
Your Privacy
When you use Meta AI through WhatsApp, Instagram, Messenger, or Facebook, your prompts are processed by Meta's servers and are subject to Meta's data privacy policy, including potential use for service improvement.
If you self-host Llama models on your own hardware, your data never leaves your infrastructure. No telemetry is sent to Meta. This is a meaningful distinction for enterprise and regulated-industry deployments.
Review Meta's privacy center for full details: facebook.com/privacy/center
Mental Health and AI Dependency
AI tools can be valuable for productivity, learning, and creative work -- but they are not a substitute for human connection or professional mental health support.
If you or someone you know is struggling, please reach out:
988 Suicide and Crisis Lifeline: call or text 988
SAMHSA Helpline: 1-800-662-4357
Crisis Text Line: text HOME to 741741
For AI risk frameworks: NIST AI Risk Management Framework
Your Rights and Our Transparency
Under GDPR and CCPA, you have rights to access, correct, and delete personal data held by AI service providers. Contact Meta's data protection team for requests related to Meta AI.
This coverage is editorially independent. TechJack Solutions may earn affiliate revenue from links to third-party cloud providers hosting Llama models. Editorial conclusions are not influenced by commercial relationships.
For EU readers: Llama models may be subject to the EU AI Act depending on deployment context and risk classification.