What Is LiteLLM? The Open-Source LLM Gateway Explained

LiteLLM is an open-source LLM gateway, maintained by BerriAI, that gives you one interface to many model providers; the vendor reports coverage of 100+. Instead of writing separate integration code for OpenAI, Anthropic, Google Gemini, Vertex AI, Bedrock, Azure, and HuggingFace, you call one OpenAI-compatible API and let LiteLLM translate.

That summary hides a useful detail: LiteLLM is really two products that share a name. There is a Python SDK you import into your code, and there is a Proxy Server, the piece BerriAI calls the AI Gateway, that you stand up as a shared service. Knowing which one you need is the first decision, so this breakdown starts there and then works through translation, gateway features, performance, the open-source versus Enterprise split, installation, and security posture.

100+ providers (vendor-reported). Source: BerriAI/litellm

Forms: SDK + Proxy. Source: LiteLLM docs

8ms P95 at 1k RPS (vendor-reported). Source: BerriAI/litellm

OSS: SDK + Proxy free to self-host. Source: BerriAI/litellm

What Is LiteLLM?

LiteLLM is not a model. It does not generate text. It sits between your application and the model providers and normalizes everything: the request format, the response format, and even the error types. You write your code once against an OpenAI-style interface, and LiteLLM handles the per-provider differences underneath.

The practical payoff is that switching models stops being a rewrite. If you start on one provider and later move to another, your calling code does not change. The same logic applies when you add a second provider as a fallback or split traffic across several for load balancing. LiteLLM centralizes that plumbing so each application does not reimplement it.

Practitioner note: The thing people miss is that LiteLLM is two distinct deployment shapes. As a library, it lives inside one application. As a proxy, it becomes shared infrastructure that many applications point at. The features you get differ between the two, so pick the shape before you pick the feature list.

The SDK vs the Proxy Server

This is the distinction that determines everything else. The Python SDK and the Proxy Server solve related problems for different audiences.

Python SDK: drop-in OpenAI client replacement for developers
Calls: completion()
Also: embedding(), image_generation()
Router: retry, fallback, load balance

Proxy Server: self-hosted AI Gateway for platform teams
Shape: central service
Endpoint: OpenAI-compatible
Adds: keys, spend, guardrails, UI

Use the SDK when you are a developer building a single application and you want provider portability without standing up extra infrastructure. You import LiteLLM, swap your client for its completion(), embedding(), and image_generation() calls, and use the Router for retries, fallbacks, and load balancing.

Use the Proxy Server when you are a platform or GenAI enablement team serving many internal applications. You run it as a central OpenAI-compatible service, then every team points at it. That is where the governance features live: virtual keys, spend tracking, guardrails, and an admin dashboard. Those features do not exist in the SDK because the SDK has no central place to enforce them.
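Because the proxy exposes an OpenAI-compatible endpoint, a client needs nothing LiteLLM-specific to call it. The sketch below builds (but does not send) a chat request using only the Python standard library. The base URL `http://localhost:4000`, the `/chat/completions` path, the virtual key `sk-example`, and the model alias `my-model` are placeholder assumptions; substitute the values from your own deployment.

```python
import json
import urllib.request


def build_chat_request(base_url, virtual_key, model, user_text):
    """Build an OpenAI-style chat completion request aimed at a gateway."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The application holds a virtual key, not a raw provider credential.
            "Authorization": f"Bearer {virtual_key}",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
request = build_chat_request(
    "http://localhost:4000", "sk-example", "my-model", "Hello"
)
```

Passing `request` to `urllib.request.urlopen` would send it. Every team's application can share this same request shape and differ only in the key it presents.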

OpenAI-Format Translation

The mechanism that makes the gateway useful is format translation. Every response that comes back through LiteLLM follows the OpenAI Chat Completions format, no matter which provider produced it. Your output parser sees the same shape whether the call went to Anthropic, Gemini, or Bedrock.
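To make the "same shape" point concrete, here is a minimal sketch of an output parser written once against the Chat Completions layout. The two sample payloads are hand-written stand-ins, not captured LiteLLM output, and the model names in them are hypothetical.

```python
def extract_text(response):
    """Return the assistant text from a Chat Completions-shaped response."""
    return response["choices"][0]["message"]["content"]


# Hand-written samples standing in for responses from two different providers.
from_provider_a = {
    "model": "provider-a/model-x",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hi"}}
    ],
}
from_provider_b = {
    "model": "provider-b/model-y",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello there"}}
    ],
}

# One parser handles both, because the shape is identical.
texts = [extract_text(r) for r in (from_provider_a, from_provider_b)]
```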

LiteLLM also maps each provider's errors onto the OpenAI exception types. That detail matters more than it sounds. Without it, your error handling would need a separate branch for every provider's failure modes. With it, you catch one set of exceptions and your retry and fallback logic stays uniform across all 100-plus providers the vendor reports supporting.
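The idea of collapsing provider failures into one exception family can be sketched in a few lines. This illustrates the pattern only and is not LiteLLM's implementation: the exception classes are defined locally (their names are modeled on common OpenAI-style error names), and the status-code table is a simplifying assumption.

```python
class GatewayError(Exception):
    """Base class for the unified error types in this sketch."""


class AuthenticationError(GatewayError):
    """The credential was rejected."""


class RateLimitError(GatewayError):
    """The provider asked the caller to slow down."""


class ServiceUnavailableError(GatewayError):
    """The provider is temporarily unable to serve requests."""


# Assumed mapping from HTTP status to a unified type; a real gateway
# would inspect more than the status code.
STATUS_TO_ERROR = {
    401: AuthenticationError,
    429: RateLimitError,
    503: ServiceUnavailableError,
}


def normalize_error(provider, status, message):
    """Translate a provider failure into one of the unified exception types."""
    error_type = STATUS_TO_ERROR.get(status, GatewayError)
    return error_type(f"{provider}: {message}")


def is_retryable(error):
    """Retry policy written once, independent of which provider failed."""
    return isinstance(error, (RateLimitError, ServiceUnavailableError))


rate_limited = normalize_error("provider-a", 429, "too many requests")
bad_key = normalize_error("provider-b", 401, "invalid key")
```

With this in place, `is_retryable` needs no per-provider branch, which is the property the paragraph above describes.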

1 format

Every response follows the OpenAI Chat Completions format regardless of provider, and provider errors are mapped to OpenAI exception types. That is what makes drop-in compatibility real rather than aspirational.

This is the difference between a gateway and a thin wrapper. A wrapper forwards your call; a gateway normalizes the request and the response so the rest of your stack can stay provider-agnostic.

Gateway Features (Proxy Server)

These are the capabilities that justify running the Proxy Server as shared infrastructure. They are governance and reliability controls, not model features, and they apply across every provider behind the gateway.

Virtual keys: Issue keys with budgets set per key, per team, and per user. Teams get access without sharing raw provider credentials.

Spend tracking: Attribute spend back to keys, teams, and users, so cost is visible per consumer rather than buried in a single opaque provider bill.

Guardrails: Apply content filtering and PII masking at the gateway, so the policy is enforced centrally rather than reimplemented in each app.

Load balancing: Distribute traffic across multiple deployments or providers to smooth out rate limits and capacity pressure.

Routing and fallbacks: Route requests across models and fall back to an alternate when a provider fails, keeping requests served during an outage.

Retries: Retry transient failures automatically, so a single flaky call does not surface as an error to the calling application.

Admin dashboard: A built-in UI to manage keys, view spend, and configure the proxy without editing config files by hand for every change.
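The budget and spend-attribution ideas in this list can be sketched in plain Python. This is a toy model under stated assumptions, not the proxy's data model: `VirtualKey`, `BudgetExceeded`, `record_spend`, and `spend_by_team` are hypothetical names, and the costs are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class VirtualKey:
    """A key handed to a team in place of a raw provider credential."""
    key_id: str
    team: str
    budget_usd: float
    spent_usd: float = 0.0


class BudgetExceeded(Exception):
    """Raised when a request would push a key past its budget."""


def record_spend(key, cost_usd):
    """Charge a request to a key, refusing it if the budget would be exceeded."""
    if key.spent_usd + cost_usd > key.budget_usd:
        raise BudgetExceeded(f"{key.key_id} would exceed ${key.budget_usd:.2f}")
    key.spent_usd += cost_usd


def spend_by_team(keys):
    """Roll per-key spend up to team level for attribution."""
    totals = {}
    for key in keys:
        totals[key.team] = totals.get(key.team, 0.0) + key.spent_usd
    return totals


keys = [
    VirtualKey("key-search", "search", budget_usd=10.0),
    VirtualKey("key-support", "support", budget_usd=5.0),
]
record_spend(keys[0], 2.5)
record_spend(keys[1], 1.0)
totals = spend_by_team(keys)
```

Enforcing the check in one place is what lets the gateway stop an over-budget key before the provider bills for the call.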

The Router brings the reliability subset (retries, fallbacks, and load balancing) into the SDK as well. The governance subset (virtual keys, spend tracking, and the admin dashboard) is proxy-only, because those only make sense when many applications share one control point.
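The reliability subset comes down to a small control loop: retry the current deployment a bounded number of times, then move to the next. The sketch below illustrates that loop in plain Python; it is not the Router's code. `call_with_fallbacks` and the two fake deployments are hypothetical, and `ConnectionError` stands in for whatever counts as a transient failure. Backoff between attempts is omitted for brevity.

```python
def call_with_fallbacks(deployments, prompt, num_retries=2):
    """Try each deployment in order, retrying transient failures before moving on."""
    last_error = None
    for call in deployments:
        for _attempt in range(num_retries + 1):
            try:
                return call(prompt)
            except ConnectionError as error:  # stand-in for a transient failure
                last_error = error
    raise RuntimeError("all deployments failed") from last_error


attempts = []


def flaky_primary(prompt):
    """Fake deployment that always fails."""
    attempts.append("primary")
    raise ConnectionError("primary is down")


def healthy_backup(prompt):
    """Fake deployment that always succeeds."""
    attempts.append("backup")
    return f"backup answered: {prompt}"


result = call_with_fallbacks([flaky_primary, healthy_backup], "Hello", num_retries=1)
```

The caller sees only `result`; the two failed attempts against the primary never surface as errors.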

Performance Numbers

A gateway adds a hop, so the fair question is how much latency that hop costs. The figures below come from BerriAI's own documentation and README. Treat them as vendor-reported: useful for a sense of scale, not a substitute for testing against your own workload.

8ms P95: added latency at 1,000 requests per second. Source: vendor-reported

1.5k+ RPS: throughput the proxy handles in load tests. Source: vendor-reported

12 hr: load test duration for -stable images. Source: vendor-reported

The practical takeaway: the -stable Docker images are the ones BerriAI runs through a 12-hour load test before release, which is why they are the recommended tag for production. If single-digit-millisecond added latency holds on your hardware, the gateway hop is unlikely to be your bottleneck. Confirm it anyway under your own traffic profile.
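One way to check the added-latency figure on your own hardware is to time the same workload directly against a provider and through the gateway, then compare the P95 of each run. The sketch below shows the arithmetic using a nearest-rank percentile. The function names are hypothetical and the timing lists are synthetic; in practice they would come from your own load test.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def added_latency_p95(direct_ms, via_gateway_ms):
    """Difference between the P95 through the gateway and the P95 direct."""
    return percentile(via_gateway_ms, 95) - percentile(direct_ms, 95)


# Synthetic timings in milliseconds: the gateway run is the direct run plus 8 ms.
direct_ms = list(range(1, 101))
via_gateway_ms = [t + 8 for t in direct_ms]

overhead_ms = added_latency_p95(direct_ms, via_gateway_ms)
```

Comparing the two P95 values is only one reasonable definition of "added latency"; measuring per-request overhead on paired calls is another.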

Open Source vs Enterprise

Both the core SDK and the Proxy Server are open-source and free to self-host. You can run the full gateway, with virtual keys, spend tracking, guardrails, routing, and the admin dashboard, without paying for a license. For many teams that is the whole product.

A separate Enterprise tier, under a Commercial License, layers on the controls that larger organizations tend to require:

SSO / SAML for single sign-on against your identity provider
Audit logs for compliance and incident review
Custom SLAs and feature prioritization
Custom integrations built for your environment
Dedicated support via Slack or Discord

BerriAI does not publish Enterprise pricing. The documented path is to contact the maintainers directly to talk through terms and schedule a demo. Budget accordingly and verify current terms before committing.

Practically, that means you can prototype and even run production on the open-source tier, then move to Enterprise when an SSO requirement, an audit-log mandate, or an SLA need forces the conversation.

Install & Observability

Getting started depends on which form you want. The SDK is a single pip install. The Proxy Server adds an extras group, then runs from the command line or a Docker image.

Shell: install the SDK

    pip install litellm

Shell: install the Proxy Server

    pip install 'litellm[proxy]'

With the proxy installed, start it from the command line and point it at a model:

Shell: start the proxy

    litellm --model <model-name>

For production, the -stable Docker image is the recommended tag, since it is the one put through the 12-hour load test:

Shell: run the -stable Docker image

    docker run -e LITELLM_MASTER_KEY=sk-... -p 4000:4000 ghcr.io/berriai/litellm:main-stable

In code, the SDK entry point is completion(), which mirrors the OpenAI client. The model string carries the provider, so switching providers is a string change rather than a code change:

Python: SDK completion call

    from litellm import completion

    response = completion(
        model="anthropic/claude-opus-4-8",
        messages=[{"role": "user", "content": "Hello"}],
    )

For observability, LiteLLM ships single-line callback integrations with the tools most LLM teams already run:

Langfuse: Tracing and analytics for LLM calls, wired in as a callback.

MLflow: Experiment tracking and model lifecycle logging.

Helicone: Request logging, monitoring, and cost visibility.

Lunary: Analytics and observability for LLM applications.

Security Posture

A proxy that brokers every provider API key is a high-value target by design. If the gateway is compromised, the blast radius is every credential it holds. That is the reason to treat a LiteLLM proxy with the same care you would give a secrets vault: least privilege, network isolation, scoped virtual keys, and timely patching.

The proxy centralizes provider keys so applications never see raw credentials. That is a security benefit, but it concentrates risk in one place. Pin to -stable releases, isolate the proxy on the network, and scope virtual keys tightly.

The maintainers publish security advisories and ship fixes in version bumps. Keeping current with releases and watching the advisories is part of running the proxy responsibly.

This breakdown does not enumerate individual advisories. For the full timeline, including the supply-chain package incident and the disclosed code vulnerabilities, read the dedicated LiteLLM security incident article in this cluster.

When to Use LiteLLM

LiteLLM is a strong default for multi-provider work, but it is not the answer to every situation. Here is an honest read on the fit.

Reach for the SDK when...

You are a developer who wants provider portability inside one application and you would rather not run extra infrastructure. The Router covers retries, fallbacks, and load balancing without a separate service.

Run the Proxy when...

You are a platform team serving many internal apps and you need central key management, spend attribution, and policy enforcement. The governance features only exist at the proxy layer.

Look elsewhere when...

You want a fully hosted aggregator with no infrastructure to run, or an edge-native gateway. A hosted option like OpenRouter or a managed edge gateway may fit better than self-hosting.

Weigh the tradeoff when...

You need vendor-backed SLAs, SSO, and audit logs. Those are listed under the Enterprise tier rather than the open-source one, which pushes you toward the Enterprise license and its non-public pricing.

For a head-to-head on the hosted-versus-self-hosted question, the OpenRouter vs LiteLLM comparison in this cluster works through the same decision in detail.

Frequently Asked Questions

What is LiteLLM used for?

LiteLLM is used to call many LLM providers through one OpenAI-compatible interface. Developers use the Python SDK for provider portability inside an application; platform teams run the Proxy Server as a shared AI Gateway that adds virtual keys, spend tracking, guardrails, routing, fallbacks, and an admin dashboard across more than 100 providers the vendor reports supporting.

Is LiteLLM free?

The core SDK and the Proxy Server are open-source and free to self-host. A separate Enterprise tier (Commercial License) adds SSO/SAML, audit logs, custom SLAs, feature prioritization, custom integrations, and dedicated support. Enterprise pricing is not published; BerriAI asks prospective customers to contact them directly. Verify current terms before purchasing.

What is the difference between the LiteLLM SDK and the Proxy Server?

The SDK is a library you import into one application. The Proxy Server is a standalone OpenAI-compatible service that many applications point at. The reliability features (retries, fallbacks, load balancing) are available in both via the Router. The governance features (virtual keys, spend tracking, admin dashboard) are proxy-only, because they require a shared control point.

How do you install LiteLLM?

Install the SDK with pip install litellm and the proxy with pip install 'litellm[proxy]'. Start the proxy with litellm --model from the command line, or run the -stable Docker image for production. The -stable images are put through a 12-hour load test before release.

Does LiteLLM work with observability tools?

Yes. LiteLLM provides single-line callback integrations with Langfuse, MLflow, Helicone, and Lunary, so you can route tracing and cost data into whichever platform your team already uses.

Video Resources

LiteLLM Tutorial: Getting Started

Walkthroughs of the SDK and the OpenAI-compatible interface for first-time users.

LiteLLM Proxy Server as an AI Gateway

Setting up the proxy with virtual keys, spend tracking, and the admin dashboard.

LiteLLM Routing, Fallbacks & Load Balancing

How the Router handles retries, provider fallbacks, and traffic distribution.

LLM Gateways

What Is an LLM Gateway?

The category explained: why a proxy layer sits between your apps and many model providers, and what it adds.

LLM Gateways

What Is OpenRouter?

The hosted aggregator alternative: one API to hundreds of models with pay-as-you-go pricing and no infrastructure.

Comparison

OpenRouter vs LiteLLM

Hosted aggregator versus self-hosted gateway, compared on control, cost, and operational burden.

Security

LiteLLM Security Incident Explained

The supply-chain package incident and disclosed advisories, with mitigations for proxy operators.

Fact-checked against vendor documentation and official sources, June 2026. Verify current pricing at docs.litellm.ai before purchasing.

LiteLLM is a project of BerriAI. OpenAI and GPT are trademarks of OpenAI. Claude is a trademark of Anthropic. Gemini and Vertex AI are trademarks of Google. Bedrock and AWS are trademarks of Amazon.com, Inc. Azure is a trademark of Microsoft. All other trademarks belong to their respective owners.
