What Is LiteLLM? The Open-Source LLM Gateway Explained
LiteLLM is an open-source LLM gateway that gives you one interface to more than 100 model providers. Instead of writing separate integration code for OpenAI, Anthropic, Google Gemini, Vertex AI, Bedrock, Azure, and HuggingFace, you call one OpenAI-compatible API and let LiteLLM translate. It is maintained by BerriAI, and the vendor reports coverage of 100+ providers.
That single sentence hides a useful detail: LiteLLM is really two products that share a name. There is a Python SDK you import into your code, and there is a Proxy Server, the piece BerriAI calls the AI Gateway, that you stand up as a shared service. Knowing which one you need is the first decision, so this breakdown starts there and works through translation, gateway features, performance, the open-source versus Enterprise split, install, and security posture.
What Is LiteLLM?
LiteLLM is not a model. It does not generate text. It sits between your application and the model providers and normalizes everything: the request format, the response format, and even the error types. You write your code once against an OpenAI-style interface, and LiteLLM handles the per-provider differences underneath.
The practical payoff is that switching models stops being a rewrite. If you start on one provider and later move to another, your calling code does not change. The same logic applies when you add a second provider as a fallback or split traffic across several for load balancing. LiteLLM centralizes that plumbing so each application does not reimplement it.
Practitioner note: The thing people miss is that LiteLLM is two distinct deployment shapes. As a library, it lives inside one application. As a proxy, it becomes shared infrastructure that many applications point at. The features you get differ between the two, so pick the shape before you pick the feature list.
The SDK vs the Proxy Server
This is the distinction that determines everything else. The Python SDK and the Proxy Server solve related problems for different audiences.
Use the SDK when you are a developer building a single application and you want provider portability without standing up extra infrastructure. You import LiteLLM, swap your client for its completion(), embedding(), and image_generation() calls, and use the Router for retries, fallbacks, and load balancing.
Use the Proxy Server when you are a platform or GenAI enablement team serving many internal applications. You run it as a central OpenAI-compatible service, then every team points at it. That is where the governance features live: virtual keys, spend tracking, guardrails, and an admin dashboard. Those features do not exist in the SDK because the SDK has no central place to enforce them.
OpenAI-Format Translation
The mechanism that makes the gateway useful is format translation. Every response that comes back through LiteLLM follows the OpenAI Chat Completions format, no matter which provider produced it. Your output parser sees the same shape whether the call went to Anthropic, Gemini, or Bedrock.
LiteLLM also maps each provider's errors onto the OpenAI exception types. That detail matters more than it sounds. Without it, your error handling would need a separate branch for every provider's failure modes. With it, you catch one set of exceptions and your retry and fallback logic stays uniform across all 100-plus providers the vendor reports supporting.
This is the difference between a gateway and a thin wrapper. A wrapper forwards your call; a gateway normalizes the request and the response so the rest of your stack can stay provider-agnostic.
Gateway Features (Proxy Server)
These are the capabilities that justify running the Proxy Server as shared infrastructure. They are governance and reliability controls, not model features, and they apply across every provider behind the gateway.
The Router brings the reliability subset (retries, fallbacks, and load balancing) into the SDK as well. The governance subset (virtual keys, spend tracking, and the admin dashboard) is proxy-only, because those only make sense when many applications share one control point.
Performance Numbers
A gateway adds a hop, so the fair question is how much latency that hop costs. The figures below come from BerriAI's own documentation and README. Treat them as vendor-reported: useful for a sense of scale, not a substitute for testing against your own workload.
The practical takeaway: the -stable Docker images are the ones BerriAI runs through a 12-hour load test before release, which is why they are the recommended tag for production. If single-digit-millisecond added latency holds on your hardware, the gateway hop is unlikely to be your bottleneck. Confirm it anyway under your own traffic profile.
Open Source vs Enterprise
Both the core SDK and the Proxy Server are open-source and free to self-host. You can run the full gateway, with virtual keys, spend tracking, guardrails, routing, and the admin dashboard, without paying for a license. For many teams that is the whole product.
A separate Enterprise tier, under a Commercial License, layers on the controls that larger organizations tend to require:
- SSO / SAML for single sign-on against your identity provider
- Audit logs for compliance and incident review
- Custom SLAs and feature prioritization
- Custom integrations built for your environment
- Dedicated support via Slack or Discord
Practically, that means you can prototype and even run production on the open-source tier, then move to Enterprise when an SSO requirement, an audit-log mandate, or an SLA need forces the conversation.
Install & Observability
Getting started depends on which form you want. The SDK is a single pip install. The Proxy Server adds an extras group, then runs from the command line or a Docker image.
With the proxy installed, start it from the command line and point it at a model:
For production, the -stable Docker image is the recommended tag, since it is the one put through the 12-hour load test:
In code, the SDK entry point is completion(), which mirrors the OpenAI client. The model string carries the provider, so switching providers is a string change rather than a code change:
For observability, LiteLLM ships single-line callback integrations with the tools most LLM teams already run:
Security Posture
A proxy that brokers every provider API key is a high-value target by design. If the gateway is compromised, the blast radius is every credential it holds. That context is the reason to treat a LiteLLM proxy with the same care you would give a secrets vault: least privilege, network isolation, scoped virtual keys, and prompt patching.
This breakdown does not enumerate individual advisories. For the full timeline, including the supply-chain package incident and the disclosed code vulnerabilities, read the dedicated LiteLLM security incident article in this cluster.
When to Use LiteLLM
LiteLLM is a strong default for multi-provider work, but it is not the answer to every situation. Here is an honest read on the fit.
For a head-to-head on the hosted-versus-self-hosted question, the OpenRouter vs LiteLLM comparison in this cluster works through the same decision in detail.
Frequently Asked Questions
What is LiteLLM used for?
LiteLLM is used to call many LLM providers through one OpenAI-compatible interface. Developers use the Python SDK for provider portability inside an application; platform teams run the Proxy Server as a shared AI Gateway that adds virtual keys, spend tracking, guardrails, routing, fallbacks, and an admin dashboard across more than 100 providers the vendor reports supporting.
Is LiteLLM free?
The core SDK and the Proxy Server are open-source and free to self-host. A separate Enterprise tier (Commercial License) adds SSO/SAML, audit logs, custom SLAs, feature prioritization, custom integrations, and dedicated support. Enterprise pricing is not published; BerriAI asks prospective customers to contact them directly. Verify current terms before purchasing.
What is the difference between the LiteLLM SDK and the Proxy Server?
The SDK is a library you import into one application. The Proxy Server is a standalone OpenAI-compatible service that many applications point at. The reliability features (retries, fallbacks, load balancing) are available in both via the Router. The governance features (virtual keys, spend tracking, admin dashboard) are proxy-only, because they require a shared control point.
How do you install LiteLLM?
Install the SDK with pip install litellm and the proxy with pip install 'litellm[proxy]'. Start the proxy with litellm --model
Does LiteLLM work with observability tools?
Yes. LiteLLM provides single-line callback integrations with Langfuse, MLflow, Helicone, and Lunary, so you can route tracing and cost data into whichever platform your team already uses.
Video Resources
Go Deeper
Resources from across Tech Jacks Solutions
Agent Frameworks Compared
How orchestration frameworks call models, and where a gateway fits
Agent Threat Landscape
Security risks when a single layer brokers credentials and traffic
PREMIUMPre-Deployment Safety Gate
27-point checklist before any AI tool goes live
IAPP AIGP Certification
The AI governance certification for privacy professionals