What Is LiteLLM? The Open-Source LLM Gateway Explained

LiteLLM is an open-source LLM gateway, maintained by BerriAI, that gives you one interface to more than 100 model providers (a vendor-reported count). Instead of writing separate integration code for OpenAI, Anthropic, Google Gemini, Vertex AI, Bedrock, Azure, and HuggingFace, you call one OpenAI-compatible API and let LiteLLM translate.

That summary hides a useful detail: LiteLLM is really two products that share a name. There is a Python SDK you import into your code, and there is a Proxy Server, the piece BerriAI calls the AI Gateway, that you stand up as a shared service. Knowing which one you need is the first decision, so this breakdown starts there and works through translation, gateway features, performance, the open-source versus Enterprise split, install, and security posture.


  • 100+ providers (vendor-reported)
  • 2 forms: SDK + Proxy
  • 8 ms P95 at 1k RPS (vendor-reported)
  • OSS: SDK + Proxy free to self-host

What Is LiteLLM?

LiteLLM is not a model. It does not generate text. It sits between your application and the model providers and normalizes everything: the request format, the response format, and even the error types. You write your code once against an OpenAI-style interface, and LiteLLM handles the per-provider differences underneath.

The practical payoff is that switching models stops being a rewrite. If you start on one provider and later move to another, your calling code does not change. The same logic applies when you add a second provider as a fallback or split traffic across several for load balancing. LiteLLM centralizes that plumbing so each application does not reimplement it.

Practitioner note: The thing people miss is that LiteLLM is two distinct deployment shapes. As a library, it lives inside one application. As a proxy, it becomes shared infrastructure that many applications point at. The features you get differ between the two, so pick the shape before you pick the feature list.


The SDK vs the Proxy Server

This is the distinction that determines everything else. The Python SDK and the Proxy Server solve related problems for different audiences.

Python SDK: drop-in OpenAI client replacement for developers
  • Calls: completion()
  • Also: embedding(), image_generation()
  • Router: retry, fallback, load balance

Proxy Server: self-hosted AI Gateway for platform teams
  • Shape: central service
  • Endpoint: OpenAI-compatible
  • Adds: keys, spend, guardrails, UI

Use the SDK when you are a developer building a single application and you want provider portability without standing up extra infrastructure. You import LiteLLM, swap your client for its completion(), embedding(), and image_generation() calls, and use the Router for retries, fallbacks, and load balancing.
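
As a rough sketch of that SDK path, the snippet below builds a Router configuration with two deployments behind one alias, so the Router can balance across them and retry on failure. It assumes litellm is installed and that provider keys live in environment variables. The alias chat-default, the second model string, and the helper names are illustrative choices, not anything LiteLLM prescribes.

```python
import os


def build_model_list():
    """Two deployments behind one alias so the Router can balance across them."""
    return [
        {
            "model_name": "chat-default",  # hypothetical alias that callers use
            "litellm_params": {
                "model": "anthropic/claude-opus-4-8",  # model string used in this article
                "api_key": os.environ.get("ANTHROPIC_API_KEY"),
            },
        },
        {
            "model_name": "chat-default",
            "litellm_params": {
                "model": "openai/gpt-4o",  # illustrative second provider
                "api_key": os.environ.get("OPENAI_API_KEY"),
            },
        },
    ]


def make_router(model_list, num_retries=2):
    # Imported lazily so the config helper works even without litellm installed.
    from litellm import Router

    return Router(model_list=model_list, num_retries=num_retries)


def ask(router, prompt):
    """Send one prompt through the Router and return the reply text."""
    response = router.completion(
        model="chat-default",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Calling ask(make_router(build_model_list()), "Hello") would send a real request, so it needs valid credentials for at least one of the two providers.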

Use the Proxy Server when you are a platform or GenAI enablement team serving many internal applications. You run it as a central OpenAI-compatible service, then every team points at it. That is where the governance features live: virtual keys, spend tracking, guardrails, and an admin dashboard. Those features do not exist in the SDK because the SDK has no central place to enforce them.
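
To make "every team points at it" concrete, here is a minimal sketch of an application calling the proxy with nothing but the standard library. It assumes the proxy listens on port 4000 (the port used in the Docker example in this article) and serves an OpenAI-style /chat/completions route. The virtual key value and the chat-default model alias are placeholders.

```python
import json
import urllib.request


def build_chat_request(base_url, virtual_key, model, prompt):
    """Build an OpenAI-style chat completion request aimed at the proxy."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The application holds only a virtual key, never a provider key.
            "Authorization": f"Bearer {virtual_key}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the first choice's text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, an existing OpenAI client pointed at the proxy's base URL should work the same way; the raw request is shown only to keep the sketch dependency-free.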


OpenAI-Format Translation

The mechanism that makes the gateway useful is format translation. Every response that comes back through LiteLLM follows the OpenAI Chat Completions format, no matter which provider produced it. Your output parser sees the same shape whether the call went to Anthropic, Gemini, or Bedrock.
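
The idea can be illustrated with a toy translator. This is not LiteLLM's code: it is a simplified sketch that reshapes an Anthropic-style message payload into the OpenAI Chat Completions shape, handling text blocks only. The sample payload and its id are made up.

```python
def anthropic_to_openai(resp):
    """Reshape a simplified Anthropic-style message into OpenAI chat format."""
    # Concatenate text blocks; non-text blocks are ignored in this sketch.
    text = "".join(
        block["text"] for block in resp["content"] if block.get("type") == "text"
    )
    # Partial mapping of provider stop reasons onto OpenAI finish reasons.
    finish_map = {"end_turn": "stop", "stop_sequence": "stop", "max_tokens": "length"}
    usage = resp.get("usage", {})
    prompt_tokens = usage.get("input_tokens", 0)
    completion_tokens = usage.get("output_tokens", 0)
    return {
        "id": resp["id"],
        "object": "chat.completion",
        "model": resp["model"],
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": finish_map.get(resp.get("stop_reason"), "stop"),
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }


sample = {
    "id": "msg_123",  # made-up id for illustration
    "model": "claude-opus-4-8",
    "content": [{"type": "text", "text": "Hello there"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 5, "output_tokens": 2},
}
normalized = anthropic_to_openai(sample)
print(normalized["choices"][0]["message"]["content"])  # Hello there
```

A parser written against normalized["choices"][0]["message"]["content"] never needs to know which provider answered, which is the property the gateway is selling.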

LiteLLM also maps each provider's errors onto the OpenAI exception types. That detail matters more than it sounds. Without it, your error handling would need a separate branch for every provider's failure modes. With it, you catch one set of exceptions and your retry and fallback logic stays uniform across all 100-plus providers the vendor reports supporting.

1 format
Every response follows the OpenAI Chat Completions format regardless of provider, and provider errors are mapped to OpenAI exception types. That is what makes drop-in compatibility real rather than aspirational.

This is the difference between a gateway and a thin wrapper. A wrapper forwards your call; a gateway normalizes the request and the response so the rest of your stack can stay provider-agnostic.
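
The payoff of error mapping is easiest to see in retry code. The sketch below uses local stand-in exception classes and a small status-code table to show the pattern. The class names echo OpenAI-style exception names but are defined here purely for illustration, and LiteLLM's real mapping covers more cases than this.

```python
import time


class GatewayError(Exception):
    """Base class for the normalized errors in this sketch."""


class AuthenticationError(GatewayError):
    pass


class RateLimitError(GatewayError):
    pass


class ServiceUnavailableError(GatewayError):
    pass


# Hypothetical mapping from HTTP status codes to one shared exception set.
STATUS_TO_ERROR = {
    401: AuthenticationError,
    429: RateLimitError,
    503: ServiceUnavailableError,
}


def normalize(status_code, message):
    """Turn any provider's failure into one of the shared exception types."""
    return STATUS_TO_ERROR.get(status_code, GatewayError)(message)


def call_with_retry(call, max_attempts=3, base_delay=0.5):
    """Retry only transient errors; everything else propagates immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except (RateLimitError, ServiceUnavailableError):
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Because every provider's failure arrives as one of a few known types, the retry policy is written once. An authentication failure is not retried, while a rate limit or outage is.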


Gateway Features (Proxy Server)

These are the capabilities that justify running the Proxy Server as shared infrastructure. They are governance and reliability controls, not model features, and they apply across every provider behind the gateway.

Virtual Keys
Issue per-key, per-team, and per-user keys, each with its own budget. Teams get access without sharing raw provider credentials.
Spend Tracking
Attribute spend back to keys, teams, and users so cost is visible and attributable rather than a single opaque provider bill.
Guardrails
Apply content filtering and PII masking at the gateway, so the policy is enforced centrally rather than reimplemented in each app.
Load Balancing
Distribute traffic across multiple deployments or providers to smooth out rate limits and capacity pressure.
Routing & Fallbacks
Route requests across models and fall back to an alternate when a provider fails, keeping requests served during an outage.
Retries
Retry transient failures automatically, so a single flaky call does not surface as an error to the calling application.
Admin Dashboard
A built-in UI to manage keys, view spend, and configure the proxy without editing config files by hand for every change.

The Router brings the reliability subset (retries, fallbacks, and load balancing) into the SDK as well. The governance subset (virtual keys, spend tracking, and the admin dashboard) is proxy-only, because those only make sense when many applications share one control point.
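
On the proxy side, virtual keys are minted through the proxy's own API. The sketch below builds, but does not send, a key-generation request. It assumes the proxy exposes a /key/generate route authorized with the master key, and that the body accepts models, max_budget, and duration fields. Confirm the route and field names against the current LiteLLM documentation before relying on them.

```python
import json
import urllib.request


def build_key_request(base_url, master_key, models, max_budget, duration="30d"):
    """Build a request asking the proxy to mint a scoped virtual key."""
    body = {
        "models": list(models),    # models this key is allowed to call
        "max_budget": max_budget,  # spend cap for the key
        "duration": duration,      # key lifetime, e.g. "30d"
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/key/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {master_key}",
        },
        method="POST",
    )
```

Keeping the model list short and the budget small per key is what turns "scoped virtual keys" from a slogan into a limit on what a leaked key can do.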


Performance Numbers

A gateway adds a hop, so the fair question is how much latency that hop costs. The figures below come from BerriAI's own documentation and README. Treat them as vendor-reported: useful for a sense of scale, not a substitute for testing against your own workload.

  • 8 ms P95: added latency at 1,000 requests per second (vendor-reported)
  • 1.5k+ RPS: throughput the proxy handles in load tests (vendor-reported)
  • 12 hr: load test duration for -stable images (vendor-reported)

The practical takeaway: the -stable Docker images are the ones BerriAI runs through a 12-hour load test before release, which is why they are the recommended tag for production. If single-digit-millisecond added latency holds on your hardware, the gateway hop is unlikely to be your bottleneck. Confirm it anyway under your own traffic profile.
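
One way to run that check is to time the same requests directly and through the gateway, then compare the tails. The helper below is a small sketch using the nearest-rank percentile method. The function names are illustrative, and the difference of two P95 values is only an approximation of added latency, not the P95 of per-request overhead.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def added_p95_ms(direct_ms, via_gateway_ms):
    """Approximate gateway overhead as the gap between the two P95 values."""
    return percentile(via_gateway_ms, 95) - percentile(direct_ms, 95)
```

Collect the two sample lists under the same load and from the same client host; otherwise the comparison measures network differences rather than the gateway hop.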


Open Source vs Enterprise

Both the core SDK and the Proxy Server are open-source and free to self-host. You can run the full gateway, with virtual keys, spend tracking, guardrails, routing, and the admin dashboard, without paying for a license. For many teams that is the whole product.

A separate Enterprise tier, under a Commercial License, layers on the controls that larger organizations tend to require:

  • SSO / SAML for single sign-on against your identity provider
  • Audit logs for compliance and incident review
  • Custom SLAs and feature prioritization
  • Custom integrations built for your environment
  • Dedicated support via Slack or Discord

Enterprise pricing is not public
BerriAI does not publish Enterprise pricing. The documented path is to contact the maintainers directly to talk through terms and schedule a demo. Budget accordingly and verify current terms before committing.

Practically, that means you can prototype and even run production on the open-source tier, then move to Enterprise when an SSO requirement, an audit-log mandate, or an SLA need forces the conversation.


Install & Observability

Getting started depends on which form you want. The SDK is a single pip install. The Proxy Server adds an extras group, then runs from the command line or a Docker image.

Shell: install the SDK
    pip install litellm

Shell: install the Proxy Server
    pip install 'litellm[proxy]'

With the proxy installed, start it from the command line and point it at a model:

Shell: start the proxy
    litellm --model

For production, the -stable Docker image is the recommended tag, since it is the one put through the 12-hour load test:

Shell: run the -stable Docker image
    docker run -e LITELLM_MASTER_KEY=sk-... -p 4000:4000 ghcr.io/berriai/litellm:main-stable

In code, the SDK entry point is completion(), which mirrors the OpenAI client. The model string carries the provider, so switching providers is a string change rather than a code change:

Python: SDK completion call
    from litellm import completion

    response = completion(
        model="anthropic/claude-opus-4-8",
        messages=[{"role": "user", "content": "Hello"}],
    )

For observability, LiteLLM ships single-line callback integrations with the tools most LLM teams already run:

Langfuse
Tracing and analytics for LLM calls, wired in as a callback.
MLflow
Experiment tracking and model lifecycle logging.
Helicone
Request logging, monitoring, and cost visibility.
Lunary
Analytics and observability for LLM applications.
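
A hedged sketch of what "single-line" means in practice: the SDK exposes a module-level success_callback list, and naming a tool there turns its integration on. The helper below wraps that assignment with a check against the four tools named in this article. The helper and the ARTICLE_CALLBACKS constant are illustrative, and each tool still needs its own credentials, typically supplied through environment variables.

```python
# The four integrations named in this article; LiteLLM may support others.
ARTICLE_CALLBACKS = ("langfuse", "mlflow", "helicone", "lunary")


def enable_callbacks(litellm_module, names):
    """Set success callbacks on the given litellm module after validating names."""
    unknown = [name for name in names if name not in ARTICLE_CALLBACKS]
    if unknown:
        raise ValueError(f"not covered in this article: {unknown}")
    # The single line that matters: assign the list of callback names.
    litellm_module.success_callback = list(names)
    return litellm_module.success_callback
```

In an application this would be called as enable_callbacks(litellm, ["langfuse"]) after import litellm; passing the module in explicitly just keeps the helper easy to exercise without the package installed.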

Security Posture

A proxy that brokers every provider API key is a high-value target by design. If the gateway is compromised, the blast radius is every credential it holds. That context is the reason to treat a LiteLLM proxy with the same care you would give a secrets vault: least privilege, network isolation, scoped virtual keys, and timely patching.

Credential aggregation risk
The proxy centralizes provider keys so applications never see raw credentials. That is a security benefit, but it concentrates risk in one place. Pin to -stable releases, isolate the proxy on the network, and scope virtual keys tightly.
Active disclosure process
The maintainers publish security advisories and ship fixes in version bumps. Keeping current with releases and watching the advisories is part of running the proxy responsibly.

This breakdown does not enumerate individual advisories. For the full timeline, including the supply-chain package incident and the disclosed code vulnerabilities, read the dedicated LiteLLM security incident article in this cluster.


When to Use LiteLLM

LiteLLM is a strong default for multi-provider work, but it is not the answer to every situation. Here is an honest read on the fit.

Reach for the SDK when...
You are a developer who wants provider portability inside one application and you would rather not run extra infrastructure. The Router covers retries, fallbacks, and load balancing without a separate service.
Run the Proxy when...
You are a platform team serving many internal apps and you need central key management, spend attribution, and policy enforcement. The governance features only exist at the proxy layer.
Look elsewhere when...
You want a fully hosted aggregator with no infrastructure to run, or an edge-native gateway. A hosted option like OpenRouter or a managed edge gateway may fit better than self-hosting.
Weigh the tradeoff when...
You need vendor-backed SLAs, SSO, and audit logs. Those sit in the Enterprise tier rather than the open-source one, which pushes you toward the Enterprise license and its non-public pricing.

For a head-to-head on the hosted-versus-self-hosted question, the OpenRouter vs LiteLLM comparison in this cluster works through the same decision in detail.


Frequently Asked Questions

What is LiteLLM used for?

LiteLLM is used to call many LLM providers through one OpenAI-compatible interface. Developers use the Python SDK for provider portability inside an application; platform teams run the Proxy Server as a shared AI Gateway that adds virtual keys, spend tracking, guardrails, routing, fallbacks, and an admin dashboard across more than 100 providers the vendor reports supporting.

Is LiteLLM free?

The core SDK and the Proxy Server are open-source and free to self-host. A separate Enterprise tier (Commercial License) adds SSO/SAML, audit logs, custom SLAs, feature prioritization, custom integrations, and dedicated support. Enterprise pricing is not published; BerriAI asks prospective customers to contact them directly. Verify current terms before purchasing.

What is the difference between the LiteLLM SDK and the Proxy Server?

The SDK is a library you import into one application. The Proxy Server is a standalone OpenAI-compatible service that many applications point at. The reliability features (retries, fallbacks, load balancing) are available in both via the Router. The governance features (virtual keys, spend tracking, admin dashboard) are proxy-only, because they require a shared control point.

How do you install LiteLLM?

Install the SDK with pip install litellm and the proxy with pip install 'litellm[proxy]'. Start the proxy with litellm --model from the command line, or run the -stable Docker image for production. The -stable images are put through a 12-hour load test before release.

Does LiteLLM work with observability tools?

Yes. LiteLLM provides single-line callback integrations with Langfuse, MLflow, Helicone, and Lunary, so you can route tracing and cost data into whichever platform your team already uses.

Fact-checked against vendor documentation and official sources, June 2026. Verify current pricing at docs.litellm.ai before purchasing.
LiteLLM is a project of BerriAI. OpenAI and GPT are trademarks of OpenAI. Claude is a trademark of Anthropic. Gemini and Vertex AI are trademarks of Google. Bedrock and AWS are trademarks of Amazon.com, Inc. Azure is a trademark of Microsoft. All other trademarks belong to their respective owners.
Before You Use AI
Your Privacy

LiteLLM is open-source software that runs in your own infrastructure. Your prompts and data flow to whichever LLM providers you configure behind the gateway, and each provider has its own data retention and training policy. Commercial API tiers generally do not train on your data; free tiers may. If you wire LiteLLM into an observability platform, that tool also receives your trace data. Review the data processing terms for every provider and integration in your routing path before sending sensitive data.

Mental Health & AI Dependency

A gateway like LiteLLM makes it easy to route many applications through a single AI layer, which can concentrate reliance on automated output. Keep human review on consequential decisions rather than trusting model responses by default. If you or someone you know is experiencing a mental health crisis:

  • 988 Suicide & Crisis Lifeline -- Call or text 988 (US)
  • SAMHSA Helpline -- 1-800-662-4357
  • Crisis Text Line -- Text HOME to 741741

AI systems can produce plausible-sounding but incorrect guidance. For mental health, medical, legal, or financial decisions, always consult a qualified professional.

Your Rights & Our Transparency

Under GDPR and CCPA, you have the right to access, correct, and delete personal data held by any LLM provider you route through LiteLLM. Tech Jacks Solutions maintains editorial independence. This article was not sponsored, reviewed, or approved by BerriAI or any vendor mentioned. We receive no affiliate commissions from the providers linked here. The EU AI Act adds transparency and risk obligations for many AI systems. Our evaluations are based on primary documentation and verified data.