Best LLM Gateways in 2026: 5 Top AI Gateways Compared
An LLM gateway sits between your app and the dozen-plus model providers you might call, giving you one endpoint, one set of credentials, and a place to add routing, caching, guardrails, and spend tracking. This is a practitioner ranking of the five that matter most in 2026. The order is editorial, not a benchmark score, and we explain exactly how we weighed each one. Every gateway here ships with one honest limitation we did not hide.
The 2026 Rankings at a Glance
The table below summarizes all five gateways. Click any column header to sort. Each name links to its detailed section below. Counts and metrics are vendor-reported and labeled as such in each section.
| #▲ | Gateway▲▼ | Open Source▲▼ | Hosting▲▼ | Coverage (vendor-reported)▲▼ | Pricing Model▲▼ |
|---|---|---|---|---|---|
| 1 | LiteLLM | Yes (core) | Self-host / Enterprise | 100+ providers | Free OSS; Enterprise custom |
| 2 | OpenRouter | No | Managed | Hundreds of models (341 text) | Pay-as-you-go credits |
| 3 | Portkey | Partial | Self-host / Managed / VPC | 1,600+ models, 40-45+ providers | Free / $49 mo / Enterprise |
| 4 | Cloudflare AI Gateway | No | Managed (edge) | ~20+ providers | On all Cloudflare plans |
| 5 | Kong AI Gateway | Yes (both) | SaaS / self-host / hybrid / k8s | Universal API (plugin-based) | Plugin model; Stripe metering |
Order reflects editorial assessment across five criteria explained in the methodology below. No vendor paid for placement, and Tech Jacks Solutions uses no affiliate links.
#1 LiteLLM (BerriAI)
LiteLLM takes the top spot because it does the core job of a gateway, unifying many providers behind one interface, in the most portable way. It ships in two forms. The Python SDK is a drop-in replacement for the OpenAI client, exposing functions like completion(), embedding(), and image_generation() plus a Router with retries, fallbacks, and load balancing. The Proxy Server is a self-hosted, OpenAI-compatible central service with virtual keys, per-key and per-team budgets, spend tracking, guardrails, and an admin dashboard.
The standout strength is normalization. Every response follows the OpenAI Chat Completions format regardless of which provider answered, and each provider's errors are mapped to OpenAI exception types, so your application code stops caring which model it is talking to. Coverage is vendor-reported at 100+ LLMs and providers, including OpenAI, Anthropic, Gemini, Vertex AI, Bedrock, Azure, and HuggingFace. Vendor-reported performance figures cite 8ms P95 latency at 1,000 requests per second, with the proxy handling 1,500+ requests per second in load tests and -stable Docker images going through 12-hour load tests. Observability hooks into Langfuse, MLflow, Helicone, and Lunary as single-line callbacks.
The core SDK and proxy are open-source and free to self-host. The Enterprise commercial license adds SSO and SAML, audit logs, custom SLAs, feature prioritization, and dedicated support; that pricing is not public and is handled through a sales conversation.
Who should use it: Developers and ML platform teams that want a self-hosted, provider-agnostic layer they fully control, without vendor lock-in to a hosted aggregator.
LiteLLM carries a documented security history. There are eight official BerriAI/litellm GitHub Security Advisories from 2026, including a Critical SQL injection in proxy API key verification, an authentication bypass via host header injection, and an OIDC userinfo cache key collision auth bypass. Separately, a March 25, 2026 supply-chain incident saw two published versions containing credential-harvesting malware. The proxy brokers every provider API key by design, which makes it a high-value target, so a compromise can mean broad credential exposure. The maintainers run responsible disclosure and ship fixes in version bumps, so treat it as an active security process rather than a dealbreaker: pin to -stable releases, upgrade past fixed versions, network-isolate the proxy, scope virtual keys, rotate keys, and verify package integrity.
#2 OpenRouter
OpenRouter ranks second because it delivers the widest instant catalog with zero infrastructure to run. It is a hosted aggregator: one endpoint gives you a unified API to hundreds of models, with automatic fallbacks on 5xx or rate-limit errors and automatic cost-effective routing that picks the least expensive capable backend. The vendor-reported catalog as of June 9, 2026 breaks down to 341 text models, 32 image, 26 embeddings, 14 video, 10 transcription, 9 speech, 4 audio, and 3 rerank.
There are five ways in, which makes adoption easy: the native OpenRouter API, typed client SDKs, an Agent SDK for multi-turn loops with tools and state, the standard OpenAI SDK pointed at OpenRouter by changing the base URL, and various third-party SDKs. Pricing is a pay-as-you-go credit system computed on each model's native tokenizer, with dynamic pricing for multimodal work. Many models are free at $0, and paid models mirror the provider's own cost; for example, Claude Opus 4.8 is listed at $5 per million input tokens and $25 per million output, with an Opus 4.8 Fast variant at $10 and $50.
Who should use it: Developers who want to try and ship across a large model catalog immediately, without standing up and maintaining a gateway of their own.
Privacy is fragmented per model, not uniform. Zero Data Retention is a platform feature that is enabled for some models (for example Relace Apply 3 and Morph V3), while other models explicitly warn that prompts and completions may be logged by the provider (for example Owl Alpha). There is no blanket data-retention guarantee across the catalog, so for sensitive workloads you must check each model's policy individually rather than trusting the platform as a whole.
#3 Portkey
Portkey ranks third because it is the most complete production control plane of the group, even though that completeness costs some of the simplicity that lifts the top two. It describes itself as an end-to-end LLM orchestration platform built on five pillars: an AI Gateway, Observability, Guardrails, Governance, and Prompt Management. Coverage is vendor-reported at 1,600+ language, vision, audio, and image models across 40 to 45+ providers, with 50+ pre-built guardrails and compliance claims spanning SOC 2, HIPAA, GDPR, and CCPA. The open-source gateway is vendor-reported at under 1ms latency, a 122kb footprint, and 10 billion-plus tokens per day, and the core enterprise gateway is merging into open source in Gateway 2.0 (pre-release).
Pricing, verified June 9, 2026, runs across three tiers. The Developer plan is Free Forever with 10,000 recorded logs per month and 3-day log retention (30-day metrics). Production is $49 per month with 100,000 logs per month, 30-day log retention, 90-day metrics, and an extra $9 per additional 100,000 requests. Enterprise is custom and adds SSO, VPC hosting, custom retention, HIPAA and SOC 2, RBAC, and dedicated support. It is worth pointing out that Portkey's own site states Palo Alto Networks has completed its acquisition of Portkey.
Who should use it: Teams putting GenAI into production who want gateway, observability, guardrails, and governance from a single control plane rather than stitching tools together.
On the Free Forever tier, logs are retained for only 3 days, so investigating or auditing anything older than that requires a paid plan. Portkey's own documentation is also internally inconsistent on scale, citing 1,600+ models in some places and over 250 in others, so treat the headline coverage number as vendor-reported and verify the specific providers you need.
#4 Cloudflare AI Gateway
Cloudflare AI Gateway ranks fourth because it has the lowest adoption friction of any option here, at the cost of governance depth that is still maturing. It is a proxy that runs on Cloudflare's global edge and is available on all Cloudflare plans. You add it with one line of code and immediately get caching, rate limiting, analytics and logging, retries, and model fallback. It lists roughly 20+ providers, including OpenAI, Anthropic, Google, HuggingFace, Replicate, and Cartesia, and integrates with Workers AI and Vectorize for teams building on Cloudflare's platform.
The pitch is proximity and simplicity. If your application already runs on Cloudflare Workers, the gateway sits exactly where your requests already flow, so you are not adding a new network hop or a separate service to operate. That makes it the easiest on-ramp for edge and Workers AI builders who want observability and resilience without standing up infrastructure.
Who should use it: Developers already on Cloudflare, especially Workers AI users, who want caching, analytics, and fallback with almost no setup.
Several of the advanced governance features, including data loss prevention, guardrails, dynamic routing, and spend limits, are still in Beta. Teams that need mature, production-grade governance controls today should verify current availability and stability before depending on those specific capabilities.
#5 Kong AI Gateway
Kong AI Gateway ranks fifth not because it is weak, but because its strength is conditional: it shines if you already run Kong, and it is a heavier lift if you do not. It is a connectivity and governance layer built on Kong Gateway, and it works through a plugin model rather than custom code. It exposes a universal API and supports load balancing across consistent-hash, lowest-latency, usage-based, round-robin, and semantic strategies (the last added in v3.10+).
Its governance features are genuinely deep. The PII sanitization plugin covers 20 categories across 9 languages, and there is a RAG injector plus a prompt compressor and decorator, all configured as plugins. Metering and billing run through Stripe. Hosting is flexible, spanning Konnect SaaS, self-hosted, hybrid, DB-less, and Kubernetes deployments, which makes it a natural fit for platform teams that already standardize their API traffic on Kong.
Who should use it: Platform and enterprise teams already running Kong Gateway who want AI governance as configuration inside their existing API infrastructure.
Kong AI Gateway requires adopting Kong's broader ecosystem, including tools like decK, Konnect, and the Ingress Controller, and its AI plugins lean enterprise. Teams that are not already on Kong take on meaningful ecosystem lock-in and operational overhead to get the benefits, which is why it sits last for general adoption despite its capability.
How We Ranked These Gateways
This is an editorial ranking, not a benchmark leaderboard. We did not run a single standardized test across all five because they target different deployment contexts: a self-hosted proxy, a hosted aggregator, a full control plane, an edge service, and an API-platform plugin. Instead, we weighed five criteria:
- Breadth of provider and model coverage: How many providers and models you can reach through one interface.
- Open-source availability: Whether you can self-host and inspect the code, versus depending on a managed service.
- Production controls and observability: Routing, fallbacks, caching, guardrails, spend tracking, logging, and analytics.
- Ease of adoption: How quickly a team can get value, from one line of code to standing up a proxy.
- Governance and security maturity: The depth and production-readiness of access control, data handling, and compliance features, including documented security history.
Why this order: LiteLLM leads on coverage plus open-source portability and production controls, which is why its security history is a caution rather than a disqualifier. OpenRouter follows for the widest instant catalog with the easiest adoption, held back by fragmented per-model privacy. Portkey offers the deepest controls but more setup. Cloudflare is the simplest on-ramp but has beta governance. Kong is the strongest fit for existing Kong shops but carries ecosystem lock-in for everyone else. A different team weighting these criteria differently could reasonably reorder this list.
No vendor paid for inclusion or placement, and Tech Jacks Solutions has no affiliate relationships with any gateway listed. All facts were verified from official vendor documentation as of June 9, 2026.
Which Gateway Should You Pick?
Match the gateway to your situation rather than chasing the highest rank:
- You want to self-host and stay provider-agnostic: Start with LiteLLM. Run the proxy behind your own network controls and follow the security hygiene steps above.
- You want the biggest catalog with no infrastructure: Choose OpenRouter, and check each model's data-retention policy before sending anything sensitive.
- You are putting GenAI into production and want one control plane: Portkey gives you gateway, observability, guardrails, and governance together; budget for a paid tier if you need more than 3 days of log retention.
- You already build on Cloudflare: Cloudflare AI Gateway is the lowest-friction option, with the caveat that some governance features are still in beta.
- You already run Kong for your APIs: Kong AI Gateway adds AI governance as configuration inside infrastructure you already operate.
Frequently Asked Questions
What is an LLM gateway?
An LLM gateway is a proxy or control layer that sits between your application and many LLM provider APIs. It exposes a single, usually OpenAI-compatible endpoint and adds AI-specific controls such as routing, fallbacks, caching, observability, guardrails, and spend tracking. It is also called an AI gateway or model router.
Which LLM gateways are open-source?
LiteLLM and Kong AI Gateway are open-source and self-hostable. Portkey is partial: it has an open-source gateway, with the enterprise gateway merging into open source in Gateway 2.0 (pre-release). OpenRouter and Cloudflare AI Gateway are managed services and are not open-source.
What is the best LLM gateway for production?
It depends on context. LiteLLM leads for self-hosted unified access across 100+ providers. Portkey offers the most complete production control plane. Kong fits teams already on Kong Gateway. Cloudflare fits edge and Workers AI builders. OpenRouter fits developers who want instant catalog access without infrastructure.
How much does an LLM gateway cost?
LiteLLM and Kong have free open-source cores with custom-priced enterprise tiers. OpenRouter is pay-as-you-go credits that mirror provider cost, with many free models. Portkey has a free tier (10k logs/mo, 3-day retention), a $49/mo Production tier, and custom Enterprise. Cloudflare AI Gateway is available on all Cloudflare plans. Verify current pricing on each vendor's page before purchasing.
Is LiteLLM safe to use after its security advisories?
LiteLLM has a documented security history, including eight GitHub Security Advisories in 2026 and a March 2026 supply-chain incident where two published versions contained credential-harvesting malware. Its maintainers run responsible disclosure and ship fixes in version bumps. To use it safely, pin to -stable releases, upgrade past fixed versions, network-isolate the proxy, scope virtual keys, rotate keys, and verify package integrity.
Video Resources
Go Deeper
Resources from across Tech Jacks Solutions
What Is Agentic AI?
Understand the architecture behind autonomous AI agents and tool use
FREEAI Governance Charter
Establish your organization's AI principles in one document
AI Governance Hub
Build a responsible AI program for your organization
EU AI Act Guide
Check your compliance obligations under the EU AI Act