A successful attack gives an adversary full control of the server running your AI inference workload — including access to model weights, inference data, API keys, and any connected internal systems. Organizations using SGLang to serve AI models in production face potential theft of proprietary models (which represent significant R&D investment), exposure of data processed through the inference pipeline, and lateral movement into broader infrastructure. Because no patch exists and the attack requires only that an operator load a poisoned model file, the risk window is open-ended until SGLang issues a fix or the endpoint is isolated.
You Are Affected If
You run any currently available version of SGLang; no patched release exists as of disclosure
Your SGLang instance is internet-facing or accessible from untrusted networks without firewall or WAF controls blocking the /v1/rerank endpoint
Your team loads GGUF model files sourced from public repositories (Hugging Face, community model hubs) without cryptographic integrity verification
You have not restricted or disabled the /v1/rerank endpoint as a compensating control
You also run llama_cpp_python or vLLM and have not patched CVE-2024-34359 or CVE-2025-61620; those vulnerabilities belong to the same attack class and may likewise be unmitigated in your environment
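The integrity-verification gap above is straightforward to close operationally: compare each downloaded model file against a checksum pinned from a trusted source before it ever reaches the inference server. A minimal sketch in Python (the function names and the pinned-hash workflow are illustrative, not part of SGLang):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-gigabyte GGUF models
    are never loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_model(path: str, expected_sha256: str) -> bool:
    """Return True only if the file matches the pinned digest.
    Refuse to serve the model on any mismatch."""
    return sha256_of(path) == expected_sha256.lower()
```

In practice the expected digest should come from a source you control (an internal registry or a signed manifest), not from the same public repository the model file was fetched from, since an attacker who can swap the file can usually swap an adjacent checksum file too.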
Board Talking Points
A critical security flaw in AI server software we may be running allows an attacker to take full control of that server with no login required — simply by supplying a corrupted AI model file.
Security teams should immediately isolate affected AI inference systems from the internet and block the vulnerable feature until the software vendor releases a fix, which does not yet exist.
Without action, an attacker could steal proprietary AI models, access sensitive data processed by those systems, and use the compromised server as a foothold into our broader infrastructure.
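One way to implement the stopgap described above, blocking the vulnerable feature until a patch ships, is a deny rule at the reverse proxy in front of the inference server. This is a sketch only, assuming nginx fronts SGLang; the hostname, listen directive, and upstream address are placeholders to adapt to your deployment:

```nginx
server {
    listen 443 ssl;
    server_name inference.example.internal;   # placeholder hostname

    # Compensating control: reject all requests to the vulnerable endpoint.
    location /v1/rerank {
        return 403;
    }

    # Pass all other SGLang routes through to the local server.
    location / {
        proxy_pass http://127.0.0.1:30000;    # SGLang's default port; verify yours
    }
}
```

A proxy-level deny is preferable to an application-level change because it requires no modification to the unpatched SGLang process and can be lifted in one place once a fixed release is deployed.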