Google's Threat Intelligence Group confirmed that APT28, the GRU-linked hacking group also known as Fancy Bear, deployed a data-mining tool called PROMPTSTEAL in Ukraine operations. PROMPTSTEAL does not carry a static attack payload. Instead, it queries the Hugging Face API to access the Qwen2.5-Coder large language model, receives dynamically generated Windows commands tailored to the specific target environment, and executes them. The attack logic lives in the LLM response, not in the tool itself.
APT28 has been active since at least 2007 and is attributed to Russia's GRU military intelligence directorate. The group is responsible for high-profile intrusions including the 2016 DNC breach, multiple NATO member network intrusions, and sustained campaigns against Ukrainian government and military infrastructure. PROMPTSTEAL represents a significant capability evolution: the group has moved from carrying pre-built tooling to deploying tools that make the model itself the attack logic.
How PROMPTSTEAL Operates
PROMPTSTEAL establishes a foothold on a target Windows system through methods consistent with APT28's existing initial access capabilities: spearphishing, credential abuse, and exploitation of internet-facing services. Once resident, the tool conducts an environmental survey, collecting information about the system configuration, installed software, domain membership, and accessible network resources.
This environmental context is packaged into a structured prompt and sent to the Qwen2.5-Coder model via the Hugging Face inference API. The prompt requests Windows command sequences optimized for credential collection and data exfiltration given the specific environment described. Qwen2.5-Coder, a code-generation model, returns syntactically valid, contextually appropriate commands. PROMPTSTEAL executes them.
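For defenders building network detections, the relevant artifact is the shape of that API call. A minimal sketch of what such a request looks like on the wire, assuming the publicly documented Hugging Face serverless inference request format (the model identifier and prompt text here are illustrative placeholders, not content recovered from the tool):

```python
import json

# Public Hugging Face serverless inference endpoint pattern. The specific
# Qwen2.5-Coder variant is an assumption for illustration; the prompt below
# is a benign placeholder, not the actual PROMPTSTEAL prompt.
HF_ENDPOINT = "https://api-inference.huggingface.co/models/Qwen/Qwen2.5-Coder-32B-Instruct"

def build_inference_request(environment_summary: str) -> dict:
    """Package an environment survey into the JSON body the HF API expects."""
    return {
        "url": HF_ENDPOINT,
        "headers": {
            "Authorization": "Bearer <hf_api_token>",  # attacker-controlled token
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "inputs": f"Environment: {environment_summary}\n<task prompt>",
            "parameters": {"max_new_tokens": 512},
        }),
    }
```

What matters for detection is not the prompt content, which defenders rarely see, but the destination hostname and the JSON request shape, both observable at the network layer.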
The result is attack behavior that is adapted to each target individually. Two deployments of PROMPTSTEAL in two different environments will execute substantially different command sequences, because the LLM generates commands based on what it is told about the environment rather than running a fixed script. This breaks detection approaches that rely on matching known command sequences, tool signatures, or behavioral patterns associated with specific APT tooling.
SIEM and EDR detection rules for APT28 are built from observed TTPs: specific commands, process chains, file paths, registry keys. PROMPTSTEAL generates novel command sequences for each target. The rule library that worked yesterday describes commands this tool will never run again.
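The brittleness can be shown with a toy signature matcher (the rule string below is hypothetical, not actual APT28 detection content): a rule keyed to one exact command line matches the historical sample but misses a functionally identical command an LLM might generate.

```python
# Toy illustration of static command-signature matching. The "known" command
# is a hypothetical stand-in for a historical TTP, not real rule content.
KNOWN_COMMAND_SIGNATURES = {
    'cmd.exe /c dir "C:\\Users" >> %TEMP%\\out.txt',
}

def matches_signature(observed_command: str) -> bool:
    """Exact-match lookup, the simplest form of a command-based SIEM rule."""
    return observed_command in KNOWN_COMMAND_SIGNATURES

# The historical command matches...
assert matches_signature('cmd.exe /c dir "C:\\Users" >> %TEMP%\\out.txt')
# ...but an LLM-generated equivalent (same effect, different syntax) does not.
assert not matches_signature(
    'powershell -c "Get-ChildItem C:\\Users | Out-File $env:TEMP\\out.txt"'
)
```

Real SIEM rules use regexes and behavioral chains rather than exact strings, but the same failure mode applies: any pattern anchored to observed syntax can be evaded by a generator that never repeats syntax.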
The Choice of Qwen2.5-Coder and Hugging Face
The model selection is deliberate. Qwen2.5-Coder is a code-focused LLM developed by Alibaba's Qwen team and distributed through Hugging Face. It is optimized for generating syntactically correct, functionally effective code and command sequences. It also imposes fewer content restrictions on generating system administration commands than commercial Western AI products do, making it more willing to produce Windows command sequences directed at credential stores.
Hugging Face provides API access to thousands of open-weight models without the usage monitoring and abuse detection that commercial providers like Anthropic and Google have built into their flagship products. APT28's use of Hugging Face rather than a commercial API reflects awareness of the monitoring risk at commercial providers: Anthropic detected and terminated GTG-2002's accounts when they ran the extortion campaign described elsewhere in this report. Hugging Face presents a lower detection surface for API-based misuse.
Implications for Nation-State Threat Modeling
The shift PROMPTSTEAL represents is not incremental. APT28 has historically used sophisticated custom tooling: SOFACY, X-Agent, Zebrocy, and others, each built with significant development investment and representing hard-coded attack logic. PROMPTSTEAL is architecturally different: the tool is a thin wrapper around an LLM API call. The attack intelligence lives in the model, not the binary.
This architecture has several advantages for the attacker beyond evasion. It requires less development time per target. It adapts automatically to environments the developer did not anticipate. It is easier to update by adjusting prompts than by rewriting and redistributing binaries. And it shifts the arms-race dynamic: defenders can burn a specific binary, but they cannot burn the underlying model capability.
For defenders, the detection pivot is to network telemetry. PROMPTSTEAL generates outbound HTTPS traffic to Hugging Face inference endpoints from host processes. This is anomalous in most enterprise environments and in essentially all operational military and government networks. Egress filtering that blocks or alerts on connections to model inference APIs from endpoint processes is an effective first-layer control, and one that holds regardless of which model the attacker uses next.
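A first pass at that control can be sketched as a hostname check against known inference-API domains. This is a minimal sketch under stated assumptions: the domain list is illustrative and not exhaustive, and a real deployment would maintain it from threat intelligence feeds and enforce it at the proxy or firewall rather than in application code.

```python
# Minimal egress-policy sketch: flag endpoint-originated connections to known
# model inference APIs. Domain list is illustrative, not exhaustive.
INFERENCE_API_DOMAINS = {
    "api-inference.huggingface.co",
    "generativelanguage.googleapis.com",
    "api.openai.com",
    "api.anthropic.com",
}

def is_inference_egress(dest_host: str) -> bool:
    """True if the destination host is a known model inference endpoint
    or a subdomain of one."""
    host = dest_host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in INFERENCE_API_DOMAINS)

# Alert-worthy in most enterprise environments:
assert is_inference_egress("api-inference.huggingface.co")
# Ordinary egress passes:
assert not is_inference_egress("update.microsoft.com")
```

Pairing the domain check with the originating process name (browser versus unknown binary) sharply reduces false positives in environments where developers legitimately call these APIs.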
Nation-state tooling is adapting faster than static detection rules can follow.
RedEye Security builds threat models calibrated to current APT capabilities and identifies the network and behavioral controls that remain effective against dynamically generated attack logic.
Talk to us