21 Zero-Days in FFmpeg: $1k Agent Outperforms Anthropic's Mythos

Depthfirst's autonomous security agent discovered 21 zero-day vulnerabilities in FFmpeg, nine of which have been assigned CVEs, for a total compute cost of roughly $1,000, or about 10% of what Anthropic spent using its Mythos model. The bugs span components from the TS demuxer to the VP9 decoder, and several sat latent for more than 15 years, one of them for 23. One bug, a heap buffer overflow in the AV1 RTP depacketizer, is exploitable for remote code execution via a single 183-byte RTP packet.

The Security Agent Approach

Depthfirst built a specialized security agent that differs from typical coding agents. It starts with threat modeling: understanding FFmpeg's architecture, identifying exposed parsers and protocol handlers, and mapping the entry points for attacker-controlled input. It then audits that attack surface, following data flow through the relevant components. Crucially, it validates each finding by producing a concrete, reproducible input and executing it against the target, which filters out false positives. A full scan costs about $1k, against the roughly $10k Anthropic spent with Mythos.
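The validation step can be pictured with a small sketch. This is not Depthfirst's harness: the names crash_signal, finding_is_reproducible, and toy_parser are hypothetical, the toy parser uses abort() as a stand-in for a sanitizer report, and the code assumes a POSIX system. The idea is only that a finding is kept when the same concrete input kills the target on every run.

```c
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical target signature: parses one input and may crash on a bug. */
typedef void (*target_fn)(const uint8_t *data, size_t len);

/* Run the target on one input in a child process.
 * Returns the terminating signal number if the child crashed,
 * 0 if it exited normally, and -1 if fork or waitpid failed. */
int crash_signal(target_fn target, const uint8_t *data, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        target(data, len);   /* a crash here only kills the child */
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Keep a finding only if the same input crashes the target on every run. */
int finding_is_reproducible(target_fn target, const uint8_t *data,
                            size_t len, int runs)
{
    for (int i = 0; i < runs; i++)
        if (crash_signal(target, data, len) <= 0)
            return 0;
    return runs > 0;
}

/* Toy target standing in for a real parser (assumption for illustration):
 * the first byte declares a payload length, and the parser aborts when
 * that length exceeds the bytes actually present. */
void toy_parser(const uint8_t *data, size_t len)
{
    if (len >= 1 && (size_t)data[0] > len - 1)
        abort();
}
```

Calling finding_is_reproducible(toy_parser, input, input_len, 3) with the bytes { 200, 1, 2 } reports a reproducible crash, while { 2, 1, 2 } does not. Running each attempt in a child process keeps a crashing input from taking down the checker itself.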

The Vulnerabilities

Nine issues have CVEs assigned:

  • CVE-2026-39210 (Heap Buffer Overflow): TS demuxer, introduced 2010, missing length bounds check.
  • CVE-2026-39211 (Integer Overflow): swscale refactor, 2010, no upper bound on size factor.
  • CVE-2026-39212 (Stack Overflow): ffmpeg_opt.c, regression from July 2025, recursive option parsing without depth limit.
  • CVE-2026-39213 (Heap Buffer Overflow): yuv4mpegenc rawvideo input, 2023, no dimension validation.
  • CVE-2026-39214 (Stack Buffer Overflow): SDT implementation, 2003, writes service entries without tracking space — 23 years latent.
  • CVE-2026-39215 (Heap Buffer Overflow): update_mb_info(), 2012, logic error writes 12 bytes past buffer.
  • CVE-2026-39216 (Heap Buffer Overflow): img2enc.c, 2012, unsafe chroma size.
  • CVE-2026-39217 (Heap Buffer Overflow): VP9 decoder, regression from March 2025, missing realloc.
  • CVE-2026-39218 (Heap Buffer Overflow): DASH demuxer, 2017, negative duration leads to negative array index.
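Most entries in this list come down to a missing check on an attacker-controlled length, size, or index. The sketch below shows what such checks generally look like in C. It is not taken from FFmpeg or from the actual patches, and the function names copy_field, checked_buffer_size, and checked_index are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a field only if its declared length fits both the remaining
 * input and the destination. Returns 0 on success, -1 otherwise. */
int copy_field(uint8_t *dst, size_t dst_size,
               const uint8_t *src, size_t src_left, size_t field_len)
{
    if (field_len > src_left || field_len > dst_size)
        return -1;
    memcpy(dst, src, field_len);
    return 0;
}

/* Compute width * height * factor without wrapping around.
 * Returns 0 and stores the product on success, -1 if an operand
 * is zero or the product would overflow size_t. */
int checked_buffer_size(size_t width, size_t height, size_t factor,
                        size_t *out)
{
    if (width == 0 || height == 0 || factor == 0)
        return -1;
    if (width > SIZE_MAX / height)
        return -1;
    size_t area = width * height;
    if (area > SIZE_MAX / factor)
        return -1;
    *out = area * factor;
    return 0;
}

/* Turn a signed, input-derived value into an array index, rejecting
 * negative values before any conversion to an unsigned type. */
int checked_index(int64_t value, size_t count, size_t *out)
{
    if (value < 0 || (uint64_t)value >= count)
        return -1;
    *out = (size_t)value;
    return 0;
}
```

For instance, checked_index(-1, 10, &i) fails instead of indexing before the start of the array, which is the general shape of the negative-duration problem described for the DASH demuxer.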

The remaining 12 issues (internal IDs DFVULN-116 through DFVULN-127) include heap buffer overflows, stack buffer overflows, integer overflows, and underflows across RTP depacketizers, demuxers, and the option parser. Details are in the full source.

Deep Dive: AV1 RTP Depacketizer RCE

One standout bug: a heap buffer overflow in libavformat/rtpdec_av1.c, reachable from the network with no special flags. A victim only needs to run ffmpeg -i rtsp://attacker/stream. A single 183-byte packet redirects execution.

The root cause is in the handling of AV1's Temporal Delimiter (TD) OBU. The depacketizer builds output incrementally, with pktpos tracking the next write position. For every byte emitted, av_grow_packet enlarges the heap buffer. The invariant: pktpos must never exceed allocated size. The TD handling breaks this:

// libavformat/rtpdec_av1.c:250
if ((obu_type == AV1_OBU_TEMPORAL_DELIMITER) ||
    (obu_type == AV1_OBU_... (truncated)

The code skips the TD OBU without calling av_grow_packet, and pktpos is not adjusted to match, so the write position and the allocated size fall out of sync. Subsequent OBU writes then land past the end of the buffer. Depthfirst's agent identified this exact code path and produced a PoC achieving PC control.
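The invariant at stake can be shown with a toy model. The sketch below is not FFmpeg's code: ToyPacket, toy_grow, and toy_write are hypothetical stand-ins for the packet, av_grow_packet, and the OBU write, and the field pos plays the role of pktpos. It shows how a check at the write site turns any drift between position and allocation into an error instead of an overflow.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a packet under assembly; not FFmpeg's AVPacket. */
typedef struct {
    uint8_t *data;
    size_t   size;  /* bytes currently allocated */
    size_t   pos;   /* next write position (the role pktpos plays) */
} ToyPacket;

/* Grow the allocation by `extra` bytes. Returns 0 on success, -1 on failure. */
int toy_grow(ToyPacket *p, size_t extra)
{
    if (extra == 0)
        return 0;
    if (extra > SIZE_MAX - p->size)
        return -1;
    uint8_t *grown = realloc(p->data, p->size + extra);
    if (!grown)
        return -1;
    p->data = grown;
    p->size += extra;
    return 0;
}

/* Append `len` bytes at the write position, refusing any write that
 * would end past the allocation (invariant: pos + len <= size). */
int toy_write(ToyPacket *p, const uint8_t *src, size_t len)
{
    if (len == 0)
        return 0;
    if (p->pos > p->size || len > p->size - p->pos)
        return -1;
    memcpy(p->data + p->pos, src, len);
    p->pos += len;
    return 0;
}
```

With a zero-initialized ToyPacket, growing by 4 and writing 4 bytes succeeds, while a fifth byte is rejected until the buffer is grown again. The same check rejects a write once pos has drifted past size, whatever code path caused the drift.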

Why It Matters

FFmpeg processes media in browsers, streaming platforms, and countless applications. These vulnerabilities are zero-click: just parsing a malformed media file or stream can trigger exploitation. The fact that an autonomous agent costing $1k found bugs missed by Google's Big Sleep team and Anthropic's Mythos model signals a shift in vulnerability research. Developers should update FFmpeg immediately and consider integrating similar automated security analysis into their CI/CD pipelines.

Editor's Take

I've used FFmpeg for years in video processing pipelines, and honestly, the sheer attack surface of 1.5 million lines of C parsing untrusted media has always kept me up at night. What impresses me here is the cost efficiency: $1k vs $10k for Anthropic's Mythos. That's an order of magnitude cheaper, and it found bugs that humans and other AI missed. I'm skeptical that autonomous agents will replace manual audits entirely, but for finding low-hanging fruit in massive codebases, this is a no-brainer. If you maintain a C/C++ project processing untrusted input, start experimenting with security agents now.