The Problem: Context Decay in Long-Running Agents
Every developer who builds a long-running AI agent hits the same wall. At turn 10, the agent is sharp — remembers exactly what you asked, threads the needle on a tricky bug. At turn 80, it re-suggests a fix you already rejected, forgets the file it edited twenty turns ago, contradicts decisions made at the start. The model hasn't changed; the prompt hasn't changed. The desk has filled up.
As the article puts it: "the context window is RAM, not a hard drive." Everything the model can reason about has to fit on the desk. The long conversation is what happens when you keep piling paper on that desk and never clear any of it. Eventually the important sheet is buried under four hundred lines of tool output.
What Compaction Is (and Isn't)
Compaction is lossy compression for meaning. You take a long, sprawling history and replace it with a shorter representation that preserves what matters for the next step and discards what doesn't. The "lossy" part is the whole game — if it were lossless, you'd just have the same tokens in a different font.
Two things compaction is not:
- Clearing is amnesia.
/clearwipes history completely — useful when switching tasks, catastrophic mid-task. - Trimming is mechanical. Dropping the oldest N messages by a hard rule, no model involved. Cheap, fast, dumb. It doesn't know that the oldest message might be the one decision the whole task hinges on.
Compaction sits between them: smarter than trimming, less destructive than clearing. It uses a model to decide what's worth keeping and rewrites the rest into something dense.
Why "Just Summarize It" Goes Wrong
Naive compaction fails in ways that are worth naming. Drew Breunig's taxonomy of long context rot applies directly:
- Poisoning: a hallucination makes it into the summary and becomes load-bearing fact. The model treats its own earlier mistake as established truth. Compaction can launder a guess into a "decision."
- Distraction: the summary keeps so much that it recreates the original bloat. You compacted 50K tokens down to 48K.
- Confusion: superfluous detail survives the cut and pulls the model toward irrelevant work.
- Clash: the summary and live messages disagree, and the model has to referee two versions of reality.
Then there's the failure mode unique to repeated compaction: cumulative erosion. Each compaction is lossy. Compact a compaction and you lose a little more. After five compactions across a marathon session, the agent works from a summary of a summary of a summary. The specific constraint from turn 3 ("never touch the auth module") is three generations of paraphrase away from the original. This is why agents "go off the rails" after an auto-compact fires mid-task: the boundary isn't just a token reduction — it's a behavioral discontinuity.
The Anatomy of a Good Compaction
The heuristic that holds up across serious implementations: keep the decisions and the state, discard the process that produced them.
From the article: "A good compaction keeps: 'The user's table is documents_v2.' A good compaction discards: the 400 lines of JSON the model waded through to figure that out." The fact is durable and tiny. The evidence for the fact is enormous and now useless — you already extracted the value.
Generalized checklist:
| Keep | Drop |
|---|---|
| What was decided, and the why | The deliberation that led to the decision |
| Current state of files/system | Intermediate states already overwritten |
| Explicit user constraints and preferences | Pleasantries, acknowledgements, retries |
| What's in progress right now | Completed-and-verified subtasks (one line each) |
| Concrete next steps | Raw tool output you've already digested |
| Critical references (table names, IDs, paths) | The search that found them |
The constraints line deserves emphasis: user constraints are the most expensive thing to lose and the easiest to drop. "Decided to use Postgres" looks like a fact worth keeping. "User said never to touch the auth module" looks like old conversational noise. The second is the one that, when lost, makes your agent confidently do the one thing it was told not to. Pin constraints. They should survive every compaction unchanged, never paraphrased.
How Production Tools Do It
Two coding agents illustrate different approaches:
Claude Code leans on automation. It has a manual /compact command, but the headline behavior is auto-compaction firing at roughly 95% of the context window. It summarizes the full trajectory and starts fresh with that summary as seed. You can steer it (/compact "focus on the open TODOs"). Community consensus: 95% is too late — by the time you're that full, the conversation's already degrading. Developers compact manually well before the trigger.
OpenAI's Codex CLI is more "handoff"-flavored. Its prompt frames compaction as a checkpoint producing a summary for "another LLM that will resume the task," and tells that next model to build on the work rather than redo it. It triggers on a token threshold, keeps the most recent user messages verbatim alongside the summary, and has retry-with-backoff for when the compaction call itself fails.
Building Your Own Compaction
If you're implementing compaction in your agent, the article provides concrete guidance:
- Trigger earlier than you think. 85–90% of context window lets you compact while the context is still good enough to summarize well.
- Prune before you summarize. Run a cheap mechanical pass that drops stale tool output first. Don't spend your expensive LLM call on 400 lines of JSON you can just delete.
- Keep recent turns verbatim. Compact the distant past; preserve the present. The model needs the live, un-paraphrased thread of what's happening right now.
- Pin constraints, and tell the user it happened. Carry user constraints through every compaction as exact text. A silent compaction that changes behavior mid-task feels like the agent randomly got worse. One line ("compacted history to free up context") turns a mystery into a tradeoff.
Here's a starting prompt provided in the article:
Create a handoff summary so this coding session can continue in a fresh context.
The summary will be the ONLY history available, so preserve:
1. Completed work — what's done and verified (one line each)
2. Current state — files modified and their status
3. In progress — what is being worked on right now
4. Next steps — concrete actions to take
5. Constraints — user preferences and requirements, quoted exactly
6. Critical references — table names, IDs, file paths, key decisions and the why
Be dense. Drop deliberation, raw tool output, and anything already superseded.
Do not invent or assume anything not present in the conversation.
That last line — "do not invent" — is your cheapest defense against poisoning. The summarizer's job is compression, not creativity.
The Uncomfortable Part
Compaction forces you to admit something the rest of the stack lets you avoid: you cannot keep everything, so you have to decide what your agent is allowed to forget. External memory and retrieval let you dodge that — stash it, pull it back later. But inside a single long-running task, on a finite desk, there's no dodge left. Something has to go. Compaction is choosing what survives on purpose, instead of letting the context window choose for you by quietly shoving your most important constraint off the edge of attention.
The best compaction, like the best engineering, is mostly subtraction. It's keeping documents_v2 and burning the JSON, one line of constraint outliving a thousand lines of chatter. The mantra: the best token is the one you didn't have to send. Compaction is how you find out which tokens those were, and have the nerve to delete them.
What To Do Now
If you're building an AI agent or coding assistant, audit your context management strategy. Are you relying on naive trimming or full clearing? Implement compaction with the checklist above. Start with a manual /compact command at 85% context usage, pin user constraints verbatim, and always log when compaction fires. Your agent at turn 80 will thank you.





