The Outage That Wasn't an Outage

On Friday afternoon, a government order hit Anthropic. By Saturday morning, Fable 5 and Mythos 5 were disabled for every customer worldwide. Not deprecated. Gone. Two days later, OpenAI shut down Sora because it was losing $15 million a day.

This isn't about politics. It's about what happens when your critical AI workflow loses its model with no SLA, no restore ETA, and no green status page. The author ran his most critical AI-dependent pipeline—a spec-to-task-breakdown system—on a backup model the next day. Here's what broke.

Break #1: Prompt Overfitting

The first failure was the prompt itself. The original prompt had drifted into a shape that worked beautifully on Model A: tight, terse, with lots of implicit structure the model learned to fill in. Model B read the same prompt and produced vague, unstructured mush.

- summarize the spec and break it into tasks
+ You are breaking a spec into engineering tasks.
+ Output JSON only, matching this shape:
+ { "tasks": [{ "title": "", "estimate_pts": 0, "depends_on": [] }] }
+ Rules:
+ - every task must be independently shippable
+ - no task larger than 3 points; split if larger
+ - depends_on references task titles, not indexes

Model A filled in all that structure on its own. Model B needed it spelled out. That's 20 minutes of restructuring that you don't want to do during an actual outage.

Break #2: Silent Tool-Call Dependency

The second break was invisible. One step in the pipeline depended on a tool call—a function the model invokes to pull live data. The backup model's tool-calling format was different enough that the call silently no-op'd. The output still looked plausible. It just used stale data and didn't tell you. That's the worst failure mode: confidently wrong, no error, no flag. The author only caught it because he was looking for trouble. On a normal day, that bad output flows downstream and someone makes a decision on it.

Availability Belongs on the Risk Register

We already handle APIs being down. A 503, back off, retry, it comes back. That's an outage with an SLA and a status page that eventually goes green. This is the model being gone. No SLA. No restore ETA. No green status page, because it isn't coming back. A policy order or a vendor's burn-rate review can end it overnight.

For a service you don't control and can't restore, that's a single point of failure on your critical path. You'd never ship that for a database. Most of us are shipping it for the model doing half the thinking.

The One-Pager That Deletes Your Worst Hour

The cheapest move is the most useful. The first hour after a model goes dark gets burned figuring out what just broke—which workflows touched that model, what versions, where the outputs live.

IBM found 88% of enterprises don't keep a complete inventory of the AI and agents they run. You can't reroute around a dead model if you don't know what depended on it. So write one file:

workflows:
  - name: spec-to-tasks
    model: primary-vendor/model-a
    criticality: must-survive
    fallback: tested 2026-06-13, prompt needs restructure
  - name: standup-digest
    model: primary-vendor/model-a
    criticality: can-wait
    fallback: none, recovery order documented
  - name: video-assets
    model: openai/sora
    criticality: can-wait
    export_path: download MP4s + project json before EOL

That last line is the Sora lesson. When a vendor kills a product, not just a model, you also have to ask where your outputs go and how you get them out. One extra column.

The Point Isn't Fear

Depending on these models is the right call. The teams that win aren't the ones who avoided the dependency. They're the ones who can keep the work moving the morning it disappears.

That competence costs an afternoon to build and almost nobody has built it yet:

  • Run your most critical workflow on a second model once. The rehearsal is the whole instrument.
  • Sort workflows into must-survive-today vs can-wait. Only the short list earns a tested fallback.
  • Keep a one-page workflow-to-model list so the first lost hour becomes a glance.

The author ran his test on a quiet Saturday and it cost him 20 minutes and a little ego. The alternative was running it for the first time on the morning it counted.

What would break first in your stack if your main model wasn't there tomorrow—and have you ever actually checked?