AFT Framework™ — Agentic Failure Taxonomy by Continuance

AFT-01

Hallucination Cascade

Output quality drops. Downstream systems act on wrong data. Damage compounds before anyone notices.

High severity

What happens

The agent produces confident-sounding outputs that are factually wrong. These get passed to downstream systems (databases, APIs, other agents, or customers), which act on them as if they were correct. By the time a human reviews the output, the damage has already propagated.

Why it's hard to catch

Hallucinated outputs don't throw errors. They look exactly like correct outputs — same format, same confidence. There's nothing in your logs to indicate a problem. The only signal is semantic: the content is wrong. That requires continuous quality monitoring, not just error monitoring.
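One crude way to get a semantic signal is a self-consistency check: ask the agent the same question several times and hold the answer for review when the samples disagree. The sketch below is illustrative only. The function names and the 0.8 threshold are assumptions, and exact string matching stands in for whatever semantic comparison a real monitor would use.

```python
from collections import Counter


def agreement_ratio(answers: list[str]) -> float:
    """Fraction of samples that match the most common answer.

    Answers are compared after lowercasing and collapsing whitespace,
    which is a deliberately simple stand-in for semantic similarity.
    """
    if not answers:
        return 0.0
    normalized = [" ".join(a.lower().split()) for a in answers]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)


def should_hold_for_review(answers: list[str], min_agreement: float = 0.8) -> bool:
    """True when the samples disagree too much to pass downstream unreviewed."""
    return agreement_ratio(answers) < min_agreement
```

Agreement across samples does not prove an answer is right; it only catches the cases where the agent contradicts itself. That is why a check like this sits in front of downstream systems as a gate rather than replacing review.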

🛡 Continuance detects AFT-01 through continuous output quality monitoring and semantic consistency checks. Get covered →

AFT-02

Tool Loop Deadlock

The agent calls the same tool repeatedly, going nowhere. API costs spike. Requests time out.

High severity

What happens

The agent enters a retry cycle on a single tool call, invoking the same function repeatedly without making progress. The session hangs. API costs accumulate. Teams often discover this failure mode only when they see an unusual spike in their LLM billing, days or weeks after the fact.

Why it's hard to catch

From the outside, the agent appears to be "thinking." No exception is raised and, in most frameworks, no timeout warning fires. The loop continues until an external rate limit or token budget is hit. By then, the cost has already been incurred.
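A loop like this can be bounded inside the agent runtime by fingerprinting each tool call and counting repeats over a sliding window. The sketch below is a minimal illustration, not any vendor's implementation. The class name, window size, and repeat limit are all assumptions.

```python
import hashlib
import json
from collections import deque


class ToolLoopDetector:
    """Flags a tool call that repeats too often within a sliding window."""

    def __init__(self, window: int = 10, max_repeats: int = 3):
        self.recent = deque(maxlen=window)  # signatures of the last `window` calls
        self.max_repeats = max_repeats

    @staticmethod
    def signature(tool_name: str, args: dict) -> str:
        # Sort keys so that argument order does not change the fingerprint.
        payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, tool_name: str, args: dict) -> bool:
        """Record a call; return True when it looks like a loop."""
        sig = self.signature(tool_name, args)
        self.recent.append(sig)
        return self.recent.count(sig) >= self.max_repeats
```

Calling `record` before every tool invocation lets the runtime stop the session, or escalate to a human, on the third identical call instead of waiting for a rate limit.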

🛡 Continuance detects AFT-02 by identifying repetitive tool call signatures within trace windows. Get covered →

AFT-03

Context Collapse

Mid-session coherence loss. The agent loses the thread and starts responding off-topic or incoherently.

High severity

What happens

Partway through a session, the agent loses coherence. It begins responding to things that weren't asked, ignores prior context, or gives answers that bear no relation to the current conversation state. To the user it looks like the AI "broke" — an impression that is very difficult to undo.

Why it's hard to catch

Context collapse doesn't produce an error code. The agent continues to respond — it just responds badly. Catching it requires understanding the expected relationship between input and output across a session, which most logging systems don't track.
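To make "the expected relationship between input and output" concrete, here is a deliberately crude sketch that tracks how much of each response shares vocabulary with the session so far. Word overlap is only a stand-in for real semantic similarity, and the function names, threshold, and patience value are assumptions chosen for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(context: str, response: str) -> float:
    """Share of response words that also appear in the session context."""
    resp = tokens(response)
    if not resp:
        return 0.0
    return len(resp & tokens(context)) / len(resp)


def flag_collapse(turns: list[tuple[str, str]], threshold: float = 0.2, patience: int = 2) -> bool:
    """turns holds (user_message, agent_response) pairs in session order.

    Returns True after `patience` consecutive responses whose overlap
    with the accumulated session text falls below `threshold`.
    """
    context, low_streak = "", 0
    for user_msg, response in turns:
        context += " " + user_msg
        low_streak = low_streak + 1 if overlap(context, response) < threshold else 0
        if low_streak >= patience:
            return True
        context += " " + response
    return False
```

Requiring two low-overlap turns in a row, rather than one, keeps a single legitimate topic change from raising the flag.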

🛡 Continuance monitors session coherence continuity and detects context collapse before it escalates. Get covered →

AFT-04

Prompt Injection

Malicious input overrides the agent's system prompt. It starts acting outside its intended role.

Critical severity

What happens

A crafted user input instructs the agent to ignore its original instructions and take different actions. The agent complies — because it cannot inherently distinguish between trusted instructions and injected ones. Consequences range from data exposure to executing actions the agent was never authorized to perform.

Why it's hard to catch

The agent appears to be functioning normally: it is responding and completing tasks. The problem is which tasks it is completing. Without continuous monitoring of whether the agent's behavior stays within its intended scope, injections can run undetected for extended periods.

🛡 AFT-04 is a critical-severity failure. Continuance flags and contains injection events immediately. Get covered →

AFT-05

Silent Output Degradation

Gradual quality drift with no error, no crash, no alert. Whether the cause is internal state drift or a silent upstream model update, users stop trusting your product before you know anything changed.

High severity — most missed

What happens

Over time — weeks, sometimes months — output quality gradually worsens. Responses become less relevant, less accurate, less useful. This happens in two forms: internal drift, where accumulated state, memory artifacts, or prompt entropy erode output quality; and model regression, where your LLM provider pushes a silent version update that changes behavior without any changelog. In both cases there's no single moment of failure. The API returns 200. Your agent is running. The results are quietly wrong.

Why it's hard to catch

AFT-05 has no error signal of any kind. It requires comparing current output quality against a historical behavioral baseline, not just checking whether the agent is alive. Internal drift won't appear in deployment logs. Model regressions won't appear in your version control. Teams typically discover both variants through customer complaints or churn data, often weeks after the degradation began.
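Comparing against a baseline can be as simple as asking whether the recent mean quality score sits well below the historical one. The sketch below assumes you already have a per-response quality score (from an evaluator, a rubric, or user feedback). The function names and the cutoff of 3 are assumptions, not a recommended setting.

```python
from statistics import mean, stdev


def drift_z_score(baseline: list[float], recent: list[float]) -> float:
    """How many standard errors the recent mean sits below the baseline mean.

    Positive values mean quality has dropped relative to the baseline.
    """
    if len(baseline) < 2 or not recent:
        raise ValueError("need at least two baseline scores and one recent score")
    sigma = stdev(baseline)
    gap = mean(baseline) - mean(recent)
    if sigma == 0:
        # A perfectly flat baseline: any drop at all is off the scale.
        return float("inf") if gap > 0 else 0.0
    standard_error = sigma / len(recent) ** 0.5
    return gap / standard_error


def is_degraded(baseline: list[float], recent: list[float], cutoff: float = 3.0) -> bool:
    """True when the recent window is significantly worse than the baseline."""
    return drift_z_score(baseline, recent) > cutoff
```

Because the check runs on scores rather than on error codes, it fires even when every API call returns 200.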

🛡 Sentinel baselines your agent's output patterns continuously — catching both internal drift and upstream model regressions within hours, not weeks. Get covered →

AFT-06

Dependency Cascade

An external API your agent depends on degrades or goes down. Your product fails. Your users blame you.

Medium severity

What happens

Your agent is functioning correctly — but a service it depends on (a search API, a database, a third-party tool) is degraded or unavailable. Your agent can't complete its task. From your users' perspective, your product stopped working. They don't know or care that the failure is upstream.

Why it's hard to catch

Dependency failures are often partial: the service is slow, returning errors on some requests but not others. This produces inconsistent behavior that's easy to misattribute to the agent itself. Without monitoring external dependencies alongside agent traces, the root cause is unclear.
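Partial failures are easier to attribute when each dependency has its own rolling health record. The following sketch keeps the last few calls to one dependency and reports it as degraded when too many failed or ran slow. The class name and every threshold are hypothetical defaults for illustration.

```python
from collections import deque


class DependencyHealth:
    """Rolling error-rate and latency tracker for one external dependency."""

    def __init__(self, window: int = 50, max_error_rate: float = 0.1,
                 max_latency_s: float = 2.0, max_slow_rate: float = 0.2):
        self.samples = deque(maxlen=window)  # (ok, latency_s) for recent calls
        self.max_error_rate = max_error_rate
        self.max_latency_s = max_latency_s
        self.max_slow_rate = max_slow_rate

    def record(self, ok: bool, latency_s: float) -> None:
        self.samples.append((ok, latency_s))

    def status(self) -> str:
        """Return "unknown", "healthy", or "degraded" for the current window."""
        if not self.samples:
            return "unknown"
        total = len(self.samples)
        error_rate = sum(1 for ok, _ in self.samples if not ok) / total
        slow_rate = sum(1 for _, latency in self.samples if latency > self.max_latency_s) / total
        if error_rate > self.max_error_rate or slow_rate > self.max_slow_rate:
            return "degraded"
        return "healthy"
```

Attaching the dependency's status to each agent trace makes it clear whether a bad run was the agent's fault or an upstream problem.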

🛡 Continuance monitors external dependency health alongside your agents so you know before your users do. Get covered →

AFT-07

Authorization Drift

The agent starts acting outside its permitted scope — reading records, sending messages, taking actions it was never meant to.

Critical severity

What happens

The agent begins taking actions that fall outside its intended authorization boundary. This can happen gradually as prompts evolve, or suddenly after a model update changes how instructions are interpreted. The result: records accessed without authorization, messages sent without approval, transactions initiated beyond the agent's intended scope.

Why it's hard to catch

The agent isn't returning errors — it's completing actions. Whether those actions are authorized requires comparing behavior against an expected boundary, which is only possible with continuous scope monitoring. By the time a human reviews logs manually, scope creep has already compounded.
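Comparing behavior against an expected boundary means writing the boundary down in a form code can check. Here is a minimal sketch, assuming a hypothetical `Scope` with an action allowlist and a set of readable tables. The action and table names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """Hypothetical authorization boundary for one agent."""
    allowed_actions: frozenset
    readable_tables: frozenset


def scope_violations(scope: Scope, action: str, target: Optional[str] = None) -> list[str]:
    """Return a list of reasons the action falls outside the scope (empty if none)."""
    problems = []
    if action not in scope.allowed_actions:
        problems.append(f"action '{action}' is not in scope")
    elif action == "read_record" and target not in scope.readable_tables:
        problems.append(f"table '{target}' is not readable in this scope")
    return problems
```

Running a check like this before each action executes, rather than over logs afterward, turns scope creep into a blocked call instead of a finding in a later audit.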

🛡 Continuance checks every agent action against its defined scope. AFT-07 doesn't compound silently. Get covered →

AFT-08

Memory Poisoning

The agent retrieves corrupted or false context from its memory store and acts on it as if it were fact.

High severity

What happens

The agent's retrieval system (vector database, conversation history, knowledge base) contains incorrect, outdated, or deliberately poisoned data. The agent retrieves this data, treats it as ground truth, and makes decisions based on a foundation that is quietly wrong. Outputs look plausible — they're just built on bad premises.

Why it's hard to catch

Memory poisoning is invisible at the output layer. The agent's responses may look reasonable — they're internally consistent with the corrupted memory. Catching it requires monitoring the integrity of retrieved context, not just the quality of final outputs.
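One piece of retrieval integrity can be checked mechanically: whether a record has changed since it was written, and whether it is too old to trust. The sketch below seals each record with an HMAC at write time and verifies it at read time. The record layout (`text`, `seal`, `written_at`) and the function names are assumptions. Note that this catches tampering and staleness only; it cannot tell you that the data was false when it was first stored.

```python
import hashlib
import hmac
import time
from typing import Optional


def seal(text: str, key: bytes) -> str:
    """HMAC-SHA256 of the record text, computed when the record is written."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_record(record: dict, key: bytes, max_age_s: float,
                  now: Optional[float] = None) -> list[str]:
    """Return integrity problems for a retrieved record (empty if none).

    `record` is assumed to hold "text", "seal", and "written_at" (epoch seconds).
    """
    problems = []
    if not hmac.compare_digest(seal(record["text"], key), record["seal"]):
        problems.append("content changed since it was written")
    current = time.time() if now is None else now
    if current - record["written_at"] > max_age_s:
        problems.append("record is older than the freshness limit")
    return problems
```

Records that fail verification can be dropped from the retrieved context before the agent ever sees them.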

🛡 Continuance monitors retrieval integrity continuously. AFT-08 is flagged before the next run picks it up. Get covered →

AFT-09

Orchestration Deadlock

A multi-agent pipeline enters a circular wait state. The entire workflow freezes — no error, no output, just silence.

High severity — multi-agent systems

What happens

In a multi-agent pipeline, Agent A is waiting on output from Agent B, Agent B is waiting on Agent C, and Agent C is waiting on Agent A. The circular dependency means no agent can proceed. The entire workflow freezes. No exception is raised. Downstream systems waiting on the pipeline's output simply time out — or wait indefinitely.

Why it's hard to catch

Unlike a single agent looping (AFT-02), orchestration deadlock spans multiple agents. No individual agent is misbehaving — each one is waiting correctly for its dependency. The failure is at the coordination layer. This requires monitoring the pipeline as a whole, not individual agents in isolation.
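At the coordination layer, a circular wait is a cycle in the "who is waiting on whom" graph, and a depth-first search finds it. The sketch below assumes the orchestrator can expose a mapping from each agent to the agents it is currently waiting on. The function name and the mapping format are hypothetical.

```python
from typing import Optional


def find_wait_cycle(waits_on: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one circular wait as a list of agent names, or None if there is none.

    The returned list starts and ends with the same agent.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: dict[str, int] = {}
    path: list[str] = []

    def visit(agent: str) -> Optional[list[str]]:
        state[agent] = IN_PROGRESS
        path.append(agent)
        for dep in waits_on.get(agent, []):
            if state.get(dep, UNVISITED) == IN_PROGRESS:
                # dep is already on the current path, so the path loops back to it.
                return path[path.index(dep):] + [dep]
            if state.get(dep, UNVISITED) == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[agent] = DONE
        return None

    for agent in waits_on:
        if state.get(agent, UNVISITED) == UNVISITED:
            cycle = visit(agent)
            if cycle:
                return cycle
    return None
```

For the three-agent example above, `find_wait_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})` returns `["A", "B", "C", "A"]`, which names every agent the pipeline needs to unblock.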

🛡 Sentinel monitors inter-agent coordination patterns and detects deadlocks at the pipeline level. Get covered →

9 ways AI agents fail in production.

Knowing the failures is step one.
Detecting them is step two.
