Alert Semantics and Failure Behavior
Purpose of This Section
This section defines how alerts are generated, interpreted, and acted upon within the architecture. Alerts are not notifications. They are explicit state transitions. An alert represents a change in the system’s confidence about its own safety, correctness, or operating assumptions, and it directly influences what the assistant is allowed to do next. If an alert does not change behavior, it serves no purpose — it is noise, and noise degrades the operator’s ability to recognize signals that actually matter.
Alerts as State Transitions
The assistant always operates in one of a small number of well-defined states, and alerts are the mechanism by which the system transitions between them. This framing is deliberate. In many systems, alerts are treated as informational messages that the operator may or may not act on. Here, every alert has semantic meaning: it changes what the assistant is permitted to do, regardless of whether the operator has seen it yet. The assistant does not wait for the operator to interpret an alert and issue new instructions. It constrains its own behavior immediately and then waits for the operator to decide what happens next.
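A minimal sketch of that ordering follows. The names (Alert, Assistant, handle_alert, notify_operator) are illustrative, not part of the architecture's actual interfaces; the point is that self-constraint happens before notification, and notification is not followed by any autonomous recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    trigger: str           # condition that fired, e.g. "elevated_api_usage" (illustrative)
    constraints: set[str]  # capabilities to revoke immediately


@dataclass
class Assistant:
    revoked: set[str] = field(default_factory=set)

    def handle_alert(self, alert: Alert) -> None:
        # Constrain first: the revocation takes effect before the operator has
        # seen the alert, and regardless of whether they ever respond.
        self.revoked |= alert.constraints
        self.notify_operator(alert)
        # Nothing else happens here. The assistant does not diagnose, retry, or
        # resume on its own; it waits for an explicit operator decision.

    def notify_operator(self, alert: Alert) -> None:
        print(f"ALERT [{alert.trigger}]: revoked {sorted(alert.constraints)}; awaiting decision")
```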
Operational States
The system recognizes four high-level operational states.
In the normal state, all operating assumptions hold, all capabilities are available, and no active risk signals have been detected. This is the default state in which the assistant performs its work.
The degraded state is triggered when early warning signals are detected: elevated API usage, minor policy boundary friction, or inconclusive results from a risk analysis. The assistant remains operational but becomes more cautious — reducing its autonomy, increasing its logging verbosity, and notifying the operator more frequently. The degraded state is not an emergency. It is a signal that conditions have shifted enough to warrant closer attention, and that the assistant’s confidence in its own operating assumptions has diminished.
The paused state is triggered when uncertainty crosses a predefined threshold. This might result from a suspected research loop, a capability mismatch detected during skill evaluation, or a pattern of repeated near-threshold alerts that individually fall short of a pause but collectively indicate a problem. In this state, all non-essential actions halt, external API calls stop, and the assistant waits for human input. No automatic recovery occurs from a paused state. The assistant does not attempt to diagnose and resolve the issue on its own — it stops and reports.
The stopped state is triggered by severe or unambiguous risk: an unauthorized change attempt, identity or credential compromise, a violation of the update mechanism, or explicit activation of the kill switch by the operator. All execution stops immediately. Credentials are revoked where possible. Network access is severed. Restarting from a stopped state requires deliberate human action, including validation that the conditions which triggered the stop have been addressed.
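One way to make the four states concrete is as an enumeration paired with a fixed capability table. The capability names and the table below are illustrative placeholders, not the architecture's actual permission set:

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    PAUSED = "paused"
    STOPPED = "stopped"


# Illustrative capability table: what each state permits. A real deployment
# would define its own permission set rather than hard-coding one here.
PERMITTED = {
    State.NORMAL:   {"autonomous_actions", "external_api_calls", "logging"},
    State.DEGRADED: {"external_api_calls", "logging"},  # autonomy reduced, verbosity raised
    State.PAUSED:   {"logging"},                        # essential reporting only
    State.STOPPED:  set(),                              # nothing runs
}


def is_permitted(state: State, capability: str) -> bool:
    """Capabilities are a function of state alone: the assistant cannot grant
    itself anything its current state does not permit."""
    return capability in PERMITTED[state]
```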
Fail-Closed Design
When uncertainty exists, the system defaults to inaction. Automatic pauses and stops are preferred over best-guess recovery attempts. The assistant is explicitly prohibited from bypassing alerts, retrying indefinitely, seeking alternative execution paths, or reinterpreting alert conditions in a more favorable light. This prohibition exists because an assistant that can reason its way past an alert condition can reason its way past any alert condition, which would render the entire alerting system advisory rather than authoritative.
The principle is straightforward: if the system is unsure whether it is operating safely, the safe behavior is to stop operating. A paused or stopped assistant is not a failure. An uncontrolled one is.
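A sketch of the fail-closed rule, using hypothetical names and a deliberately blunt decision function: uncertainty maps to a pause, never to a retry or an alternative execution path.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "safe"            # all operating assumptions confirmed
    UNSAFE = "unsafe"        # severe or unambiguous risk
    UNCERTAIN = "uncertain"  # anything the system cannot confidently classify


def fail_closed(verdict: Verdict) -> str:
    """Map a safety verdict to an action. There is no retry branch and no
    'reinterpret the condition' branch: anything short of a clear SAFE
    resolves to halting work and waiting for a human."""
    if verdict is Verdict.SAFE:
        return "proceed"
    if verdict is Verdict.UNSAFE:
        return "stop"    # revoke credentials, sever network access, await restart
    return "pause"       # default to inaction, report, wait for the operator
```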
Human Acknowledgment
Certain alerts require explicit human acknowledgment before any forward progress is possible. Acknowledgment is not passive — it is a gate. It means the operator has seen the alert, understands the reason for it, and accepts responsibility for the next action. Silence is never treated as acknowledgment. If the operator does not respond, the assistant remains in its current constrained state indefinitely.
Upon acknowledging an alert, the operator may resume operation, modify constraints or thresholds, terminate the current task, or shut down the assistant entirely. The assistant does not infer intent beyond the explicit response. If the operator says “resume,” the assistant resumes. It does not interpret the resumption as permission to ignore similar alerts in the future, nor does it treat the operator’s response as evidence that the alert was a false positive.
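A sketch of the acknowledgment gate under those rules. The decision names mirror the four options above; the queue and the polling interval are assumptions made for illustration.

```python
from enum import Enum
from queue import Queue, Empty


class Decision(Enum):
    RESUME = "resume"        # continue the current task
    MODIFY = "modify"        # adjust constraints or thresholds
    TERMINATE = "terminate"  # end the current task
    SHUTDOWN = "shutdown"    # shut the assistant down entirely


def await_acknowledgment(decisions: "Queue[Decision]", poll_seconds: float = 60.0) -> Decision:
    """Block until the operator responds with an explicit decision.

    Silence is never acknowledgment: if nothing arrives, the loop continues
    and the assistant stays in its constrained state indefinitely. The return
    value is the decision itself and nothing more; no inference about future
    alerts is attached to it.
    """
    while True:
        try:
            return decisions.get(timeout=poll_seconds)
        except Empty:
            continue  # no response yet; remain constrained and keep waiting
```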
Escalation
Alerts escalate based on severity, frequency, lack of acknowledgment, and compounding signals across domains. Multiple warning-level alerts may escalate to a pause. A paused state that persists without acknowledgment may escalate to a stop. Escalation paths are predefined and documented — the assistant does not decide in the moment how aggressively to escalate. The rules are fixed and the assistant follows them.
Cross-domain escalation is particularly important. Signals from different subsystems compound in ways that individual subsystems cannot detect on their own. Elevated API usage combined with ambiguous skill behavior is more concerning than either signal alone. Update anomalies combined with identity boundary stress may indicate a coordinated issue that neither the update monitor nor the identity monitor would flag independently. Cross-domain escalation prevents the kind of siloed blindness where each subsystem reports “within normal parameters” while the overall system is drifting toward failure.
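Because the escalation paths are predefined, they can be expressed as fixed, data-driven rules rather than in-the-moment judgment. The thresholds and signal names below are placeholders, not values taken from the architecture:

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):  # same four states as above
    NORMAL = "normal"
    DEGRADED = "degraded"
    PAUSED = "paused"
    STOPPED = "stopped"


SEVERITY = [State.NORMAL, State.DEGRADED, State.PAUSED, State.STOPPED]


@dataclass
class Signals:
    warning_alerts: int                  # warning-level alerts in the current window
    unacknowledged_pause_minutes: float  # how long a pause has gone without acknowledgment
    domains_with_anomalies: int          # distinct subsystems currently reporting anomalies


def escalate(current: State, s: Signals) -> State:
    """Fixed rules the assistant follows; it does not choose how aggressively
    to escalate. Thresholds here are illustrative placeholders."""
    proposed = State.NORMAL
    if s.warning_alerts >= 1:
        proposed = State.DEGRADED
    if s.warning_alerts >= 3 or s.domains_with_anomalies >= 2:
        proposed = State.PAUSED      # repeated warnings, or cross-domain compounding
    if current is State.PAUSED and s.unacknowledged_pause_minutes >= 240:
        proposed = State.STOPPED     # a pause ignored for too long becomes a stop
    # Escalation only moves toward the more constrained state; it never relaxes one.
    return max(current, proposed, key=SEVERITY.index)
```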
Alert Delivery
Alerts are delivered through trusted, high-signal channels only. Critical alerts are never batched or delayed. Severity labeling is clear and consistent. Wording is concise and action-oriented — the alert tells the operator what happened, what the assistant did in response, and what decision the operator needs to make.
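In message form, that contract can be read as a small fixed schema: severity, what happened, what the assistant already did, and what decision is being asked for. The field names and the sample content below are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    PAUSE = "pause"
    STOP = "stop"


@dataclass(frozen=True)
class AlertMessage:
    severity: Severity
    what_happened: str     # the trigger condition, in plain language
    what_was_done: str     # the constraint the assistant has already applied
    decision_needed: str   # the choice the operator is being asked to make

    def render(self) -> str:
        return (f"[{self.severity.value.upper()}] {self.what_happened} | "
                f"assistant response: {self.what_was_done} | "
                f"operator decision needed: {self.decision_needed}")


# Example (invented content): concise, action-oriented, delivered immediately.
msg = AlertMessage(
    severity=Severity.PAUSE,
    what_happened="Suspected research loop detected during skill evaluation",
    what_was_done="Halted non-essential actions and stopped external API calls",
    decision_needed="Resume, adjust thresholds, terminate the task, or shut down",
)
print(msg.render())
```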
Alert fatigue is treated as a design failure rather than an operator failure. If the operator begins ignoring alerts, the correct response is to recalibrate the alerting thresholds, not to blame the operator for disengagement. An alerting system that cries wolf teaches its operator to ignore wolves. The architecture addresses this by reserving alerts for state transitions that carry genuine behavioral consequences, which gives the operator a reason to pay attention to every alert that reaches them.
Documentation and Auditability
Every alert event is recorded in the memory vault, including the trigger condition, the resulting state transition, the assistant’s behavioral response, and the operator’s decision. This creates a traceable chain linking risk detection to decision-making. After an incident, the alert log answers the questions that matter most: what did the system detect, when did it detect it, what did it do about it, and what did the operator decide.
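A sketch of what one such record could contain, assuming an append-only, line-oriented log as the interface to the memory vault (the vault's actual API is not specified here):

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class AlertAuditRecord:
    detected_at: str         # when the condition was detected (ISO 8601, UTC)
    trigger_condition: str   # what the system detected
    state_transition: str    # e.g. "normal -> paused"
    assistant_response: str  # what the assistant did about it
    operator_decision: str   # what the operator decided ("pending" until acknowledged)


def append_to_vault(log_path: str, record: AlertAuditRecord) -> None:
    """Append one record as a JSON line. The real memory vault interface may
    differ; the point is that every field needed after an incident is captured
    at the time of the event rather than reconstructed later."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


record = AlertAuditRecord(
    detected_at=datetime.now(timezone.utc).isoformat(),
    trigger_condition="repeated near-threshold alerts across two subsystems",
    state_transition="degraded -> paused",
    assistant_response="halted non-essential actions, stopped external API calls",
    operator_decision="pending",
)
# append_to_vault("alert_log.jsonl", record)
```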
Summary
By treating alerts as state transitions with enforced behavioral consequences, the architecture ensures that risk signals are acted upon rather than ignored, automation halts under uncertainty, escalation is predictable and auditable, and the operator remains the final authority over how the system responds to anomalous conditions.
This section defines how the system responds to detected anomalies. The next section addresses downtime behavior and end-of-life handling — what happens when the system stops not because of an alert, but because its operating context has ended.