Threat Model Summary
Threats mitigated: silent authority escalation, runaway automation, credential and identity compromise, supply chain and update abuse, tool and skill poisoning, and orphaned automation.
Purpose of This Section
This document consolidates the threat model implicit throughout the architecture into an explicit summary. The goal is not to claim completeness — a threat model that pretends to be exhaustive is unreliable — but to state plainly what this system is designed to defend against, what it does not attempt to defend against, and which risks are knowingly accepted in exchange for other properties.
Threats Addressed
The architecture is deliberately designed to mitigate six classes of threat. Each has been discussed in detail in earlier sections; what follows is a consolidated account of the threat, the reasoning behind its inclusion, and the mitigations applied.
Silent authority escalation. The most persistent risk in any delegated-authority system is that the delegate gradually acquires power without explicit approval. This can occur through credential inheritance, implicit trust accumulation, or capability drift over time. The architecture addresses this through identity separation, tool and skill gating, human-in-the-loop approvals, capability mismatch detection, and alert-driven state transitions. Authority is never implicit, inherited, or accumulated invisibly. Every expansion of the assistant’s effective power must pass through an explicit gate.
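To make "never implicit, inherited, or accumulated invisibly" concrete, the sketch below shows a minimal capability registry in which authority exists only as an explicit, human-approved grant and absence of a grant means denial. The names (CapabilityGrant, CapabilityRegistry) and the Python form are illustrative assumptions, not the architecture's actual implementation.

```python
# Minimal sketch of an explicit capability gate; all names are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class CapabilityGrant:
    capability: str      # e.g. "send_email"
    approved_by: str     # a human identity, never another grant
    approved_at: datetime
    reason: str

@dataclass
class CapabilityRegistry:
    """Capabilities exist only as explicit grants; nothing is inherited."""
    _grants: dict = field(default_factory=dict)

    def grant(self, capability: str, approved_by: str, reason: str) -> None:
        # Every expansion of authority is recorded with who approved it and why.
        self._grants[capability] = CapabilityGrant(
            capability, approved_by, datetime.now(timezone.utc), reason
        )

    def is_allowed(self, capability: str) -> bool:
        # Absence of an explicit grant means denial; there are no implicit defaults.
        return capability in self._grants

registry = CapabilityRegistry()
registry.grant("read_calendar", approved_by="operator", reason="daily briefing")
assert registry.is_allowed("read_calendar")
assert not registry.is_allowed("send_email")  # never implied by read access
```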
Runaway automation. Uncontrolled execution loops that consume resources, expand scope, or persist indefinitely are a structural risk in any system with access to external APIs and the ability to chain actions. The architecture mitigates this through hard API spend caps, research loop detection, automatic pauses and stops, absence-of-supervision shutdown behavior, and physical dependency on powered hardware. When intent becomes unclear or supervision is unavailable, the system’s behavior collapses toward inaction rather than expanding toward autonomy.
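The following sketch shows one way spend caps and loop detection can collapse behavior toward inaction: once a budget or repetition threshold is crossed, the guard halts and stays halted until a human intervenes. The thresholds and call signatures are assumed for illustration and do not reflect the deployment's actual limits.

```python
# A minimal sketch of spend capping and loop detection; values are assumptions.
class RunawayGuard:
    def __init__(self, spend_cap_usd: float = 5.0, max_repeats: int = 3):
        self.spend_cap_usd = spend_cap_usd
        self.max_repeats = max_repeats
        self.spent_usd = 0.0
        self.recent_calls: list[str] = []
        self.halted = False

    def before_call(self, call_signature: str, est_cost_usd: float) -> bool:
        """Return True if the call may proceed; otherwise collapse to inaction."""
        if self.halted:
            return False
        # Hard spend cap: the budget is enforced before the call, not after.
        if self.spent_usd + est_cost_usd > self.spend_cap_usd:
            self.halted = True
            return False
        # Loop detection: the same call signature repeating suggests a stuck loop.
        if self.recent_calls[-self.max_repeats:].count(call_signature) >= self.max_repeats:
            self.halted = True
            return False
        self.spent_usd += est_cost_usd
        self.recent_calls.append(call_signature)
        return True

guard = RunawayGuard(spend_cap_usd=1.0)
assert guard.before_call("search:weather", 0.4)
assert guard.before_call("search:weather", 0.4)
assert not guard.before_call("search:weather", 0.4)  # cap exceeded: halt, and stay halted
```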
Credential and identity compromise. In the event that assistant-owned credentials are leaked or abused, the architecture limits blast radius through strict identity separation, the use of assistant-owned accounts exclusively, the prohibition on human credential delegation, inactivity-based account deletion, and explicit revocation workflows. Compromise is designed to be survivable and localized — a breach of one account does not cascade to others, and no assistant credential provides access to the operator’s own infrastructure.
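The sketch below illustrates the bookkeeping side of that containment: assistant-owned accounts tracked individually, revoked one at a time, and swept for inactivity. The field names and the 90-day limit are assumptions for the example, not the deployment's real tooling.

```python
# A sketch of identity-separation bookkeeping; names and limits are hypothetical.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class AssistantAccount:
    service: str
    last_used: datetime
    owner: str = "assistant"   # assistant-owned only; human credentials are never stored
    revoked: bool = False

class AccountRegistry:
    INACTIVITY_LIMIT = timedelta(days=90)   # assumed value for illustration

    def __init__(self) -> None:
        self.accounts: dict[str, AssistantAccount] = {}

    def register(self, service: str) -> None:
        self.accounts[service] = AssistantAccount(service, last_used=datetime.now(timezone.utc))

    def revoke(self, service: str) -> None:
        # Revocation is per account: losing one credential never cascades to the others.
        self.accounts[service].revoked = True

    def sweep_inactive(self, now: datetime) -> list[str]:
        # Accounts unused past the inactivity limit are revoked automatically.
        stale = [name for name, acct in self.accounts.items()
                 if not acct.revoked and now - acct.last_used > self.INACTIVITY_LIMIT]
        for name in stale:
            self.revoke(name)
        return stale
```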
Supply chain and update abuse. Unauthorized or unreviewed changes to the assistant’s behavior through compromised updates or dependencies represent a particularly insidious threat because they arrive through trusted channels. The architecture addresses this through fixed and pinned update sources, a process where the assistant prepares and the human approves every update, diff-based justification for changes, monitoring of the update mechanism itself, and immediate halt on detection of unauthorized modification. Change is permitted but never silent.
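As a concrete illustration of "the assistant prepares and the human approves", the following sketch pairs a diff-based proposal with a hash check that halts immediately if the deployed artifact diverges from the reviewed baseline. The names (UpdateProposal, apply_update) are hypothetical stand-ins, not the architecture's actual update mechanism.

```python
# An illustrative prepare/approve update flow; all names are hypothetical.
import difflib
import hashlib

class UpdateProposal:
    def __init__(self, component: str, current: str, proposed: str, justification: str):
        self.component = component
        self.current = current
        self.proposed = proposed
        self.justification = justification
        self.approved = False   # flipped only by an explicit human decision

    def diff(self) -> str:
        # The human reviews an explicit diff, not a summary of intent.
        return "".join(difflib.unified_diff(
            self.current.splitlines(keepends=True),
            self.proposed.splitlines(keepends=True),
            fromfile=f"{self.component} (current)",
            tofile=f"{self.component} (proposed)",
        ))

def apply_update(proposal: UpdateProposal, deployed: str) -> str:
    # Halt if the deployed state no longer matches what the human reviewed.
    if hashlib.sha256(deployed.encode()).hexdigest() != \
       hashlib.sha256(proposal.current.encode()).hexdigest():
        raise RuntimeError("deployed state diverged from reviewed baseline; halting")
    if not proposal.approved:
        raise PermissionError("update prepared but not approved; change is never silent")
    return proposal.proposed
```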
Tool and skill poisoning. Third-party or internally authored skills may introduce hidden behavior or request excessive authority. The architecture mitigates this through the four-phase skill analysis pipeline described in an earlier section: pre-ingestion analysis, multi-perspective risk review, capability mismatch detection, and human approval at the final gate. A periodic tool census ensures that approved skills remain appropriate over time, and removal is treated as a routine lifecycle action rather than an emergency measure.
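The sketch below shows the shape of that four-phase gate as a sequence of checks in which any failure stops the skill and the final phase is always a human decision. The specific checks and skill fields (manifest, risk_score, requested and purpose capabilities) are placeholders, not the pipeline's real analysis.

```python
# A minimal sketch of the four-phase skill gate; checks and fields are placeholders.
from typing import Callable

def review_skill(skill: dict, human_approve: Callable[[dict], bool]) -> bool:
    """Advance a candidate skill through the four phases; any failure rejects it."""
    phases: list[tuple[str, Callable[[dict], bool]]] = [
        # Phase 1: pre-ingestion analysis -- the skill must arrive with a readable manifest.
        ("pre-ingestion analysis", lambda s: bool(s.get("source")) and bool(s.get("manifest"))),
        # Phase 2: multi-perspective risk review, reduced here to a single score threshold.
        ("risk review", lambda s: s.get("risk_score", 10) <= 3),
        # Phase 3: requested authority must not exceed what the stated purpose needs.
        ("capability mismatch detection",
         lambda s: set(s.get("requested_capabilities", [])) <= set(s.get("purpose_capabilities", []))),
        # Phase 4: the final gate is always a person.
        ("human approval", human_approve),
    ]
    for name, check in phases:
        if not check(skill):
            print(f"skill rejected at phase: {name}")
            return False
    return True

candidate = {
    "source": "third-party registry",          # placeholder provenance
    "manifest": {"purpose": "summarize saved articles"},
    "risk_score": 2,
    "requested_capabilities": ["read_files"],
    "purpose_capabilities": ["read_files"],
}
assert review_skill(candidate, human_approve=lambda s: True)
```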
Orphaned or zombie automation. The assistant continuing to operate after loss of supervision, relevance, or intent is a failure mode that grows more dangerous the longer it persists. The architecture prevents this through fail-closed network design, control-channel dependency, automatic escalation from paused to stopped states, account auto-deletion via inactivity, and physical shutdown as the terminal control. Persistence requires continued intent — when intent disappears, so does the assistant’s ability to act.
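The escalation from running to paused to stopped can be expressed as a simple function of supervision silence, as in the sketch below. The timeout values are assumptions chosen for illustration; only the direction of the transitions mirrors the behavior described above.

```python
# A sketch of supervision-dependent state escalation; thresholds are assumed.
from datetime import datetime, timedelta, timezone

PAUSE_AFTER = timedelta(hours=6)    # assumed: pause after six hours of silence
STOP_AFTER = timedelta(hours=24)    # assumed: stop after a full day of silence

def supervision_state(last_heartbeat: datetime, now: datetime) -> str:
    """Map time since the operator's last control-channel contact to a run state."""
    silence = now - last_heartbeat
    if silence >= STOP_AFTER:
        return "stopped"    # terminal until a human restarts the assistant
    if silence >= PAUSE_AFTER:
        return "paused"     # no new actions; waiting for supervision to resume
    return "running"

now = datetime.now(timezone.utc)
assert supervision_state(now - timedelta(hours=1), now) == "running"
assert supervision_state(now - timedelta(hours=7), now) == "paused"
assert supervision_state(now - timedelta(days=2), now) == "stopped"
```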
Threats Explicitly Out of Scope
The following threats are acknowledged but intentionally not addressed.
Physical coercion or hardware seizure. If an adversary gains physical access to the host hardware, the system does not defend against forced disclosure, hardware tampering, or offline data extraction. This risk is accepted as a boundary of personal-scale security. Mitigating it would require hardware security measures — tamper-evident enclosures, full-disk encryption with hardware-backed key management, secure boot chains — that are orthogonal to the concerns of this architecture and are better addressed by dedicated guidance on physical security.
Compromise of external providers. The architecture does not defend against malicious behavior by API providers, infrastructure-level compromise of identity platforms, or systemic failures of messaging services. These risks are mitigated only indirectly: the small number of approved integrations limits exposure, identity separation limits the damage any single provider compromise can cause, and revocation workflows allow rapid disconnection. But a determined attack at the provider level is beyond what a personal deployment can defend against.
Malicious human operator. If the operator intentionally chooses to misuse the system, the architecture does not prevent misuse outright, does not enforce moral correctness, and does not override explicit human decisions. As discussed in the previous section, it focuses on making misuse visible, documented, and attributable. The operator who misuses the system does so in full view of the audit trail — but the system does not attempt to stop them, because doing so would require it to override the very authority it is designed to respect.
Global-scale adversaries. This system is not designed to resist nation-state attacks, sustained targeted intrusion campaigns, or advanced persistent threats. It is a personal architecture operating on personal infrastructure. The threat model is calibrated to the risks that a security-conscious individual faces in the course of normal operation, not to the risks posed by well-resourced adversaries with the capability and motivation to conduct prolonged, targeted operations.
Accepted Tradeoffs
Several tradeoffs run through the architecture and are accepted intentionally.
Availability versus control. The system prefers downtime over autonomy, silence over speculation, and manual recovery over automatic failover. High availability is sacrificed to reduce the risk that the assistant acts without adequate supervision. This tradeoff is acceptable for a personal assistant that is not a critical service and whose operator can tolerate periods of unavailability.
Convenience versus safety. The architecture prioritizes explicit approvals over seamless flow, friction over speed, and predictability over cleverness. Every approval gate, every documentation requirement, and every prohibition on autonomous action introduces friction that slows the assistant’s throughput. That friction is the mechanism by which the operator maintains meaningful oversight, and it is accepted as the cost of operating within verifiable constraints.
Capability versus confinement. The assistant is capable of more than it is allowed to do. This asymmetry is intentional. Unexercised capability is safer than unconstrained execution, and the gap between what the assistant can do and what it is permitted to do is the space in which the architecture’s safety properties live.
Trust versus verifiability. The system avoids relying on alignment assumptions, intent inference, or behavioral optimism. Instead, it relies on structural constraints, observable state, and reversible actions. Trust is not assumed — it is replaced with inspection wherever possible. This means the system is more cumbersome to operate than one that assumes good behavior, but it is also more robust against the cases where that assumption would prove wrong.
Failure Philosophy
This threat model does not attempt to eliminate all risk. It attempts to ensure that failures are visible, bounded, survivable, and incapable of compounding silently. Unknown threats are expected — no threat model anticipates everything. What matters is that the architecture’s failure modes are designed so that an unanticipated threat produces a detectable disruption rather than a silent compromise.
Summary
By explicitly stating its threat boundaries and accepted tradeoffs, the architecture ensures that its security claims are honest, its gaps are visible rather than hidden, and its risks are managed rather than denied. This is not a promise of safety. It is a transparent account of where the system’s defenses lie, where they end, and what the operator must understand about both.
This concludes the consolidated threat model. The next section provides guidance on replicating the architecture in other environments.