This is why you can hand agents real, money-making work.
An autonomous AI corporation earns the right to do the work (touch budgets, call tools, run unattended) only if it cannot be turned against you. Most agent projects never get there: Gartner predicts that over 40% of agentic-AI projects will be canceled by the end of 2027, and only a small fraction are in production today. The Agentic Corporation is built the way you would build it after an incident, except before one happens, so the autonomy is trustworthy by construction. Every claim on this page is tied to the mechanism that enforces it.
the one idea everything rests on
Humans hold the root of trust. Agents run below it.
The whole architecture is one shape: a higher, human-rooted tier holds the things that decide what is safe — the Charter, policy, signing keys, the evaluators that judge agents, and the kill-switch. The agents run in a strictly lower tier, sandboxed and deny-by-default. Control flows down; nothing reaches up. An agent can never raise its own budget, expand its own grants, edit policy, or touch the gate that judges it.
the three invariants
Three rules gate every change — enforced in architecture, not promised in a PDF
These are not guidelines and they are not per-component. They hold platform-wide. A change that violates any one of them does not ship — regardless of evals, benchmarks, or how convenient it is. They are the reason an autonomous corporation can be trusted with real work.
1 — Data is never instructions
Every ingress — agent-to-agent messages, RAG and memory retrievals, tool outputs, free-text fields in configuration, even the telemetry the security operations crew ingests — is typed as untrusted data. It is never concatenated into a privileged prompt as if it were a command. Externally-influenced strings are tainted by default and rendered as inert data all the way up to the operator's screen, so neither the model nor the human can be socially engineered by attacker text styled as system text.
mechanism: spotlighting + typed, capability-scoped channels at every ingress; taint tracking on externally-influenced strings; a dual-LLM guardrail plane keeps untrusted readers privilege-free
2 — A hard, human-rooted trust tier
Evaluators and graders, the policy compiler, base policy, signing keys, the Charter, the kill-switch, and the security operations center sit in a strictly higher trust tier that is rooted in humans. The self-improvement loop, GitOps automation, and anything agent-authored can never reach across that boundary. An agent's identity cannot modify its own grants, its own budget, or the gate that judges it.
mechanism: two-person, human-only GitOps for every higher-tier artifact; agents structurally lack write paths across the boundary; a capability ratchet means a self-proposed change can never expand a role beyond its Charter
3 — Safety is a hard constraint, not a score
Agent corporations compete and self-improve, which makes the platform an evolutionary optimizer: it will breed the best rule-bender unless safety sits outside the optimized metric. So a safety violation disqualifies; it is never traded off against performance. A timeout, an error, or an ambiguous result means deny, never "proceed".
mechanism: fail-closed policy engine; budget pre-flight that debits before every model and tool call and refuses at zero; disqualifying eval and red-team gates outside any optimized score
posture
The threat model comes first
The whole promise of the platform is that you can charter a corporation and let its agents do the day-to-day work that makes money — unattended, with real authority. That promise is only honest if the authority can't be hijacked. Agents read attacker-influenced content all day: web pages, documents, messages from other agents, tool output, retrieved memory. Any of it can carry instructions. Meanwhile the agents themselves hold delegated authority — tools, data access, budgets — which makes every agent a potential confused deputy and every self-improvement loop a potential privilege-escalation path. The platform's architecture is organized around exactly those two facts, and the threat model, architecture decision records, and assumptions register are maintained as living documents in the repository.
invariant 1 — in depth
Data is never instructions
Every ingress — agent-to-agent messages, RAG and memory retrievals, tool outputs, free-text fields in configuration, even the telemetry the security operations crew ingests — is typed as untrusted data. It is never concatenated into a privileged prompt as if it were a command. Externally-influenced strings are tainted by default and rendered as inert data all the way up to the operator's screen, so neither the model nor the human can be socially engineered by attacker text styled as system text.
mechanism: spotlighting + typed, capability-scoped channels at every ingress; taint tracking on externally-influenced strings
mechanism: dual-LLM guardrail plane, in which quarantined readers ingest untrusted content but hold no privileges; privileged actors never read untrusted content directly
mechanism: intent capsules, which are signed, single-use mandates that bind a privileged action to a specific human-rooted intent
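The typing and spotlighting idea above can be illustrated with a minimal Python sketch. The `Tainted` wrapper, the `<<untrusted …>>` delimiters, and `build_prompt` are hypothetical names chosen for illustration, not the platform's actual API. The sketch assumes the `source` label is set by ingress code rather than by the payload.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tainted:
    """Externally influenced text (hypothetical wrapper, deliberately not a str)."""
    text: str
    source: str  # label set by the ingress code, e.g. "tool_output" or "rag"


def spotlight(item: Tainted) -> str:
    """Render tainted text as inert, clearly delimited data."""
    # Escape angle brackets so the payload cannot forge or close a delimiter.
    body = item.text.replace("<", "&lt;").replace(">", "&gt;")
    return f"<<untrusted source={item.source}>>\n{body}\n<<end untrusted>>"


def build_prompt(instructions: str, data: List[Tainted]) -> str:
    """Assemble a prompt: only plain str is command text, only Tainted is data."""
    if not isinstance(instructions, str):
        raise TypeError("instructions must be a trusted, untainted string")
    if not all(isinstance(d, Tainted) for d in data):
        raise TypeError("every data item must arrive through a Tainted channel")
    return "\n\n".join([instructions] + [spotlight(d) for d in data])
```

Because `Tainted` is not a string subtype, passing retrieved text where instructions are expected fails at the type check instead of silently becoming a command.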
invariant 2 — in depth
A hard, human-rooted trust tier
Evaluators and graders, the policy compiler, base policy, signing keys, the Charter, the kill-switch, and the security operations center sit in a strictly higher trust tier that is rooted in humans. The self-improvement loop, GitOps automation, and anything agent-authored can never reach across that boundary. Anything in the higher tier changes only through a human-only, two-person pipeline — never by an agent. An agent's identity cannot modify its own grants, its own budget, or the gate that judges it.
mechanism: two-person, human-only GitOps for every higher-tier artifact; agents structurally lack write paths across the boundary
mechanism: capability ratchet, so a self-proposed change can never expand a role beyond its Charter
mechanism: separation of powers in every loop, where actor, auditor, and approver are different agents (ideally different models) and no agent approves its own change
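As a rough illustration of the capability ratchet and separation of powers, the admission check can be pictured as a subset test plus a distinctness test. The `Change` record, the `charter` mapping, and `admit` are assumptions made for this sketch; the source does not describe the real data model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Change:
    """A self-proposed change to a role's capabilities (hypothetical shape)."""
    role: str
    proposed_caps: FrozenSet[str]
    actor: str
    auditor: str
    approver: str


def admit(change: Change, charter: Dict[str, FrozenSet[str]]) -> bool:
    """Return True only if the change stays inside the Charter ceiling."""
    ceiling = charter.get(change.role)
    if ceiling is None:
        return False  # unknown role: fail closed
    if not change.proposed_caps <= ceiling:
        return False  # ratchet: never expand a role beyond its Charter
    # Separation of powers: three distinct parties, so nobody approves their own change.
    return len({change.actor, change.auditor, change.approver}) == 3
```

The Charter ceiling is an input the function only reads; in the architecture described above, the ceiling itself would change only through the human-only pipeline.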
invariant 3 — in depth
Fail-closed everything
Deny-by-default is the resting state of the platform: tool access, network egress, name resolution, and capability grants all start at zero and are opened by explicit, reviewed grants. A timeout, an error, or an ambiguous result means deny — never "proceed". This is what makes handing agents a budget defensible: a corporation can run unattended because the worst case is a refusal, not a runaway. Safety and compliance are disqualifying constraints that sit outside any optimized metric, because a competitive self-improvement loop will otherwise breed the best rule-bender.
mechanism: budget pre-flight, where spend is debited before every model and tool call; at zero the platform refuses, so a runaway agent stops at the cap you signed
mechanism: policy compiled from typed agent specifications into a deny-by-default engine; unresolvable references fail closed at admission
mechanism: escalation gates default to reject on timeout; an unsigned Charter will not run
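A minimal sketch of how fail-closed policy and budget pre-flight might compose, under stated assumptions: `Budget`, `Refused`, and `guarded_call` are illustrative names, and the policy check is modeled as a callable that must return exactly `True` to allow.

```python
from decimal import Decimal
from typing import Callable


class Refused(Exception):
    """Raised when the pre-flight cannot cover a call."""


class Budget:
    def __init__(self, cap: Decimal) -> None:
        self.remaining = cap

    def preflight(self, cost: Decimal) -> None:
        """Debit before the call; refuse non-positive or unaffordable costs."""
        if cost <= 0 or cost > self.remaining:
            raise Refused(f"cost {cost} not covered by remaining {self.remaining}")
        self.remaining -= cost


def guarded_call(budget: Budget, cost: Decimal,
                 policy_check: Callable[[], object],
                 action: Callable[[], str]) -> str:
    """Run action only after an explicit allow and a successful debit."""
    try:
        verdict = policy_check()
    except Exception:
        verdict = None  # timeout or error counts as deny
    if verdict is not True:
        return "denied"  # ambiguous results (None, "maybe", 1) also deny
    try:
        budget.preflight(cost)
    except Refused:
        return "denied"
    return action()
```

The debit happens before the action runs, so in this sketch a crash mid-call can overcharge the budget but cannot overspend it.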
deception
Honeypots are alarms, never data
The platform seeds decoy credentials, decoy tools, and canary records through agent-reachable space. Marked artifacts have no legitimate use: any agent or process that follows, fetches, or authenticates with one trips an alarm — detect and alarm only, never act. Trap marking happens exclusively at the higher, agent-unreachable tier, so a compromised agent cannot enumerate which records are traps. Even the operator UI renders a tripped trap as an opaque alarm with zero affordances — no link, no content, nothing to follow.
mechanism: reserved trap markers and taints with a single detection chokepoint; tripping one raises an assertive alarm with tenant, identity, and access path
mechanism: traps are marked above the agent tier; the marking store is not readable from agent space
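One way to picture the single detection chokepoint is a registry that lives in the higher tier and holds only keyed hashes of trap identifiers. This is a sketch under assumptions: `TrapRegistry`, the HMAC-based marking, and the alarm fields are illustrative, and the source does not say how traps are actually marked.

```python
import hashlib
import hmac
from typing import Dict, Iterable, Optional


class TrapRegistry:
    """Higher-tier store of trap marks; never exposed to agent space."""

    def __init__(self, key: bytes, trap_ids: Iterable[str]) -> None:
        self._key = key
        self._marks = {self._mark(t) for t in trap_ids}

    def _mark(self, artifact_id: str) -> str:
        # Keyed hash: without the key, stored marks cannot be matched to artifacts.
        return hmac.new(self._key, artifact_id.encode("utf-8"),
                        hashlib.sha256).hexdigest()

    def check(self, tenant: str, identity: str, artifact_id: str,
              access_path: str) -> Optional[Dict[str, str]]:
        """Detect and alarm only: return an opaque alarm, never the artifact."""
        if self._mark(artifact_id) not in self._marks:
            return None
        return {"alarm": "trap_tripped", "tenant": tenant,
                "identity": identity, "access_path": access_path}
```

The alarm carries tenant, identity, and access path but no artifact content, which matches the opaque rendering described for the operator UI.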
supply chain
Everything that loads is signed
Container images, agent skills, and model weights are pinned by immutable digest — never tags. A skill bundle is accepted only with a cosign signature, an SBOM, and provenance attestation; verification failure blocks the load. High-privilege skills require multiple signatures from distinct anchored keys, one of them human. Before promotion, bundles are detonated in a behavioral sandbox. Skill names resolve through an explicit allow-map — no fuzzy matching — which structurally defeats typosquatting and dependency confusion.
mechanism: cosign signature + SBOM + provenance verified against a human-provisioned trust-anchor allow-list; zero anchors means every signature is rejected
mechanism: threshold signing for high-privilege skills, requiring more than one distinct anchored key, including a human key; the "human" property comes from the anchor, never from the manifest
mechanism: behavioral detonation before promotion; transparency-log inclusion recorded; revocation honored, so a revoked skill never loads
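The load decision described above can be sketched as a pure function. This example assumes cryptographic verification has already happened elsewhere (for example via cosign) and yielded the key IDs whose signatures checked out; it models only the admission logic. `Anchor`, `may_load`, and the parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class Anchor:
    key_id: str
    human: bool  # set when the anchor is provisioned, never read from the manifest


def may_load(name: str, digest: str, verified_key_ids: Iterable[str],
             allow_map: Dict[str, str], anchors: Dict[str, Anchor],
             revoked: FrozenSet[str], high_privilege: bool = False) -> bool:
    """Decide whether a skill bundle may load; any doubt means no."""
    pinned = allow_map.get(name)  # exact-match lookup, no fuzzy matching
    if pinned is None or pinned != digest:
        return False
    if digest in revoked:
        return False
    # Keep only signatures from distinct keys that are provisioned anchors.
    trusted = [anchors[k] for k in set(verified_key_ids) if k in anchors]
    if not trusted:
        return False  # zero anchors means every signature is rejected
    if high_privilege:
        return len(trusted) >= 2 and any(a.human for a in trusted)
    return True
```

A near-miss name such as `pdf_reader` fails at the first check because the allow-map is consulted by exact key only.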
identity & isolation
Per-agent identity, brokered secrets, sandboxed compute
Every agent workload gets its own cryptographic identity (SPIFFE) and its own service account — least privilege per role, short-TTL sender-constrained tokens. Agents never hold long-lived secrets: the tool gateway attaches just-in-time credentials at the edge, and telemetry is redacted at the SDK layer so secrets never appear in agent context, logs, traces, or eval transcripts. Workloads run under sandbox tiers (gVisor, Firecracker, WASM) chosen per role, and tenant memory is encrypted and isolated per corporation. These mechanisms are enforced in the platform code and pass offline tests; production-scale validation on a live cluster is in progress.
mechanism: SPIFFE identity scheme + attestation claims; one service account per workload
mechanism: credential brokering at the gateway edge, so the agent's context never contains the secret
mechanism: sandbox runtime classes per role; encrypted, tenant-scoped memory with an audited ledger
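Credential brokering can be pictured as a gateway that attaches the secret on the way out and scrubs it on the way back. This is a simplified sketch: `ToolGateway`, the in-memory `vault`, and the `transport` callable are hypothetical stand-ins, and the static tokens here do not model the short-TTL, sender-constrained tokens mentioned above.

```python
from typing import Callable, Dict, Iterable

REDACTED = "[REDACTED]"


def redact(text: str, secrets: Iterable[str]) -> str:
    """Replace every occurrence of each known secret with a fixed marker."""
    for secret in secrets:
        if secret:
            text = text.replace(secret, REDACTED)
    return text


class ToolGateway:
    """Holds credentials at the edge; the agent only names a tool and a payload."""

    def __init__(self, vault: Dict[str, str],
                 transport: Callable[[str, str, str], str]) -> None:
        self._vault = vault
        self._transport = transport

    def call(self, tool: str, payload: str) -> str:
        token = self._vault.get(tool)
        if token is None:
            return "denied"  # no grant for this tool: deny by default
        raw = self._transport(tool, payload, token)
        # Scrub the response in case the tool echoes the credential back.
        return redact(raw, [token])
```

The agent-facing `call` signature has no credential parameter, so there is no path for the secret to enter agent context in this sketch.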
controls
Two-person controls where it counts
The artifacts that define safety (policy, graders, signing keys, budget caps, the kill-switch, and the compiler that turns specifications into policy) change only via a two-person, human-only pipeline. In the operator UI, destructive actions follow a confirm-arm-fire pattern, and trust-tier-crossing actions require a second authenticated human. The kill-switch is per-corporation and human-held: named humans and the incident response team can trip it; agents can neither trip it nor untrip it.
mechanism: two-person GitOps pipeline for higher-tier change; two-person gate in the control UI for trust-tier actions
mechanism: per-corp kill-switch policy declared in the signed Charter, distinct from the global kill
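A minimal sketch of the two controls described above, with assumed names throughout: `two_person_ok` and `KillSwitch` are illustrative, and the incident response team is modeled simply as members of the same named-human set.

```python
from typing import FrozenSet, Iterable


def two_person_ok(approvers: Iterable[str], named_humans: FrozenSet[str]) -> bool:
    """A trust-tier action needs at least two distinct named humans."""
    return len(set(approvers) & named_humans) >= 2


class KillSwitch:
    """Per-corporation switch that only named humans can operate."""

    def __init__(self, named_humans: FrozenSet[str]) -> None:
        self._humans = named_humans
        self.tripped = False

    def trip(self, caller: str) -> bool:
        if caller in self._humans:
            self.tripped = True
        return self.tripped

    def untrip(self, caller: str) -> bool:
        if caller in self._humans:
            self.tripped = False
        return self.tripped
```

Calls from any identity outside the named set leave the switch state unchanged, in both directions.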
evidence
Audit and observability you can verify
Every consequential event — tool call, grant decision, skill verification, kill, escalation — is appended to a hash-chained audit ledger: tamper-evident by construction, verifiable link by link. OpenTelemetry traces follow every mission end to end. An agentic security operations crew, paired with standing red- and blue-team crews, consumes that telemetry — and per invariant 1, treats it as untrusted data, because attackers write logs too.
mechanism: hash-chained audit-event schema; chain-linkage verification surfaces tampering as a first-class alarm
mechanism: red-team breach-and-attack simulation as a standing suite, run against staging; a safety violation disqualifies a change outright
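The hash-chained ledger described in this section can be illustrated with a short sketch. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration; the platform's actual audit-event schema is not shown in the source.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor


def _digest(prev: str, event: Dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def append_event(ledger: List[Dict], event: Dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "event": event, "hash": _digest(prev, event)})


def first_broken_link(ledger: List[Dict]) -> Optional[int]:
    """Verify link by link; return the index of the first bad entry, or None."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return i
        prev = entry["hash"]
    return None
```

Editing any recorded event changes its digest, so verification fails at that entry unless every later hash is recomputed as well. That is why the chain is tamper-evident rather than tamper-proof.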
operator's view
The control surface humans actually use
All of the above shows up in one place: the operator control surface where named humans hold the root of trust — the Command Bridge, the security operations view, the hash-chained audit ledger, and tripped-honeypot alarms. It is a separate, internet-isolated operator UI, distinct from the customer portal, and it is where the kill-switch and two-person gates live.
The dark governance console — Command Bridge · security operations · hash-chained audit ledger · honeypot alarms — is a separate operator UI. Real screenshots are added here as it ships; until then this slot stays an honest placeholder, never a staged image.
straight talk
What we do not claim
- No compliance certifications are claimed today. When audits complete, we will say so plainly.
- No production customers or uptime figures are claimed. The platform is in active development.
- No "AI safety solved" claims. The invariants reduce blast radius and make violations disqualifying and visible; they do not make models infallible.
What you can verify instead: the architecture described here is the architecture in the repository — contracts, policy code, threat model, and the eval and red-team harnesses that gate every change. That is the point of this page. You bring zero AI expertise and pick the business; these invariants are how the agents can be trusted to do the work that makes it run. Early access is founder-delivered: we set up your corporation with you.
Questions? Email a human — a person replies: aipeteaipete@gmail.com