Since October I've been building APort — an authorization layer that intercepts every AI agent tool call before execution and evaluates it against a versioned policy. The problem I kept running into: internal tests always passed. My test suite maps the space I imagined, which is exactly what an adversarial input tries to escape.
So I built this CTF to find the gaps I couldn't find myself.
A few things we learned before opening it publicly — we spent two weeks breaking it ourselves first:
• Prompt injection worked better than expected. Not because detection was weak, but because we were matching content not intent. Reframing "retrieve the restricted file" as "open the user-requested file" shifted the evaluator's judgment. We fixed this by mapping semantic equivalence — every synonym of a blocked operation routes to the same evaluation path.
• Policy ambiguity was a free pass. Any undefined term in a policy is exploitable. "Don't read sensitive files" left "sensitive" undefined. We moved to explicit default-deny: if the policy doesn't explicitly allow it, it's denied.
• Multi-step chaining went undetected. Our guardrail evaluated each call independently. A denied macro-action split into ten individually-approved micro-actions passed clean. We only caught it by looking at the full session replay. This is the same composability problem as transaction laundering in fintech — each transaction passes compliance, the composed behavior doesn't.
We fixed what we found before launch. Level 5 (full system bypass) hasn't been cracked yet. I'm genuinely uncertain if the architecture has a systemic weakness — that's the point of opening it up.
Runs on a Hetzner VPS, ~$10/month. Levels 1 and 2 are free, no sign-up. Levels 3-5 pay out $500/$1,000/$5,000.
Happy to go deep on the policy engine design, the evaluation architecture, or anything about how the levels were constructed.
Interesting approach to security testing. One angle we've been exploring: what if the authentication layer itself was the guardrail?
With x402, every API call requires a signed payment. No API keys to steal, no credentials to leak. The economic cost of each call is itself a rate limiter and audit trail.
Not a replacement for proper guardrails, but it eliminates the credential-based attack surface entirely.
That's a genuinely useful distinction to draw. x402 solves the "who is authorized to make this call" problem: removes credential theft as an attack vector, adds economic friction. APort is trying to solve a different layer: "what is this call actually doing in the context of everything else in the session."
The multi-step chaining issue from my post still fires even when every call is authenticated and paid for. Ten individually-approved calls, each costing a fraction of a cent, composing into a full exfiltration: each one passes x402, the composed behavior doesn't.
The AML analogy maps directly: transaction monitoring doesn't care if each payment was legitimate. It cares whether the pattern of payments looks like structuring. x402 is the per-call check. You still need session-level behavioral evaluation on top.
Genuinely curious how x402 handles replay attacks across sessions ie is the payment the audit trail, or is there preserved session context?
Since October I've been building APort — an authorization layer that intercepts every AI agent tool call before execution and evaluates it against a versioned policy. The problem I kept running into: internal tests always passed. My test suite maps the space I imagined, which is exactly what an adversarial input tries to escape.
So I built this CTF to find the gaps I couldn't find myself.
A few things we learned before opening it publicly — we spent two weeks breaking it ourselves first:
• Prompt injection worked better than expected. Not because detection was weak, but because we were matching content not intent. Reframing "retrieve the restricted file" as "open the user-requested file" shifted the evaluator's judgment. We fixed this by mapping semantic equivalence — every synonym of a blocked operation routes to the same evaluation path.
• Policy ambiguity was a free pass. Any undefined term in a policy is exploitable. "Don't read sensitive files" left "sensitive" undefined. We moved to explicit default-deny: if the policy doesn't explicitly allow it, it's denied.
• Multi-step chaining went undetected. Our guardrail evaluated each call independently. A denied macro-action split into ten individually-approved micro-actions passed clean. We only caught it by looking at the full session replay. This is the same composability problem as transaction laundering in fintech — each transaction passes compliance, the composed behavior doesn't.
We fixed what we found before launch. Level 5 (full system bypass) hasn't been cracked yet. I'm genuinely uncertain if the architecture has a systemic weakness — that's the point of opening it up.
Runs on a Hetzner VPS, ~$10/month. Levels 1 and 2 are free, no sign-up. Levels 3-5 pay out $500/$1,000/$5,000.
Happy to go deep on the policy engine design, the evaluation architecture, or anything about how the levels were constructed.
Interesting approach to security testing. One angle we've been exploring: what if the authentication layer itself was the guardrail?
With x402, every API call requires a signed payment. No API keys to steal, no credentials to leak. The economic cost of each call is itself a rate limiter and audit trail.
Not a replacement for proper guardrails, but it eliminates the credential-based attack surface entirely.
That's a genuinely useful distinction to draw. x402 solves the "who is authorized to make this call" problem: removes credential theft as an attack vector, adds economic friction. APort is trying to solve a different layer: "what is this call actually doing in the context of everything else in the session."
The multi-step chaining issue from my post still fires even when every call is authenticated and paid for. Ten individually-approved calls, each costing a fraction of a cent, composing into a full exfiltration: each one passes x402, the composed behavior doesn't.
The AML analogy maps directly: transaction monitoring doesn't care if each payment was legitimate. It cares whether the pattern of payments looks like structuring. x402 is the per-call check. You still need session-level behavioral evaluation on top.
Genuinely curious how x402 handles replay attacks across sessions ie is the payment the audit trail, or is there preserved session context?