Author here. I'm a software engineer with zero cybersecurity experience. I entered a beginner CTF at MWC Barcelona mostly to stress-test Pi (a coding agent) on something I knew nothing about.
The most interesting part for me was reviewing the full conversation logs afterward to figure out whether my steering actually helped or hurt. Turns out about 4 of my 24 interventions were counterproductive, and the agent solved the last two phases completely on its own.
The repo has the full writeup, all the exploit scripts, and a table rating every single human message I sent: https://github.com/kafkasl/ctf
Happy to answer questions about the process, the agent, or the competition.
For those who don't know, Pi is the minimal agent harness that also powers Open Claw:
https://github.com/badlogic/pi-mono
I feel bad for the participants who actually tried and lost to someone who has nothing good to say about them or their hobby.
Sorry I came across like that. It's not my thing, but I admire and respect the profession. Doing the analysis was fun and got me genuinely interested.