I did a sort of bell loop with this type of workflow over summer.
- Base Claude Code (released)
- Extensive, self-orchestrated, local specs & documentation; ie waterfall for many features/longer term project goals (summer)
- Base Claude Code (today)
Claude Code is getting better at orchestrating it's own subagents for divide/conquer type work.
My problem with these extensive self-orchestrated multi-agent / spec modes is the type of drift and rot of all the changes and then integrated parts of an application that a lot of the time end up in merge conflicts. Aside from my own decision cognitive space, it's also a lot to just generally orchestrate and review. I spent a ton of type enforcing Claude to use the system I put in place including documentation updates and continuous logging of work.
I feel extremely productive with a single Claude Code for a project. Maybe for minor features, I'll launch Claude Code in the web so that it can operate in an isolated space to knock them out and create a PR.
I will plan and annotate extensively for large features, but not many features or broad project specs all at the same time. Annotation and better planning UX, I think, are going to be increasingly important for now. The only augment of Claude Code I have is a hook for plan mode review: https://github.com/backnotprop/plannotator
I'd love to see what is being achieved by these massive parallel agent approaches. If it's so much more productive, where is all the great software that's being built with it? What is the OP building?
Most of what I'm seeing is AI influencers promoting their shovels.
Even if somebody shows you what they've built with it, you're none the wiser. All you'll know is that it seemingly works well enough for a greenfield project.
The jury is still very far out on how agentic development affects mid/long term speed and quality. Those feedback cycles are measured in years, not weeks. If we bother to measure at all.
People in our field generally don't do what they know works, because by and large, nobody really knows, beyond personal experiences, and I guess a critical mass doesn't even really care. We do what we believe works. Programming is a pop culture.
It's for personal use, and I wouldn't call it great software, but I used Claude Code Teams in parallel to create a Fluxbox-compatible window compositor for Wayland [1].
Overall effort was a few days of agentic vibe-coding over a period of about 3 weeks. Would have been faster, but the parallel agents burn though tokens extremely quickly and hit Max plan limits in under an hour.
The long tail of deployable software always strikes at some point, and monetization is not the first thing I think of when I look at my personal backlog.
I also am a tmux+claude enjoyer, highly recommended.
In my view, these agent teams have really only become mainstream in the last ~3 weeks since Claude Code released them. Before that they were out there but were much more niche, like in Factory or Ralphie Wiggum.
There is a component to this that keeps a lot of the software being built with these tools underground: There are a lot of very vocal people who are quick with downvotes and criticisms about things that have been built with the AI tooling, which wouldn't have been applied to the same result (or even poorer result) if generated by human.
This is largely why I haven't released one of the tools I've built for internal use: an easy status dashboard for operations people.
Things I've done with agent teams: Added a first-class ZFS backend to ganeti, rebuilt our "icebreaker" app that we use internally (largely to add special effects and make it more fun), built a "filesystem swiss army knife" for Ansible, converted a Lambda function that does image manipulation and watermarking from Pillow to pyvips, also had it build versions of it in go, rust, and zig for comparison sake, build tooling for regenerating our cache of watermarked images using new branding, have it connect to a pair of MS SQL test servers and identify why logshipping was broken between them, build an Ansible playbook to deploy a new AWS account, make a web app that does a simple video poker app (demo to show the local users group, someone there was asking how to get started with AI), having it brainstorm and build 3 versions of a crossword-themed daily puzzle (just to see what it'd come up with, my wife and I are enjoying TiledWords and I wanted to see what AI would come up with).
Those are the most memorable things I've used the agent teams to build in the last 3 weeks. Many of those things are internal tools or just toys, as another reply said. Some of those are publicly released or in progress for release. Most of these are in addition to my normal work, rather than as a part of it.
Further, my POV is that coding agents crossed a chasm only last December with Opus 4.5 release. Only since then these kinds of agent teams setups actually work. It’s early days for agent orchestration
I work for Snowflake and the code I'm building is internal. I'm exploring open sourcing my main project which I built with this system. I'd love to share it one day!
There are dozens and dozens of these submitted to Show HN, though increasingly without the title prefix now. This one doesn't seem any more interesting than the others.
I picked up a number things from others sharing their setup. While I agree some aspects of these are repetitive (like using md files for planning), I do find useful things here and there.
I’ve been experimenting with a similar pattern but wrapping it in a “factory mode” abstraction (we’re building this at CAS[1]) where you define the spec once after careful planning using a supervisor agent then you let it go and spin up parallel workers against it automatically. It handles task decomposition + orchestration so you’re not manually juggling tmux panes
We ran something similar for a browser automation project - multiple agents working on different modules in parallel with shared markdown specs. The bottleneck wasn't the agents, it was keeping their context from drifting. Each tmux pane has its own session state, so you end up with agents that "know" different versions of reality by the second hour.
The spec file helps, but we found we also needed a short shared "ground truth" file the agents could read before taking any action - basically a live snapshot of what's actually done vs what the spec says. Without it, two agents would sometimes solve the same problem in incompatible ways.
Has anyone found a clean way to sync context across parallel sessions without just dumping everything into one massive file?
Few experiments like gas town, the compiler from Anthropic or the browser from Cursor managed to reach the Rocket stage, though in their reports the jagged intelligence of the LLMs was eerily apparent. Do you think we also need better models?
I did a sort of bell loop with this type of workflow over summer.
- Base Claude Code (released)
- Extensive, self-orchestrated, local specs & documentation; ie waterfall for many features/longer term project goals (summer)
- Base Claude Code (today)
Claude Code is getting better at orchestrating it's own subagents for divide/conquer type work.
My problem with these extensive self-orchestrated multi-agent / spec modes is the type of drift and rot of all the changes and then integrated parts of an application that a lot of the time end up in merge conflicts. Aside from my own decision cognitive space, it's also a lot to just generally orchestrate and review. I spent a ton of type enforcing Claude to use the system I put in place including documentation updates and continuous logging of work.
I feel extremely productive with a single Claude Code for a project. Maybe for minor features, I'll launch Claude Code in the web so that it can operate in an isolated space to knock them out and create a PR.
I will plan and annotate extensively for large features, but not many features or broad project specs all at the same time. Annotation and better planning UX, I think, are going to be increasingly important for now. The only augment of Claude Code I have is a hook for plan mode review: https://github.com/backnotprop/plannotator
I'd love to see what is being achieved by these massive parallel agent approaches. If it's so much more productive, where is all the great software that's being built with it? What is the OP building?
Most of what I'm seeing is AI influencers promoting their shovels.
Even if somebody shows you what they've built with it, you're none the wiser. All you'll know is that it seemingly works well enough for a greenfield project.
The jury is still very far out on how agentic development affects mid/long term speed and quality. Those feedback cycles are measured in years, not weeks. If we bother to measure at all.
People in our field generally don't do what they know works, because by and large, nobody really knows, beyond personal experiences, and I guess a critical mass doesn't even really care. We do what we believe works. Programming is a pop culture.
It's for personal use, and I wouldn't call it great software, but I used Claude Code Teams in parallel to create a Fluxbox-compatible window compositor for Wayland [1].
Overall effort was a few days of agentic vibe-coding over a period of about 3 weeks. Would have been faster, but the parallel agents burn though tokens extremely quickly and hit Max plan limits in under an hour.
1. https://github.com/ecliptik/fluxland
Pretty cool!
People are building software for themselves.
Correct. I've started recording what I've built (here https://jodavaho.io/posts/dev-what-have-i-wrought.html ), and it's 90% for myself.
The long tail of deployable software always strikes at some point, and monetization is not the first thing I think of when I look at my personal backlog.
I also am a tmux+claude enjoyer, highly recommended.
I’ve known too many developers and seen their half-assed definition of Done-Done.
I actually had a manager once who would say Done-Done-Done. He’s clearly seen some shit too.
In my view, these agent teams have really only become mainstream in the last ~3 weeks since Claude Code released them. Before that they were out there but were much more niche, like in Factory or Ralphie Wiggum.
There is a component to this that keeps a lot of the software being built with these tools underground: There are a lot of very vocal people who are quick with downvotes and criticisms about things that have been built with the AI tooling, which wouldn't have been applied to the same result (or even poorer result) if generated by human.
This is largely why I haven't released one of the tools I've built for internal use: an easy status dashboard for operations people.
Things I've done with agent teams: Added a first-class ZFS backend to ganeti, rebuilt our "icebreaker" app that we use internally (largely to add special effects and make it more fun), built a "filesystem swiss army knife" for Ansible, converted a Lambda function that does image manipulation and watermarking from Pillow to pyvips, also had it build versions of it in go, rust, and zig for comparison sake, build tooling for regenerating our cache of watermarked images using new branding, have it connect to a pair of MS SQL test servers and identify why logshipping was broken between them, build an Ansible playbook to deploy a new AWS account, make a web app that does a simple video poker app (demo to show the local users group, someone there was asking how to get started with AI), having it brainstorm and build 3 versions of a crossword-themed daily puzzle (just to see what it'd come up with, my wife and I are enjoying TiledWords and I wanted to see what AI would come up with).
Those are the most memorable things I've used the agent teams to build in the last 3 weeks. Many of those things are internal tools or just toys, as another reply said. Some of those are publicly released or in progress for release. Most of these are in addition to my normal work, rather than as a part of it.
Further, my POV is that coding agents crossed a chasm only last December with Opus 4.5 release. Only since then these kinds of agent teams setups actually work. It’s early days for agent orchestration
I work for Snowflake and the code I'm building is internal. I'm exploring open sourcing my main project which I built with this system. I'd love to share it one day!
The influencers generate noise, but the progress is still there. The real productivity gains will start showing up at market scale eventually.
look at Show HN. Half of it is vibe-coded now.
There are dozens and dozens of these submitted to Show HN, though increasingly without the title prefix now. This one doesn't seem any more interesting than the others.
I picked up a number things from others sharing their setup. While I agree some aspects of these are repetitive (like using md files for planning), I do find useful things here and there.
I built a Erlang based chat server implementing a JMAP extension that Claude wrote the RFC and then wrote the server for
Erlang FTW. I remember the days at the ol' lab!
i have no use for it at my work, i wish i did, so i did this project for run intead.
I wrote a Cash flow tracking finance app in Qt6 using claude and have been using it since Jan 1 to replace my old spreadsheets!
https://git.ceux.org/cashflow.git/
I’ve been experimenting with a similar pattern but wrapping it in a “factory mode” abstraction (we’re building this at CAS[1]) where you define the spec once after careful planning using a supervisor agent then you let it go and spin up parallel workers against it automatically. It handles task decomposition + orchestration so you’re not manually juggling tmux panes
[1] https://cas.dev
Do parallel workers execute on the same spec? How do you ensure they don't clash with each other?
supervisor handles this. if it sees that workers can collide it spawns them in worktrees while it handles the merging and cherry-picking
do you find the merging agent to be reliable? I had a few bad merges in the past that makes me nervous of just letting agents take care of it
We ran something similar for a browser automation project - multiple agents working on different modules in parallel with shared markdown specs. The bottleneck wasn't the agents, it was keeping their context from drifting. Each tmux pane has its own session state, so you end up with agents that "know" different versions of reality by the second hour.
The spec file helps, but we found we also needed a short shared "ground truth" file the agents could read before taking any action - basically a live snapshot of what's actually done vs what the spec says. Without it, two agents would sometimes solve the same problem in incompatible ways.
Has anyone found a clean way to sync context across parallel sessions without just dumping everything into one massive file?
I avoid this with one spec = one agent, with worktrees if there is a chance of code clashing. Not ideal for parallelism though.
Yeah the 8 agents limit aligns well with my conversations with folks in the leading labs
https://open.substack.com/pub/sluongng/p/stages-of-coding-ag...
I think we need much different toolings to go beyond 1 human - 10 agents ratio. And much much different tooling to achieve a higher ratio than that
Few experiments like gas town, the compiler from Anthropic or the browser from Cursor managed to reach the Rocket stage, though in their reports the jagged intelligence of the LLMs was eerily apparent. Do you think we also need better models?
These setups pretty much require the top tier subscription, right?
That's a yes from my side.
I liked the way how you bootstrap the agent from a single markdown file.
I built so much muscle memory from the original system, so it made sense to apply it to other projects. This was the simplest way to achieve that