I'm a bit late to the conversation, but I'm on month 4 (?) of building a (greenfield) desktop app with Claude Code + Codex. I've been coding since Pulp Fiction hit theaters, and I'm confident I could have just written this thing from scratch without LLMs, with far fewer headaches, but I really wanted to get my hands dirty with the new tools and see what they are and aren't capable of.

Some brief takeaways:

1. I'm on probably the 10th complete-restart iteration. I had a strong vision for what it was going to be, a very weak grasp of how to technically achieve it, and a tenuous-at-best grasp of what turned out to be the most difficult parts (clever memory management, optimizations for speed, wrangling huge datasets, algorithms, etc.) -- I started with a CLI-only prototype thinking I could get it all together reasonably quickly and then move on to a hand-crafted visual UI that I'd go over with a fine-toothed comb.

I'm still working on the fundamentals, LOL, with a janky UI that I'll get to once the foundation is solid.

2. By iteration 4 or 5, I realized I wanted to implement stuff that was incompatible with the less-complicated foundations already laid. This becomes a big issue when you vibe code and have the agent write docs, then change your mind or discover a better way to do it. The sprawl and "overgrowth" in the codebase become a second job when you need to pivot -- you become a glorified hedge trimmer, trying to excise both the code AND the documentation, because stale docs will very confidently poison the agents going forward if you don't.

3. Speaking of overconfidence, I keep finding myself in situations where the LLMs (since they can't hold the entire codebase in context at once) offer solutions/approaches/algorithms that work (and work well!) until you push more data at them. For validation purposes, I started with very limited datasets so I could hand-check results and audit the database. By the time you're at a million rows, spot-checking becomes really hard, shit starts crashing because you didn't foresee architectural problems due to lack of domain experience, etc. You ask for alternative solutions and approaches, you get them, but the LLM (not incorrectly) also wants to preserve what's already there, so a whole new logic path gets cut and the codebase grows like a jungle. The docs go stale without getting pruned. There's conflicting context. Switch to a different LLM and sometimes the naming conventions mysteriously shift, like it's speaking a different dialect. On and on.
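
To make that concrete, here's roughly the kind of spot-check that works fine on a toy dataset and falls over at scale. This is a made-up sketch, not my actual code -- SQLite, the "records" table, and the db path are all stand-ins:

    # Hypothetical sketch: pull a random sample of rows for manual eyeballing.
    # The database path and table name are illustrative, not a real schema.
    import sqlite3

    def sample_for_audit(db_path: str, n: int = 50):
        conn = sqlite3.connect(db_path)
        try:
            # ORDER BY RANDOM() is fine at a few thousand rows and painfully
            # slow at a million -- roughly the scaling wall described above.
            rows = conn.execute(
                "SELECT * FROM records ORDER BY RANDOM() LIMIT ?", (n,)
            ).fetchall()
            for row in rows:
                print(row)  # hand-check each sampled row against the source data
        finally:
            conn.close()

    if __name__ == "__main__":
        sample_for_audit("app.db")

Random sampling keeps the hand-check feasible, but it only catches value-level errors; it does nothing for the architectural problems that only show up once the data gets big.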

Are the tools worth it? Depends. For me, for this one, on the whole, yes; it has taken an extremely long time (compared to the promises of 10x productivity) to get to the point where I've been able to try out a dozen approaches I was unfamiliar with, see first-hand what works and what doesn't, and get a real working grasp of how far off the rails agentic coding can take you if you're just exploring.

I am now left with some really good, relevant code to reference, a BUNCH of really misguided code to flush down the shitter, a strong mental map of how to achieve what I'm building + where things are supposed to go, and now I'm starting yet another fresh iteration where I can scaffold and piece together the whole thing with refactored / reformatted / readable code. And then actually implement the UI I've been designing lol.

I get the whole "just bully the LLM until it seems like it works, then ship it" mentality; objectively, it's not much different from the "just bully the developer until it seems like it works, then ship it" mentality of a product manager. But as amazing as these tools are for conjuring something into existence from thin air, the devil really is in the details, and if you're making something you ever hope to build upon, expand, and maintain, you have to go far beyond "vibes" alone.


