> AI is certainly not reliable enough for me to jeopardize the quality of my work by using it heavily.
I mean this sincerely, and not at all snarkily, but: have you considered that you haven't used the tools correctly or effectively? I find that I can get what I need from chatbots (and I refuse to call them AI until we have general AI, just to be contrary) if I spend a couple of minutes considering constraints and being careful with my prompt language.
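To make that concrete, here's the shape of prompt I mean, constraints first. A sketch only: the task and the transfer() function are invented for illustration, and the point is the prompt's structure, not any particular API.

    # Sketch of a constraint-first prompt. Everything named here
    # (the task, the transfer() function) is invented for illustration.
    prompt = """\
    Review the Python function below for thread-safety bugs ONLY.
    Constraints:
    - No style or naming feedback.
    - If you are unsure whether something is a bug, say "unsure" rather than guessing.
    - Quote the exact line for every issue you raise.
    - Answer in at most five bullet points.

    def transfer(src, dst, amount):
        src.balance -= amount
        dst.balance += amount
    """

The difference between that and "is this function OK?" is usually the difference between a useful answer and a plausible-sounding guess.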
When I've come across people in my real life who say they get no value from chatbots, it's because they're asking poorly formed questions or haven't thought through the problem entirely. Working with chatbots is like working with a very bright lab puppy: they're willing to do whatever you want, but they'll definitely piss on the floor unless you tell them not to. Or am I entirely off base with your experience?
It would be helpful if you would relate your own bad experiences and how you overcame them. Leading off with "do it better" isn't very instructive. Unfortunately, there's no useful training for much of anything in our industry, much less for AI.
I prefer to use an LLM as a sock puppet to filter out implausible options in my problem space and to help me recall how to do boilerplate things. Like you, I think, I also tend to write multi-paragraph prompts, repeating myself and calling back to every aspect to continuously home in on the true subject I am interested in.
I don't trust LLMs enough to let them operate on my behalf agentically yet. And LLMs are uncreative and hallucinatory as heck whenever they stray into novel territory, which makes them a dangerous tool.
> have you considered that you haven't used the tools correctly or effectively?
The problem is that this comes off just as tone-deaf as "you're holding it wrong." In my experience, when people promote AI, it's sold as just having a regular conversation, after which the AI does the thing. And when that doesn't work, the promoter reaches for system prompts, MCP, agent files, etc., and the entire workflows required to get it to do the correct thing. It ends up feeling like you're being lied to, even if there's some benefit out there.
There's also the fact that not all programming workflows are the same. I've found some areas where AI works well, but for a lot of my work it does not. It's usually spotty on anything that wouldn't have shown up in a simple Google search, back before search was enshittified.
I suspect AI appeals very strongly to a certain personality type who revels in all the details of getting a proper agentic coding environment bootstrapped for AI to run amok in, and who then supervises/guides the results.
Then there are people like me, whom you'd probably term an old soul, who look at all that and say, "I have to change my workflow, change my environment, and babysit it? It is faster to simply do the work myself." My relationship with tech is that I like using as little as possible, and what I use needs to be predictable and do something for me. AI doesn't always work for me.
Yes, this rings true. It took me over a month to get back to even 1x of my usual productivity with Claude Code. There is a ton of setup, and a ton of things to learn and try to see what works: what to watch out for, and how to babysit it so it doesn't go off the rails (a quite heavy-handed approach works best for me). It's kind of like a shitty, but very fast and very knowledgeable, junior developer.

At this moment it still maybe isn't "worth it" for a lot of devs if productivity (and developer ergonomics) is the only goal, but it is clear to me that this is where the industry is heading, and I think every dev will eventually have to get on board. These tools only really started to be somewhat decent this year. I'm 100% sure that in a year or two this will be the default for everyone, to the point that you simply won't be able to compete without it. It would be like using a shovel instead of an excavator. Remember, right now is the worst it'll ever be.
> In my experience, when people promote AI, it's sold as just having a regular conversation, after which the AI does the thing.
This is almost the complete opposite of my experience. I hear expressions of improvement and optimism about the future, but almost all of the discussion from people actively and productively using AI is about identifying the limits and seeing what benefits you can find within those limits.
They are not useless and they are also not a panacea. It feels like a lot of people consider those the only available options.
AI is okay (not great) at generating low- to mid-skill code. If you are working in a high-skill software domain that requires pervasive state-of-the-art or first-principles implementation, then AI produces consistently terrible code. It is frequently flat-out incorrect about esoteric technical details that really matter.
It can't reason from first principles, and there isn't training data for a lot of state-of-the-art computer science and code implementations. Nothing you can prompt will make it produce non-naive output, because it doesn't have that capability.
AI works for a lot of things because, if we are honest, AI-generated slop is replacing human-generated slop. But not all software is slop, and there are software domains where slop is not even an option.
All good, no snark inferred. Yes I have considered this, and I keep considering it every time I get a bad result. Sorry this response is so long.
I think I have a good idea of how these things work. I have run local LLMs for a couple of years on a pair of video cards here, trying out many open-weight models. I have watched the 3blue1brown ML course. I have done several LinkedIn Learning courses (which weren't that helpful, just mandatory). I understand precise prompting and personas (though I am not sold on personas being a good idea). I understand LLMs do not "know" anything; they just generate the next most likely token. I understand LLMs are not a database with accurate retrieval. I understand "reasoning" is not actual thinking, just manipulating tokens to steer a conversation in vector space. I understand LLMs are better at some tasks (summarisation, sentiment analysis, etc.) than others (retrieval, math, etc.). I understand they can only predict what's in their training data. I feel I have a pretty good understanding of how to get results from LLMs (or at least of the ways people say you can get results).
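To illustrate what I mean by "they just generate the next most likely token", here's the mechanism in miniature, as a toy sketch with a made-up five-word vocabulary (not any real model's numbers):

    import numpy as np

    # Pretend model output: one logit per vocabulary entry. A real model
    # emits these for ~100k tokens at every step; the principle is the same.
    vocab = ["the", "cat", "sat", "on", "mat"]
    logits = np.array([2.0, 0.5, 1.2, 0.1, 1.9])

    probs = np.exp(logits - logits.max())   # softmax, numerically stable
    probs /= probs.sum()                    # normalise to probabilities
    print(vocab[int(np.argmax(probs))])     # greedy decode: prints "the"

There is no lookup, retrieval, or checking anywhere in that step, which is why I don't treat the output as a source of facts.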
I have had some small successes with LLMs. They are reasonably good at generating sub-100-line test code when given a precise prompt, probably because that sort of thing is in the training data scraped from StackOverflow. I did a certification earlier this year, threw ~1000 lines of Markdown notes into Gemini, and had it quiz me, which was very useful revision; it only got one question wrong out of the couple of hundred I had it ask me.
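If anyone wants to replicate the quiz trick programmatically, it's roughly this. A sketch against the google-generativeai Python SDK; the model name, file name, and prompt wording here are assumptions and may be out of date:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_KEY")
    model = genai.GenerativeModel("gemini-1.5-pro")  # whichever model is current

    notes = open("cert-notes.md").read()  # the ~1000 lines of Markdown notes
    chat = model.start_chat()
    reply = chat.send_message(
        "Using ONLY the notes below, quiz me with one multiple-choice "
        "question at a time. Wait for my answer, say whether I was right, "
        "then ask the next question.\n\n" + notes)
    print(reply.text)  # first question; keep calling send_message() with answers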
I'll give a specific example of a recent failure. My job is mostly troubleshooting and reading code, all of which is public open source (so accessible via LLM search tooling). I was trying to understand something where I didn't know the answer, and the code was difficult for me, so I was really not confident at all in my understanding. I wrote up my thoughts with references; the person I normally ask was busy, so I asked Gemini Pro instead. It confidently told me "yep, you got it!"
I then asked someone else, who saw a (now obvious) flaw in my reasoning: at some point I'd switched from a hash algorithm which generates Thing A to a hash algorithm which generates Thing B. The error was clearly visible. One of my references had "Thing B" in the commit message title, which was sitting in my notes along with the public URL, while my whole argument was about "Thing A".
This wasn't even a technical or code error; it was a text analysis and pattern matching error, which I didn't see because I was so focused on the algorithms. Even Gemini, apparently the best LLM in the world and the one causing a "code red" at OpenAI, did not pick this up, even though text analysis is supposed to be one of its core functionalities.
I also have a lot of LLM-generated summarisation forced on me at work, and it's often so bad I now don't even read it. I've seen it generate text which makes no logical sense, and text which uses a great many words without really saying anything at all.
I have tried LLM-based products where someone else is supposed to have done all the prompt crafting and added the RAG embeddings, so I can just behave like a naive user asking questions. Even when I ask these things questions which I know are answered in the RAG corpus, they cannot retrieve an accurate answer ~80% of the time. I have read papers which support the idea that most RAG falls apart after ~40k words, and our document set is much larger than that.
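For context on why I think this fails: the retrieval step these products are built on is usually just nearest-neighbour search over embedding vectors, something like the sketch below (the vectors are assumed to come from whatever embedding model the product uses):

    import numpy as np

    def top_k_chunks(question_vec, chunk_vecs, k=5):
        # Cosine similarity between the question and every stored chunk.
        q = question_vec / np.linalg.norm(question_vec)
        c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
        sims = c @ q
        # Only the k best-matching chunks ever reach the LLM, so if the
        # right chunk isn't among them, no amount of prompting can
        # produce an accurate answer.
        return np.argsort(sims)[::-1][:k]

As the corpus grows, more near-duplicate chunks compete for those k slots, which is one plausible mechanism behind the ~40k-word breakdown those papers describe.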
Generally I find LLMs are at the point where, to evaluate a response, I either need to know the answer beforehand (in which case it was pointless to ask) or need to do all the work myself to verify it (which doesn't improve my productivity at all).
About the only thing I find consistently useful about LLMs is writing my question down and not actually asking it, a form of Rubber Duck Debugging (https://en.wikipedia.org/wiki/Rubber_duck_debugging) which I have already practiced for many years because it's so helpful.
Meanwhile, trillions of dollars of VC-backed marketing assure me that these things are a huge productivity multiplier and will usher in 25% unemployment because they are so good at every task even very smart people can do. I just don't see it.
If you have any suggestions for me, I will be very willing to look into them and try them.