
Wait, I think it's the other way around. Claude will just go in circles with bad decisions forever and never stop. Codex has multiple times told me it is not able to do this task, and stopped.


I think this is closer to the crux of a major problem. People seemingly get vastly different responses even with the same system/developer/user prompts, and I myself can feel a difference in response quality depending on when I use the hosted APIs, while locally hosted models give consistent results.

For example, sometime after 19:00 (GMT+1), the response quality of both OpenAI and Anthropic (their hosted UIs) seems to drop off a cliff. If I try literally the same prompt around 10:00 the next morning, I get much better results.

I'm guessing there is so much personalization and so many other things going on that two users will almost never have the same experience, even with the same tools, models, endpoints, and so on.


That's the nature of statistical output, even minus all the context manipulation going on in the background.

You say the outputs "seem" to drop off at a certain time of day, but how would you even know? It might just be a statistical coincidence, or someone else might look at your "bad" responses and judge them to be pretty good actually, or there might be zero statistical significance to anything and you're just seeing shapes in the clouds.

Or you could be absolutely right. Who knows?


Yeah, there is definitely a huge gulf in subjective experiences, and even within a single user's experience. There are days when Claude makes so many mistakes I can't believe I ever found it useful. Strange.


I've certainly seen Claude Code get into bad loops and make terrible decisions too, but usually it's a poor architectural decision or completely forgetting important context; not "let's rewrite V8 from scratch" level of absurdity.



