
Can't agree with that. Gemini doesn't lead just on price/performance - ironically it's the best "normie" model most of the time, despite its lack of popularity with that crowd until very recently.

It's bad at agentic stuff, especially coding - not even in the same league as Claude and now GPT-5. But if it's just about asking it random stuff, and especially carrying on the same conversation for a very long time - which non-tech users have a tendency to do - Gemini wins. It's still the best at long context, at noticing things said long ago.

Earlier this week I was doing some debugging. For debugging especially I like to run sonnet/gpt5/2.5-pro in parallel with the same prompt/convo. Gemini was the only one that, 4 or so messages in, pointed out something very relevant in the middle of the logs in the very first message. GPT and Sonnet both failed to notice, leading them to give wrong sample code. I would've wasted more time if I hadn't used Gemini.

It's also still the best at a good number of low-resource languages. It doesn't glaze too much (unlike Sonnet and ChatGPT) while also not being overly stubborn (unlike the raw GPT-5 API). It's by far the best at OCR and image recognition, which a lot of average users rely on quite a bit.

Google's ridiculously bad at marketing and AI UX, but they'll get there. They're already much more than just a "bang for the buck" player.

FWIW I use all 3 models mentioned above on a daily basis for a wide variety of tasks, often side-by-side in parallel to compare performance.
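
A minimal sketch of what that side-by-side setup looks like, assuming the public OpenAI, Anthropic, and Google Gen AI Python SDKs (the model ids below are illustrative, not necessarily the exact current ones):

    # Fan the same prompt out to three models in parallel and print each reply.
    # Assumes OPENAI_API_KEY, ANTHROPIC_API_KEY and GEMINI_API_KEY are set.
    from concurrent.futures import ThreadPoolExecutor

    import anthropic
    from google import genai
    from openai import OpenAI

    PROMPT = "Here are my logs: <...> Why does the request fail?"

    def ask_gpt(prompt):
        client = OpenAI()
        r = client.chat.completions.create(
            model="gpt-5",  # illustrative model id
            messages=[{"role": "user", "content": prompt}],
        )
        return "gpt-5", r.choices[0].message.content

    def ask_sonnet(prompt):
        client = anthropic.Anthropic()
        r = client.messages.create(
            model="claude-sonnet-4-20250514",  # illustrative model id
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        return "sonnet", r.content[0].text

    def ask_gemini(prompt):
        client = genai.Client()
        r = client.models.generate_content(
            model="gemini-2.5-pro",  # illustrative model id
            contents=prompt,
        )
        return "gemini", r.text

    with ThreadPoolExecutor(max_workers=3) as pool:
        results = pool.map(lambda ask: ask(PROMPT), [ask_gpt, ask_sonnet, ask_gemini])
        for name, answer in results:
            print(f"=== {name} ===\n{answer}\n")

Nothing clever - the point is just that all three answers land on the same screen, so the misses are obvious.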



My pet theory, without any strong foundation, is that OpenAI and Anthropic have trained their models really hard to fit the sycophantic mold of:

    ===============================
    Got it — *compliment on the info you've shared*, *informal summary of task*. *Another compliment*, but *downside of question*.
    ----------
    (relevant emoji) Bla bla bla
    1. Aspect 1
    2. Aspect 2
    ----------

    *Actual answer*

    -----------
    (checkmark emoji) *Reassuring you about its answer because:*

    * Summary point 1
    * Summary point 2
    * Summary point 3

    Would you like me to *verb* a ready-made *noun* that will *something that's helpful to you 40% of the time*?
    ===============================
It's gotta reduce the quality of the answers.


I suspect this has emerged organically from RLHF on users' thumbs-up/down votes in the apps. People LIKE being treated this way, so the model converges in that direction.

Same as social media converging to rage bait. The user base LIKES it subconsciously. Nobody at the companies explicitly added that to content recommendation model training - I know that's true for the latter, as I was there.


Gemini does the sycophantic thing too, so I'm not sure that holds water. I keep having to remind it to stop with the praise whenever my previous instruction slips out of the context window.


Oh god I _hate_ this. Does anyone have any custom instructions to shut this off? The only thing that worked for me is to ask the model to be terse, but that makes the main answer terse too, which sucks sometimes.
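
For the API, a system message is the closest lever I know of. A minimal sketch, assuming the OpenAI Python SDK - the wording and model id are just my guesses, not a tested fix:

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-5",  # illustrative model id
        messages=[
            {
                "role": "system",
                # Guess at wording that kills the preamble/summary fluff
                # without shortening the actual answer.
                "content": (
                    "Answer directly. No compliments, no preamble, no closing "
                    "summary, no offers of follow-up work. Keep the answer "
                    "itself as detailed as the question requires."
                ),
            },
            {"role": "user", "content": "..."},
        ],
    )
    print(resp.choices[0].message.content)

Roughly the same wording can go in the apps' custom-instruction fields; how well it sticks seems to vary by model.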


ChatGPT has a setting where you can set the tone to robotic.


Anthropic also injects these long conversation reminders that run paragraphs upon paragraphs about safety and what not to do.

People have said it destroys the intelligence mid-conversation.


Yes, but that’s their brand.


Not the case with GPT-5, I'd say. Sonnet 4 feels a lot like this, but its coding and agency are still quite solid, and overall it's IMO the best coder. Gemini 2.5 to me is most helpful as a research assistant; it's quite good together with Google Search-based grounding.


Gemini does this too, but also adds a YouTube link to every answer.

On the video link alone, Gemini is making money on the free tier by pointing the hapless user at an ad, while the other LLMs make zilch off the free tier.


I've experienced the opposite. Gemini is actually the MOST sycophantic model.

Additionally, despite having "grounding with Google Search", it tends to default to old knowledge. I usually have to inform it that it's presently 2025. Even after searching and confirming, it'll respond with something along the lines of "in this hypothetical timeline", as if I had just gaslit it.
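
For reference, this is roughly how that grounding tool gets enabled if you're going through the API (a sketch assuming the google-genai Python package; the model id is illustrative):

    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    resp = client.models.generate_content(
        model="gemini-2.5-pro",  # illustrative model id
        contents="What is today's date, and what is the latest on DDR6?",
        config=types.GenerateContentConfig(
            # Turn on grounding with Google Search so answers can use fresh results.
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    print(resp.text)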

Consider this conversation I just had with Claude, Gemini, and GPT-5.

<ask them to consider DDR6 vs M3 Ultra memory bandwidth>

-- follow up --

User: "Would this enable CPU inference or not? I'm trying to understand if something like a high-end Intel chip or a Ryzen with built in GPU units could theoretically leverage this memory bandwidth to perform CPU inference. Think carefully about how this might operate in reality."

<Intros from all 3 models below - no custom instructions>

GPT-5: "Short answer: more memory bandwidth absolutely helps CPU inference, but it does not magically make a central processing unit (CPU) “good at” large-model inference on its own."

Claude: "This is a fascinating question that gets to the heart of memory bandwidth limitations in AI inference. "

Gemini 2.5 Pro: "Of course. This is a fantastic and highly relevant question that gets to the heart of future PC architecture."


Not really. Any prefix before the content you want is basically "thinking time"; the text itself doesn't even have to reflect it, since it happens internally. Even if you don't go for the thinking model explicitly, that task summary and other details can actually improve the quality, not reduce it.


I recently started using Open WebUI, which lets you run your query on multiple models simultaneously. My anecdote: For non-coding tasks, Gemini 2.5 Pro beats Sonnet 4 handily. It's a lot more common to get wrong/hallucinated content from Sonnet 4 than Gemini.


Agreed. People talk up Claude but every time I try it I wind up coming back to Gemini fairly quickly. And it's good enough at coding to be acceptably close to Claude as well IMO.


Google also has a lot of very useful structured data from search that they're surely going to figure out how to use at some point. Gemini is useless at finding hotels, but it says it's using Google's Hotel data, and I'm sure at some point it'll get good at using it. Same with flights. If a lot of LLM usage is going to be better search, then all the structured data Google has for search should surely be a useful advantage.


Does it still try to 'unplug' itself if it gets something wrong, or did they RL that out yet?


Not sure if you're joking or serious? Every model has "degenerate" behavior it can be coerced into. Sonnet is even more apologetic on average.



