
"Gemini 3 Pro represents a generational leap from simple recognition to true visual and spatial reasoning."

Prompt: "wine glass full to the brim"

Image generated: 2/3 full wine glass.

True visual and spatial reasoning denied.





Gemini 3 Pro is not Nano Banana Pro, and the image generation/model that decodes the generated image tokens may not be as robust.

The thinking step of Nano Banana Pro can refine some lateral steps (e.g. finding the errors in the homework correction and where they sit spatially in the image), but it isn't perfect and can still hit some of the typical pitfalls. It's a lot better than the base Nano Banana, though.


As a consumer, I typed this into "Gemini". The behind-the-scenes model selection just adds confusion.

If trust in "AI" is the big barrier to widespread adoption of these products, Alphabet soup isn't the solution (pun intended).


Nano Banana generates images.

This article is about understanding images.

Your task is unrelated to the article.


It works fine for me. https://imgur.com/a/MKNufm1

I actually tried this prompt and found that it worked with a single nudge in a follow-up prompt. My first shot got me a wine glass that was almost full but not quite. I told it I wanted it full to the top, so that another drop would overflow. The second shot was perfectly full.

That's the kind of correction I'd expect to give an intern, not a junior person.

Your intern can generate and edit photorealistic renderings of wine glasses? Still not bad.

Did it return the exact same glass and surrounding imagery, just with more wine?

Do it the other way: give it images of wine glasses and ask it whether they are full to the brim. I suspect it's going to nail them all (mainly because Qwen-VL already nails things like that).


