Yet another blog post that looks super impressive, until you get to the bottom and see the charts assessing held-out task performance on ASKA and MineDojo and see that it's still a paltry 15% success rate. (Holy misleading chart, Batman!) Yes, it's a major improvement over SIMA 1, but we are still a long way from this being useful for most people.
To be fair, it's 65% on all tasks (with a 75% human baseline) and 15% on unseen environments. They don't provide a human baseline for that, but I'd imagine it's much more than 15%.
I personally am extremely impressed by it reaching 15% on unseen environments. Note that just this year, we were surprised that LLMs became capable of making any progress whatsoever in GBA Pokemon games (which have significantly simpler worlds and control schemes).
As for "true intelligence" - I honestly don't think that there is such a thing. We humans have brains that are wired based on our ancestors evolving for billions of years "in every possible environment", and then with that in place, each individual human still needs quite a few years of statistical learning (and guided learning) to be able to function independently.
Obviously I'm not claiming that SIMA 2 is as intelligent as a human, or even that it's on the way there, but based on recent progress, I would be very surprised if we don't see humanoid robots using approaches inspired by this to navigate our streets in a decade or so.
I don't think that's true. Humans are dramatically better than current AI systems at tackling novel problems and situations. Humans are capable of zero-shot learning by imagining how they might do something; we are able to apply general reasoning principles without previous examples.
https://arcprize.org/ is a whole category of problems that AI struggles with but humans are able to do.