I definitely agree on the importance of personalized benchmarks for really feeli...

I definitely agree on the importance of personalized benchmarks for really feeling when, where and how much progress is occurring. The standard benchmarks are important, but it’s hard to really feel what a 5% improvement in X exam means beyond hype. I have a few projects across domains that I’ve been working on since ChatGPT 3 launched and I quickly give them a try on each new model release. Despite popular opinion, I could really tell a huge difference between GPT 4 and 5 , but nothing compared to the current delta between 5.1 and Gemini 3 Pro…

TLDR; I don’t think personal benchmarks should replace the official ones of course, but I think the former are invaluable for building your intuition about the rate of AI progress beyond hype.