Hacker News

That would be interesting. I've been a bit sceptical of the entire strategy from the beginning. If oss were actually as good as o3-mini, and in some cases o4-mini, outside of benchmarks, that would undermine OpenAI's API offering for GPT-5 nano and maybe mini too.

Edit: found this analysis, it's on the HN frontpage right now

> this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. nothing else.

https://x.com/jxmnop/status/1953899426075816164



The strategy of Phi isn't bad, it's just not general. It's really a model that's meant to be fine-tuned, but unfortunately fine-tuning tends to shit on RL'd behavior, so it ended up not being that useful. If someone made a Phi-style model with an architecture designed to take knowledge adapters/experts (i.e. a small MoE model designed to have separately trained networks plugged into it, with routing updates via a special LoRA), it'd actually be super useful.
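A minimal sketch of that "pluggable experts" idea, assuming a toy MoE layer where independently trained expert networks can be registered after the fact and the router grows one row per plugged-in expert. All names here (`PluggableMoE`, `add_expert`) are illustrative, not from any real library:

```python
# Hypothetical sketch: MoE layer with experts plugged in post hoc,
# router extended (rather than fully retrained) for each new expert.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class PluggableMoE:
    def __init__(self, d_model):
        self.d = d_model
        self.experts = []                        # each expert: (W, b), trained separately
        self.router_w = np.zeros((0, d_model))   # one routing row per expert

    def add_expert(self, W, b):
        """Plug in a separately trained expert; grow the router by one row."""
        self.experts.append((W, b))
        new_row = rng.normal(scale=0.01, size=(1, self.d))
        self.router_w = np.vstack([self.router_w, new_row])

    def forward(self, x, top_k=1):
        gates = softmax(self.router_w @ x)
        top = np.argsort(gates)[-top_k:]         # route to the top-k experts
        out = np.zeros(self.d)
        for i in top:
            W, b = self.experts[i]
            out += gates[i] * (W @ x + b)
        return out / gates[top].sum()            # renormalise over chosen experts

d = 8
moe = PluggableMoE(d)
for _ in range(3):                               # plug in three "knowledge adapters"
    moe.add_expert(rng.normal(size=(d, d)), rng.normal(size=d))

y = moe.forward(rng.normal(size=d), top_k=2)
```

In a real model the new router row would itself be a small trainable delta (the "special LoRA" from the comment), so adding an expert only updates routing rather than touching the base weights.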


The Phi strategy is bad. It results in very bad models that are useless in production, while gaming the benchmarks to make it appear capable of actually doing something. This is objectively bad.


I like the idea of having a _HIGHLY_ unopinionated base model that's just good at basic logic and instruction following, which I can fine-tune to my use case. Sadly, full fine-tuning tends to make models derpy, and LoRAs are limited in terms of what they can achieve.


That seems unrelated? I think we are talking past each other. Phi was trained on purely synthetic data derived from emulating the benchmark suite. Not surprisingly, this resulted in state-of-the-art scores, and a model that was 100% useless at anything other than making the benchmark number go up.


Is there a URL to the post itself somewhere else?



