
MoE expected performance ≈ sqrt(active parameter count * total parameter count)

sqrt(120B total * ~5B active) ≈ 24.5B

GPT-OSS 120B is therefore effectively a ~24B parameter model, but with the speed of a ~5B model, since only the active parameters are computed per token.
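
A quick sketch of that rule of thumb (it's a community heuristic for the geometric mean of active and total parameters, not an official benchmark; the function name and the ~5B active figure here are illustrative assumptions):

    # Geometric-mean heuristic for MoE "effective" dense size.
    # Parameter counts are in billions.
    import math

    def effective_dense_params(active_b: float, total_b: float) -> float:
        """Estimate the dense-model size an MoE roughly matches in quality."""
        return math.sqrt(active_b * total_b)

    # GPT-OSS 120B: ~120B total parameters, ~5B active per token
    print(effective_dense_params(5, 120))  # ~24.5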


