
MoE expected performance ≈ sqrt(active parameter count * total parameter count)

sqrt(120B total * ~5B active) ≈ 24.5B

GPT-OSS 120B is therefore effectively a ~24B parameter model, but with the speed of a ~5B model, since only the active parameters are computed per token.
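
A quick sketch of that rule of thumb (it's a community heuristic for the geometric mean of active and total parameters, not an official benchmark; the function name and the ~5B active figure here are illustrative assumptions):

    # Geometric-mean heuristic for MoE "effective" dense size.
    # Parameter counts are in billions.
    import math

    def effective_dense_params(active_b: float, total_b: float) -> float:
        """Estimate the dense-model size an MoE roughly matches in quality."""
        return math.sqrt(active_b * total_b)

    # GPT-OSS 120B: ~120B total parameters, ~5B active per token
    print(effective_dense_params(5, 120))  # ~24.5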


