Not sure what they mean by "ultra large". Hopefully it doesn't mean bigger than 1T parameters. If so, the results look pretty bad, because R1 beats this model on a lot of benchmarks and R1 has fewer than 700B parameters.
> The model activates 52 billion parameters through dynamic expert routing, with each specialist module handling specific reasoning domains like mathematical logic or contextual analysis. This architecture enables: