Tencent's 'Hunyuan-T1': The First Mamba-Powered Ultra-Large Model

ranguna · 2025-03-22T12:44:12 1742647452

Not sure what they mean by "ultra-large". Hopefully it doesn't mean it's bigger than 1T parameters. If it does, the results look pretty bad, because R1 beats this model on a lot of benchmarks and R1 has fewer than 700B parameters.

brokensegue · 2025-03-23T00:10:23 1742688623

> The model activates 52 billion parameters through dynamic expert routing, with each specialist module handling specific reasoning domains like mathematical logic or contextual analysis. This architecture enables:

https://www.analyticsvidhya.com/blog/2025/03/hunyuan-t1/
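
For anyone wondering what "activates 52 billion parameters through dynamic expert routing" means in practice, here is a minimal sketch of generic top-k mixture-of-experts routing. This is not Tencent's code: the function names, the choice of k=2, and all the numbers are made up for illustration, and the article does not say how Hunyuan-T1's router actually works.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k experts with the highest router probability.

    Returns (expert_index, weight) pairs, with the weights renormalized
    so they sum to 1 over the chosen experts. k=2 is an assumption here,
    not a documented Hunyuan-T1 setting.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def active_params(shared, per_expert, k):
    """Parameters touched per token: shared layers plus k routed experts."""
    return shared + k * per_expert

# Hypothetical router scores for one token over four experts.
chosen = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)
print(chosen)  # experts 1 and 3 are selected; the other two stay idle
```

The point is that only the selected experts run for a given token, so the "active" parameter count can be far smaller than the total. The total size depends on how many experts there are, which the quoted passage does not state.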

adultSwim · 2025-03-22T15:04:31 1742655871

It's neat to see someone scale up an alternative architecture.