Hacker News
Tencent's 'Hunyuan-T1'–The First Mamba-Powered Ultra-Large Model (tencent.com)
19 points by bananaflag 9 months ago | 3 comments


Not sure what they mean by "ultra-large". Hopefully it doesn't mean it's bigger than 1T parameters; if so, the results look pretty bad, because R1 beats this model on a lot of benchmarks and R1 has fewer than 700B parameters.


> The model activates 52 billion parameters through dynamic expert routing, with each specialist module handling specific reasoning domains like mathematical logic or contextual analysis. This architecture enables:

https://www.analyticsvidhya.com/blog/2025/03/hunyuan-t1/
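The quoted passage describes mixture-of-experts (MoE) routing: a small gate scores every expert for each token, only the top-k experts actually run, and their outputs are blended using the gate weights. That is why the "activated" parameter count is much smaller than the total. Below is a minimal pure-Python sketch, assuming plain top-k softmax gating; the names `route` and `moe_layer`, the expert count, k, and the toy experts are all hypothetical and are not Tencent's published configuration. In this sketch the gate simply scores experts, and no expert is hard-wired to a domain such as "mathematical logic".

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(token, gate_weights, k):
    """Return (expert_index, weight) pairs for the top-k experts (hypothetical helper)."""
    # One gate logit per expert: dot product of the token with that expert's gate vector.
    logits = [sum(t * w for t, w in zip(token, gw)) for gw in gate_weights]
    # Keep only the k highest-scoring experts.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the weights over the selected experts only.
    probs = softmax([logits[i] for i in top])
    return list(zip(top, probs))


def moe_layer(token, gate_weights, experts, k=2):
    """Run only the routed experts and sum their outputs, weighted by the gate."""
    out = [0.0] * len(token)
    for idx, weight in route(token, gate_weights, k):
        expert_out = experts[idx](token)  # unselected experts are never called
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


if __name__ == "__main__":
    dim, n_experts = 4, 8
    rng = random.Random(0)
    gate_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
    # Toy experts (assumption): expert i just scales its input by (i + 1).
    experts = [(lambda x, s=i + 1: [s * v for v in x]) for i in range(n_experts)]
    token = [0.5, -1.0, 0.25, 2.0]
    chosen = route(token, gate_weights, k=2)
    print("experts used:", [i for i, _ in chosen], "of", n_experts)
    print("output:", moe_layer(token, gate_weights, experts, k=2))
```

With k=2 out of 8 experts, only a quarter of the expert parameters are touched per token, which is the kind of total-versus-activated gap the "52 billion activated parameters" figure refers to.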


It's neat to see someone scale up an alternative architecture.



