
> Chinese models typically focus on text

Not true at all. Qwen has a VLM (Qwen2-VL-Instruct), which is the backbone of ByteDance’s UI-TARS computer-use model. Both Alibaba (Qwen) and ByteDance are Chinese.

Also, DeepSeek got a ton of attention for their OCR paper a month ago, which was an explicit example of using images rather than text.




