Hacker News

I suggest you make explicit the assumption that this website is specifically about English text. Otherwise the leaderboard is pretty meaningless, because performance differs drastically on other scripts, and potentially even on languages such as Vietnamese or Czech, which use the Latin script but have lots of accents.


Hey! I'm the dev who made this :) I think you're right: the data will be biased towards English, because the dataset we provide for people to use is in English. But you can also upload non-English docs in battle mode as well as in the playground!


LMArena splits their leaderboard by language; maybe you should consider doing the same thing.

I assume to do that you’d need another model to do language detection on the inputs and/or outputs; but a language detection model can be a lot cheaper than an OCR model or an LLM
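To make the idea above concrete, here is a rough sketch of bucketing results before ranking. Everything in it is an assumption for illustration: the `(text, winner)` battle format and both function names are hypothetical, and it only guesses the dominant Unicode script from character names in the Python standard library, which is far cruder than a real language detection model.

```python
import unicodedata
from collections import Counter, defaultdict


def dominant_script(text):
    """Return a coarse script label (e.g. "LATIN", "CYRILLIC") for text."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        # The first word of a Unicode character name is usually its script.
        counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def group_by_script(battles):
    """Count wins per model within each script bucket.

    battles: iterable of (text, winner) pairs (hypothetical format).
    """
    boards = defaultdict(Counter)
    for text, winner in battles:
        boards[dominant_script(text)][winner] += 1
    return boards


if __name__ == "__main__":
    sample = [("hello world", "model_a"), ("Привет мир", "model_b")]
    for script, wins in group_by_script(sample).items():
        print(script, dict(wins))
```

Note that this separates scripts, not languages: English, German, Czech and Vietnamese all land in the same LATIN bucket, so splitting those apart would still need an actual language detection model.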


That's unfortunate, because I have a bunch of photos with handwritten German on the back that I need to transcribe, and since I can't read German I can't really do it myself either.


I reckon performance on German will be similar to English; the only real difference is the umlauts, and those are very consistent. Not sure how it will do on the ß.


qwen 3.5 vl instruct on openrouter is damn cheap, and it works quite well with non-English stuff.

I have it verify some stamps which are quite messy and sometimes obscured, and honestly some of them I could not even read myself.


From my first tests it does fine with German, at least for the ghastly "handwritten" font used by the restaurant menu I tested with.



