
Right, it can do modern writing, but give it anything older than a century (church records and censuses) and it produces garbage. Yandex Archives figured that out and has a single-digit CER (character error rate), but they have the resources to collect immense amounts of training data. I'm slowly building a dataset for fine-tuning a TrOCR model, and the best it can do is a CER of 18%, which is sort of readable.
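
For reference, CER is usually computed as the character-level edit distance between the model output and the ground-truth transcription, divided by the length of the ground truth. A minimal sketch in plain Python, assuming no whitespace or case normalization; the function names and sample strings are made up for illustration and are not taken from anyone's pipeline:

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character from hyp
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters -> 0.5
```

By this definition, a CER of 18% corresponds to about 18 character edits per 100 characters of ground truth.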


How do you do, fellow TrOCR fine-tuner?

I'm using TrOCR because it's a smaller model that I can fine-tune on a consumer card, but the age of the model and resources certainly makes it a challenge. The official fine-tuning notebook hasn't been updated in years and has several errors due to the march of progress in the primary packages.


I think I based my notebook on the official example, but yes, at some point new versions of the libraries completely broke it. I had to pin the versions for it to work again.

This one works; you can check the versions here: https://pastebin.com/QPjGHN8j
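
For anyone who wants to capture the versions of a setup that currently works, here is a rough sketch using only the standard library. The package names are an assumption (typical dependencies for a TrOCR fine-tuning notebook), not the contents of the pastebin above, and `pin_lines` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def pin_lines(packages):
    """Return requirements-style lines for whatever is installed right now."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Leave a comment instead of guessing a version number.
            lines.append(f"# {name} is not installed")
    return lines


# Assumed package list; adjust to whatever the notebook actually imports.
for line in pin_lines(["transformers", "torch", "datasets"]):
    print(line)
```

Saving that output to a requirements file from a working environment makes it possible to reinstall the same versions later.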



