Right, it can do modern writing, but give it anything older than a century (church records, census sheets) and it produces garbage. Yandex Archives figured that out and has a single-digit CER, but they have the resources to collect an immense amount of training data.
I'm slowly building a dataset for fine-tuning a TrOCR model, and the best it can do so far is a CER of 18%, which is sort of readable.
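For anyone not familiar with the metric: CER (character error rate) is usually computed as the character-level edit distance between the model output and the ground-truth transcription, divided by the number of characters in the ground truth. So 18% means roughly 18 character edits per 100 reference characters. Here is a minimal pure-Python sketch of that calculation; the helper names `levenshtein` and `cer` are my own for illustration, not from any particular library, and real evaluation scripts may normalize whitespace or case differently.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn hyp into ref."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# One wrong character out of four reference characters.
print(cer(["abcd"], ["abxd"]))  # 0.25
```

Summing edits over the whole test set before dividing (rather than averaging per-line rates) keeps short lines from dominating the score.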
I'm using TrOCR because it's a smaller model that I can fine-tune on a consumer card, but the age of the model and my limited resources certainly make it a challenge. The official fine-tuning notebook hasn't been updated in years and has several errors caused by changes in the main packages since then.
I think I based my notebook on the official example, but yes, at some point new versions of the libraries completely broke it. I had to pin the versions to get it working again.