It's very impressive. It feels like the text is a bit of a hack, where they're somehow rendering the text separately and compositing it into the image. Not always, though: I got it to render calligraphy with flourishes, but only for a handful of words.
For example, I asked it to render a few lines of text on a medieval scroll, and it basically looked like a picture of a gothic font written onto a background image of a scroll.
You could have a model that receives the generated raw text and is trained to display it in whatever style. Whether it looks like a font or not is irrelevant.