I'd be very surprised if a modern CPU couldn't handle the task, especially if you were clever about detecting regions of interest, predicting head movement, and cache maintenance. But I'd also be surprised if they go to market with an x86 under the hood.
I remember reading a while ago about how smart TVs were using ANNs for upscaling, so it has been done at scale. rimshot
(1) TVs don't have strict latency requirements. I've heard latencies of 100 ms are common.
(2) Upscaling ANNs process a rather small image neighborhood of radius r, and the required processing power is on the order of O(r² log r). If a minimally recognizable cat is 50x50 px and for upscaling you use a very large window of 16x16, that's already about 14 times more work.
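A quick sanity check of the "14 times" figure, as a minimal sketch. It assumes the window side length can stand in for r and ignores constant factors. The base of the logarithm cancels in the ratio, so the natural log is used. The helper name `relative_cost` is hypothetical.

```python
import math

def relative_cost(r_large: float, r_small: float) -> float:
    """Ratio of O(r^2 log r) costs for two window sizes (constants ignored)."""
    return (r_large**2 * math.log(r_large)) / (r_small**2 * math.log(r_small))

# 50x50 px "minimally recognizable cat" vs. a 16x16 upscaling window
ratio = relative_cost(50, 16)
print(f"{ratio:.1f}")  # roughly 13.8, i.e. about 14x
```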
Latencies of 100 ms may be common because TVs don't have strict latency requirements.
16x16 is a very small window. I have no idea what they're using for TVs, but 128 isn't uncommon in post-production ANN upscaling. Also consider that ANNs have received nowhere near the level of optimization attention that compilers have, so there is a lot of potential slack to be taken up if real-time processing demands it.