Yes, it's technically possible. But what you are suggesting is basically a dynamic filter. The problem is that codecs are designed for end delivery and have very specific practical constraints.
For example, we could GREATLY improve compression ratios if we could reference key frames anywhere in the file. But devices only have so much memory bandwidth, and users need to be able to seek while streaming on a 4G connection on a commuter train. I would really like to see memes make use of SVG filters and the like, but basically everyone flattens them into a bitmap and does OCR to extract metadata.
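To make the seek tradeoff concrete, here's a minimal sketch (the keyframe interval and frame counts are made-up illustrative numbers, not any real codec's GOP structure): to display an arbitrary frame, a decoder has to start from the nearest preceding keyframe, so sparser keyframes mean smaller files but more wasted decode work on every seek.

```python
import bisect

def frames_to_decode(target, keyframes):
    """Frames that must be decoded to display `target`:
    everything from the nearest preceding keyframe up to the target.
    `keyframes` is a sorted list of keyframe positions."""
    i = bisect.bisect_right(keyframes, target) - 1
    return target - keyframes[i] + 1

# Keyframe every 48 frames (~2 s at 24 fps) vs. a single keyframe at frame 0.
dense  = list(range(0, 4800, 48))
sparse = [0]

print(frames_to_decode(2500, dense))   # → 5 (cheap seek)
print(frames_to_decode(2500, sparse))  # → 2501 (decode from the very start)
```

This is why encoders that could reference key frames "anywhere in the file" would compress better but make random access on a flaky connection miserable.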
It's also really depressing how little effort is put into encoding, even by the hyper-scalers. Resolution (SD, HD, 4K, and 8K) is basically the ONLY knob used for bitrate and quality management. I would much prefer to have 10-bit color over an 8K stream, yet every talking-head documentary with colored gradient backgrounds has banding.
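The banding complaint is just arithmetic. A rough sketch (the 20% gradient span and 4K frame width are illustrative assumptions, ignoring dithering and chroma subsampling): a subtle background gradient only gets to use a fraction of the code values, and at 8 bits each quantized step ends up wide enough to see.

```python
def visible_steps(bit_depth, gradient_fraction=0.2):
    """Distinct code values available to a gradient that spans
    `gradient_fraction` of the signal range at a given bit depth."""
    return int((2 ** bit_depth) * gradient_fraction)

width_4k = 3840  # pixels across a UHD frame

for bits in (8, 10):
    steps = visible_steps(bits)
    print(f"{bits}-bit: {steps} steps, each ~{width_4k // steps} px wide")
# → 8-bit:  51 steps, each ~75 px wide (visible bands)
# → 10-bit: 204 steps, each ~18 px wide (much harder to see)
```

Four times the levels means bands a quarter as wide, which is exactly the kind of win that resolution bumps don't buy you.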
Finally, there is the horror that is decoders. There are reference test files, built with formal verification, that exercise every part of a codec's spec. But Hollywood studios have dedicated movie theaters with all of the major projectors, and they pay people to prescreen movies just to try to catch encoding/decoding glitches. And even that fails sometimes.
So sure, anything is possible. Flash was very popular in the 56k days because it rendered everything on the end device. But that entails other tradeoffs like inconsistent rendering and variable performance requirements. Codecs today do something very similar: describe bitmap data using increasingly sophisticated mathematical representations. But they are more consistent and simplify the entire stack by (for example) eliminating a VM. Just run PDF torture tests through your printer if you want an idea of how little end devices care about rendering intent.
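For a taste of the "mathematical representation" idea, here's a naive 1-D DCT-II, the transform family underneath JPEG and most video codecs (textbook form, deliberately unoptimized): instead of storing pixels, you store cosine coefficients, and smooth image regions collapse to almost nothing.

```python
import math

def dct_1d(samples):
    """Naive 1-D DCT-II: express a row of pixel values as
    coefficients of cosine basis functions."""
    N = len(samples)
    return [sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n, x in enumerate(samples))
            for k in range(N)]

# A flat row of 8 pixels compresses to a single nonzero (DC) coefficient;
# every higher-frequency coefficient is ~0 and can be dropped.
coeffs = dct_1d([128] * 8)
print(round(coeffs[0]))                          # → 1024
print(all(abs(c) < 1e-9 for c in coeffs[1:]))    # → True
```

That's the whole trade Flash refused to make: a fixed, well-specified transform every device decodes the same way, instead of arbitrary programmable rendering.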