Itanium left out most of the out-of-order scheduling hardware and relied on the compiler to schedule instructions ahead of time. In general, that did not work well, in large part because latencies such as cache misses are not known at compile time.
I don’t get the sense that this is what the parent comment is talking about at all.
Not to mention that GPUs already execute in order (at least any that I’m familiar with). They do have multiple execution pipelines, but each thread's instructions are issued in program order, unlike a typical modern high-performance CPU, which fetches and decodes in order but then reorders instructions at issue time.
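
To make the in-order vs. out-of-order distinction concrete, here is a toy sketch of my own. It is not a model of any real GPU or CPU. The names `Instr`, `run`, and `PROGRAM` are made up for illustration, and it assumes one instruction issued per cycle and a unique destination register per instruction (so no renaming is needed).

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str      # register written by this instruction
    srcs: tuple    # registers read by this instruction
    latency: int   # cycles until the result is available


def run(program, in_order):
    """Return the cycle at which the last result becomes available."""
    produced = {i.dest for i in program}  # registers written inside the program
    ready_at = {}                         # dest register -> cycle its value is ready
    pending = list(program)
    cycle = 0
    finish = 0

    def is_ready(instr, now):
        # A source is ready if it is a live-in (never written here),
        # or its producer has issued and its latency has elapsed.
        return all(
            s not in produced or ready_at.get(s, float("inf")) <= now
            for s in instr.srcs
        )

    while pending:
        # In-order: only the oldest pending instruction may issue.
        # Out-of-order: the oldest *ready* pending instruction may issue.
        candidates = pending[:1] if in_order else pending
        chosen = next((i for i in candidates if is_ready(i, cycle)), None)
        if chosen is not None:
            pending.remove(chosen)
            ready_at[chosen.dest] = cycle + chosen.latency
            finish = max(finish, ready_at[chosen.dest])
        cycle += 1
    return finish


# A slow load followed by a dependent add, then independent work.
PROGRAM = [
    Instr("r1", ("a",), 10),       # load with a long (cache-miss-like) latency
    Instr("r2", ("r1",), 1),       # depends on the load
    Instr("r3", ("r4", "r5"), 3),  # independent multiply
    Instr("r6", ("r3",), 1),       # depends on the multiply
]

if __name__ == "__main__":
    print("in-order:    ", run(PROGRAM, in_order=True))
    print("out-of-order:", run(PROGRAM, in_order=False))
```

With this particular program the in-order run finishes at cycle 15 and the out-of-order run at cycle 11, because the independent multiply can slip past the add that is stalled on the load. A static scheduler would need to know the load's latency up front to get the same result by reordering in the compiler.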