So much this. People don't realize that when 1 trillion (10 trillion, 100 trillion, whatever comes next) is at stake, there are no limits to what these people will do to get it.
I will be very surprised if there are not at least several groups or companies scraping these "smart" and snarky comments to find weird edge cases that they can train on, turn into a demo, and then sell as an improvement. Hell, they would've done it if 10 billion was at stake. I can't really imagine (and I have a vivid imagination, to my horror) what Californian psychopaths can do for 10 trillion.
I'm not worried about it, because they won't waste their time on it (individually RL'ing on a dog with 5 legs). There are fractal ways of testing this inability, so the only way to fix it is to solve the problem wholesale.
Similar to the pelican bike SVG: the models that do well on that test do well at all SVG generation, so even if they are targeting that benchmark, they're still making the whole model better in order to score better.