Absolutely this.

Indeed, I think it's fair to say that it'd take a *lot* of artificial calibration and data curation for a model trained on a range of media, including statements about and by Mitch McConnell and Trump respectively, not to conclude that the latter is the one much more associated with "hate", "danger", "violence" and whatever other parameters an LLM ends up associating with inappropriateness.

A biased liberal human moderator, on the other hand, is going to look at the real-world political relationships rather than the raw text, and see Mitch as a very problematic figure in very much the same bracket as Trump. They're certainly not going to rate him as less problematic than Hillary Clinton or Nancy Pelosi!

It's the same when I get identically structured caveats, about considering the good points in the context of the bad things he did, for both Bill Clinton and Stalin: all the machine knows is that equivocating is favoured and that both have lots of "bad things" written about them. (It disallowed considering the good points of Hitler, presumably because even an LLM can deduce Godwin's law!) I'm not sure this is quite how a human moderator, irrespective of bias, would handle it.