You know, I had a think about that the other day - I believe the volume of bad information might stay roughly stable while its shape changes. There are some things LLMs are actually better at, on average, than the raw mix of human-created data: subjects that are inherently political, or skewed by a small subset of very vocal, biased outliers. The LLM tends to smooth those bumps out, and in some places (not all) that flattening is an improvement.
I don't think the plethora of ways LLMs get things wrong needs repeating here, especially given the context of this conversation. The list is vast.
As things develop, I expect LLMs will come to mirror the current zeitgeist more closely, as the same forces that have shaped news and other media make their way into the models. They'll get better at smoothing in some areas (mostly technical or dry domains that aren't juicy targets) and worse in others (I expect more biased training and more heavy-handed censorship/steering in future).
That said, recursive reinforcement (LLMs training on LLM output) might undo whatever smoothing we see. It's genuinely hard to predict - these systems are complex and tightly coupled to many other complex systems.
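For intuition, here's a toy sketch of that recursive-reinforcement effect - nothing like real LLM training, just the classic statistics demo: fit a Gaussian to samples, resample from the fit, refit, and repeat. Every parameter here (the distribution, sample size, generation count) is an arbitrary assumption for illustration.

```python
import random
import statistics

# Toy model-collapse demo: repeatedly fit a Gaussian to samples drawn
# from the previous generation's own fit. The fitted variance shrinks
# in expectation each generation (by a factor of (n-1)/n) and the mean
# drifts, so the tails gradually disappear.
random.seed(0)
mu, sigma = 0.0, 1.0   # generation 0: stand-in for "real" human data
n = 50                  # samples per generation (small, to make drift visible)

for gen in range(1, 31):
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    mu = statistics.fmean(samples)      # refit on purely synthetic data
    sigma = statistics.pstdev(samples)  # MLE std dev: biased low
    if gen % 5 == 0:
        print(f"gen {gen:2d}: mu={mu:+.3f} sigma={sigma:.3f}")
```

Whether real pipelines drift like this presumably depends on how much fresh human data stays in the mix each round.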