AlphaGo learned to play Go by playing against itself. Why couldn't LLMs do the same? They have plenty of information to use as a starting point, so surely they could figure out something novel eventually.
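To make the analogy concrete, here is a minimal sketch of the self-play loop AlphaGo-style systems use: the same policy plays both sides, and the only learning signal is who won under the game's rules. Everything here (the Policy class, the toy "game", the update rule) is illustrative, not any real library's API.

```python
import random

class Policy:
    """Toy stand-in for a learned policy network."""
    def __init__(self):
        self.weights = {}  # (state, move) -> score

    def choose_move(self, state, legal_moves):
        # Mostly exploit what has worked before, sometimes explore.
        if random.random() < 0.1:
            return random.choice(legal_moves)
        scored = [(self.weights.get((state, m), 0.0), m) for m in legal_moves]
        return max(scored)[1]

    def update(self, trajectory, winner):
        # Reinforce the winner's moves, discourage the loser's.
        for player, state, move in trajectory:
            delta = 0.1 if player == winner else -0.1
            self.weights[(state, move)] = self.weights.get((state, move), 0.0) + delta


def play_game(policy):
    """Toy 'game': players alternately add numbers; higher total wins.
    Only here to show the shape of the loop, not a real game."""
    trajectory, totals, state = [], {0: 0, 1: 0}, 0
    for turn in range(10):
        player = turn % 2
        move = policy.choose_move(state, legal_moves=[1, 2, 3])
        trajectory.append((player, state, move))
        totals[player] += move
        state += move
    winner = max(totals, key=totals.get)
    return trajectory, winner


if __name__ == "__main__":
    policy = Policy()
    for _ in range(1000):                  # self-play loop
        trajectory, winner = play_game(policy)
        policy.update(trajectory, winner)  # learn only from game outcomes
```

The key feature is that improvement comes from an outcome the system can verify itself (win or loss), with no labeled data in the loop; whether LLM outputs admit an equally cheap, reliable verification signal is the open question.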
LLMs aren't logically reasoning through an axiomatic system. Any patterns of logic they demonstrate are just recreated from patterns in their input data. Effectively, they can't think new thoughts.
> Effectively, they (LLMs) can't think new thoughts.
This is true only if you assume that combining existing thought patterns doesn't count as new thinking. If a pattern can't be learned from the training data, then indeed they would be stuck. However, the training data keeps growing and updating, so each new version can learn more patterns.