GPT-3 was 175B parameters and it holds up poorly compared to much smaller models we have nowadays; the data and the compute play a giant role. Also, I doubt you would need to keep training a model after you've trained it on absolutely everything (but we are very far from that).
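A rough back-of-envelope sketch of why data matters so much here, using the ~20-tokens-per-parameter heuristic from the Chinchilla paper (Hoffmann et al., 2022); the ratio is an approximation, not an exact law:

```python
# Back-of-envelope: how under-trained GPT-3 was on data, per the
# Chinchilla heuristic of roughly 20 training tokens per parameter.
params = 175e9          # GPT-3 parameter count
tokens_trained = 300e9  # tokens GPT-3 was actually trained on (per the paper)

tokens_optimal = 20 * params  # ~3.5e12 tokens under the heuristic

print(f"Chinchilla-optimal tokens: {tokens_optimal:.2e}")                     # ~3.50e+12
print(f"Under-trained by factor:   {tokens_optimal / tokens_trained:.1f}x")   # ~11.7x
```

By that rough math GPT-3 saw only about a tenth of the data its size would call for, which is a big part of why much smaller but better-trained models now beat it.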