The quality of the data matters a lot. Textbooks, scientific papers, etc… are substantially better for training smart and capable LLMs than random social chit-chat.
Google and Microsoft both have a lot of corporate info, including source repositories they can legally use.
Google has Google Books, Maps, and YouTube.
Microsoft has Azure, GitHub, LinkedIn, etc…
Facebook has… what? Instagram? Your crazy aunt screaming about her conspiracy of the week?
Google and Microsoft both have a lot of corporate info, including source repositories they can legally use.
Google has Google Books, Maps, and YouTube.
Microsoft has Azure, GitHub, LinkedIn, etc…
Facebook has… what? Instagram? Your crazy aunt screaming about her conspiracy of the week?