> this report contains no further details about the architecture (including model size), hardware, training compute
As a beginner in the NLP world, this actually serves a purpose for me: it hides the complexity behind building such models. Numbers like xyzB parameters or 12K A100s are scary, so I can still dream of building such a system one day. This story [0] and this one [1] hide some extremely complex edge cases that a beginner would never have thought of, or would never have had the courage to start on if they knew the real cost.
We may, however, still be able to infer some details [probably in the future], knowing how Microsoft rearranged its infrastructure to accommodate OpenAI's training [2].
_________________
[0]. https://www.construct.net/en/blogs/ashleys-blog-2/simple-sof...
[1]. https://prog21.dadgum.com/29.html
[2]. https://www.theverge.com/2023/3/13/23637675/microsoft-chatgp...