> this report contains no further details about the architecture (including model size), hardware, training compute

As a beginner in the NLP world, this omission may actually serve a purpose for me: it hides the complexity behind building such models. Numbers like xyzB parameters or 12K A100s are scary, so I can still dream of building such a system one day. This story [0] and this one [1] hide some extremely complex edge cases that a beginner would never have thought of, and would never have had the courage to start on if they knew the real cost.

We may, however, still be able to infer some details [probably in the future], given what we know about how Microsoft rearranged its infrastructure to accommodate OpenAI's training [2].

_________________

[0]. https://www.construct.net/en/blogs/ashleys-blog-2/simple-sof...

[1]. https://prog21.dadgum.com/29.html

[2]. https://www.theverge.com/2023/3/13/23637675/microsoft-chatgp...
