
> Developers can run inference on Llama 3.1 405B on their own infra at roughly 50% the cost of using closed models like GPT-4o

Does anyone have details on exactly what this means, or where/how this metric is derived?



I am guessing these are prices on services like AWS Bedrock (their post is down right now).
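
For what it's worth, the comparison presumably boils down to per-token price arithmetic. A minimal sketch (every price below is a hypothetical placeholder, not an actual Bedrock or OpenAI quote):

    # Back-of-envelope API cost comparison.
    # All prices are HYPOTHETICAL placeholders, not real quotes.

    def cost_usd(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
        """Cost of one workload given USD prices per 1M tokens."""
        return ((input_tokens / 1e6) * price_in_per_m
                + (output_tokens / 1e6) * price_out_per_m)

    # Hypothetical per-1M-token prices; substitute whatever your provider charges.
    gpt4o = cost_usd(1_000_000, 1_000_000, price_in_per_m=5.00, price_out_per_m=15.00)
    llama405b = cost_usd(1_000_000, 1_000_000, price_in_per_m=2.50, price_out_per_m=7.50)

    print(f"ratio: {llama405b / gpt4o:.0%}")  # -> 50% under these placeholder prices

Whether the 50% figure comes from Bedrock list prices, amortized self-hosted GPU costs, or something else is exactly the open question.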


A big chunk of that is probably that you don't have to pay a third party's profit margin for running inference off-premises.
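
To put toy numbers on that (all hypothetical, just to show the shape of the argument): if the hosted API price is the provider's raw compute cost plus a markup, self-hosting only has to beat the marked-up price, not the raw cost:

    # Toy margin model -- every number here is HYPOTHETICAL.
    provider_compute = 1.00  # provider's raw cost per 1M tokens
    margin = 0.60            # 60% markup
    api_price = provider_compute * (1 + margin)  # 1.60

    self_hosted = 1.10       # your amortized GPUs + power per 1M tokens

    print(f"savings vs API: {1 - self_hosted / api_price:.0%}")  # ~31%

So even if your own infra is less efficient per token than the provider's, the absent margin can still make self-hosting cheaper.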



