So apparently they have custom hardware built around truly gigantic chips, each one spanning an entire silicon wafer. Presumably they keep the whole model on-chip, in what is effectively a huge pool of SRAM (think L3 cache). So the memory bandwidth is absurdly high, which allows very fast inference.
Getting the same raw compute is more expensive than with a cluster of Nvidia chips, but an Nvidia cluster can't match the same peak per-request throughput.
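The speedup follows from simple arithmetic: if decoding is memory-bandwidth bound, every generated token has to stream the active weights through the compute units once, so bandwidth sets a hard ceiling on tokens per second. A minimal sketch, with assumed (not official) numbers for model size and bandwidths:

    # Back-of-envelope: tokens/s <= bandwidth / bytes of active weights.
    # All numbers below are illustrative assumptions, not published specs.
    active_params = 32e9      # assumed MoE active params per token (GLM-4.6 class)
    bytes_per_param = 2       # bf16 weights

    for name, bw in [("HBM-based GPU (~3.35 TB/s)", 3.35e12),
                     ("wafer-scale SRAM (~21 PB/s, claimed)", 21e15)]:
        ceiling = bw / (active_params * bytes_per_param)
        print(f"{name}: ~{ceiling:,.0f} tok/s ceiling per request")

Real numbers come in well below these ceilings (KV-cache reads, batching, interconnect overhead), but the orders of magnitude show why keeping the weights resident in on-wafer SRAM changes the picture.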
As far as price goes, as a coder I'm giving a month of the $50 plan a shot. I haven't figured out how to adapt my workflow to the faster speeds yet (I'm also still learning and setting up opencode).
For $50/month, it's a non-starter. I hope they can find a way to use all this excess bandwidth to put out a $10 equivalent to Claude Code instead of a 1000 tok/s party trick I can't use properly.
GLM-4.6 is on par with Sonnet 4.5: sometimes it's better, sometimes worse. Give it a shot. It's the only model that (almost) made me ditch Claude. The only problem is that Claude Code is still the best agentic coding tool in town, and its web search doesn't work without a proper subscription.
Cerebras offers pay-per-token, so what exactly are you asking for? Claude Code starts at $100/month, or the API at $15/Mtok. Cerebras is already much cheaper, but you want it even cheaper, at $10?
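For what it's worth, the break-even between the two pricing models is easy to sketch (using the figures quoted in this thread; actual plans meter input, output, and cached tokens differently, so this is only a rough comparison):

    # Break-even sketch: flat monthly plan vs. pay-per-token.
    # Figures are the ones quoted above, not a full pricing model.
    price_per_mtok = 15.0    # $ per million output tokens
    flat_plan = 100.0        # $ per month
    breakeven_mtok = flat_plan / price_per_mtok
    print(f"break-even: ~{breakeven_mtok:.1f}M tokens/month")
    # ~6.7M tokens/month: below that, pay-per-token comes out cheaper.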