> It is also important to note that, until recently, the GenAI industry’s focus has largely been on training workloads. In training workloads, CUDA is very important, but when it comes to inference, even reasoning inference, CUDA is not that important, so the chances of expanding the TPU footprint in inference are much higher than those in training (although TPUs do really well in training as well, with Gemini 3 being the prime example).
Does anyone have a sense of why CUDA is more important for training than inference?
NVIDIA chips are more versatile. During training, you might need to schedule things to the SFU (Special Function Unit, which does sin, cos, 1/sqrt(x), etc.), run epilogues, save intermediate computations, save gradients, and so on. When you train, you might need to collect data from various GPUs, so you need to support interconnects, remote SMEM writes, etc.
Once you have trained, you have feed-forward networks consisting of frozen weights that you can just program in and run data over. Those weights can be duplicated across any number of devices and just sit there running inference on new data (rough sketch below).
If this turns out to be the future use case for NNs (it is today), then Google is better positioned.
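To make that concrete, here is a minimal sketch in plain NumPy (the weight shapes are made up): inference over frozen weights is just fixed matrices applied to new inputs, with no gradients, no optimizer state, and nothing to synchronize.

```python
import numpy as np

# Sketch: inference over frozen weights is repeated matrix multiplies with
# fixed parameters; W1, W2 stand in for weights loaded from a checkpoint.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 32))   # hypothetical frozen layer 1
W2 = rng.standard_normal((32, 4))    # hypothetical frozen layer 2

def forward(x):
    # ReLU feed-forward: no gradients, no bookkeeping, just data flowing through.
    return np.maximum(x @ W1, 0) @ W2

print(forward(rng.standard_normal((3, 16))).shape)   # (3, 4)
```

Copy those two arrays to as many devices as you like and each one can serve requests independently.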
Won't the need to train increase as the need for specialized, smaller models increases and we need to train their many variations? Also, what about models that continuously learn/(re)train? Seems to me the need for training will only go up in the future.
This is a very important point - the market for training chips might be a bubble, but the market for inference is much, much larger. At some point we might have good-enough models and the need for new frontier models will cool down. The big power-hungry datacenters we are seeing are mostly geared towards training, while inference-only systems are much simpler and more power efficient.
A real shame, BTW, that all that silicon doesn't do FP32 (very well). Once training is no longer needed at this scale, we could use all that number crunching for climate models and weather prediction.
It's already the case that people are eking out most further gains by layering "reasoning" on top of what existing models can do - in other words, using massive amounts of inference to substitute for increases in model performance. Wherever things plateau, I expect this will still be the case - so inference will ultimately be the end-game market.
It's just more common as a legacy artifact from when Nvidia was basically the only option available. Many shops are designing models and functions, then training and iterating on Nvidia hardware, but once you have a trained model, it's largely fungible. See how Anthropic moved their models from Nvidia hardware to Inferentia to XLA on Google TPUs.
Further, it's worth noting that Ironwood, Google's v7 TPU, supports only up to BF16 (a 16-bit floating-point format with the range of FP32 minus the precision). Many training processes rely on larger types and quantize later, so this breaks a lot of assumptions. Yet Google surprised everyone and actually trained Gemini 3 with just that type, so I think a lot of people are reconsidering their assumptions.
This is not the case for LLMs. FP16/BF16 training precision is standard, with FP8 inference very common. But labs are moving to FP8 training and even FP4.
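For anyone wanting to see the range-vs-precision trade-off concretely, here is a small sketch that derives the maximum normal value and the epsilon (gap between 1.0 and the next representable number) of FP32, BF16, and FP16 from their standard bit layouts. No library types involved, just arithmetic on the exponent and mantissa widths.

```python
# Derive max normal value and epsilon from each format's bit layout.
formats = {
    "FP32": (8, 23),  # (exponent bits, mantissa bits)
    "BF16": (8, 7),   # FP32's exponent range, far fewer mantissa bits
    "FP16": (5, 10),  # more precision than BF16 but a much smaller range
}

for name, (exp_bits, man_bits) in formats.items():
    bias = 2 ** (exp_bits - 1) - 1                    # IEEE-style exponent bias
    max_normal = (2 - 2 ** -man_bits) * 2.0 ** bias   # largest finite value
    epsilon = 2.0 ** -man_bits                        # gap between 1.0 and the next value
    print(f"{name}: max ~ {max_normal:.3e}, epsilon ~ {epsilon:.1e}")
```

BF16 reaches roughly the same ~3.4e38 range as FP32, but its epsilon is about 8e-3 versus FP32's ~1.2e-7: exactly the "range of FP32 minus the precision" trade-off mentioned above. FP16 is the opposite compromise, with finer precision but a maximum of only about 65,504.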
When training a neural network, you usually play around with the architecture and need as much flexibility as possible. You need to support a large set of operations.
Another factor is that training is always done with batches, while inference batching depends on the number of concurrent users. This means training tends to be compute-bound, where supporting the latest data types is critical, whereas inference speed is often bottlenecked by memory, which does not lend itself to product differentiation. If you put the same memory into your chip as your competitor does, the difference is going to be way smaller.
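A rough way to see the compute-bound vs memory-bound split is to compare FLOPs against bytes moved for a single weight matrix at different batch sizes. The sketch below uses a made-up hidden dimension and assumes 2-byte (BF16) weights and activations; real kernels move more data than this, so treat the numbers as illustrative only.

```python
# Sketch: arithmetic intensity (FLOPs per byte) of y = x @ W for a d x d weight
# matrix at different batch sizes, assuming 2-byte weights and activations.
d = 8192            # hypothetical hidden dimension
bytes_per_elem = 2  # BF16

for batch in (1, 8, 64, 512):
    flops = 2 * batch * d * d                                # multiply-accumulates, counted as 2 FLOPs
    bytes_moved = bytes_per_elem * (d * d + 2 * batch * d)   # weights + input and output activations
    print(f"batch={batch:4d}  arithmetic intensity ~ {flops / bytes_moved:.1f} FLOPs/byte")
```

At batch size 1 you get roughly 1 FLOP per byte, far below what any accelerator can sustain out of HBM, which is why small-batch inference is memory-bound; at batch 512 the ratio climbs into the hundreds and the same multiply becomes compute-bound.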
Training is taking an enormous problem, trying to break it into lots of pieces, and managing the data dependencies between those pieces. It's solving one really hard problem. Inference is the opposite: lots of small, independent problems. All of this "we have X many widgets connected to Y many high-bandwidth optical telescopes" is a training problem that they need to solve. Inference is "I have 20 tokens and I want to throw them at these 5,000,000 matrix multiplies, oh, and I don't care about latency".
I think it’s the same reason Windows is important to desktop computers: software was written to depend on it. Same with most of the training software out there today being built around CUDA. Even a version difference of CUDA can break things.
CUDA is just a better dev experience. Lots of training is experiments where developer/researcher productivity matters. Googlers get to use what they're given, others get to choose.
Once you settle on a design, doing ASICs to accelerate it might make sense. But I'm not sure the gap is so big; the article says some things that aren't really true of datacenter GPUs (Nvidia DC GPUs haven't wasted hardware on graphics-related stuff for years).
That quote left me with the same question. Something about a decent amount of RAM on one board, perhaps? That’s advantageous for training but less so for inference?
Inference is often a static, bounded problem solvable by generic compilers. Training requires the mature ecosystem and numerical stability of CUDA to handle mixed-precision operations, unless you rewrite the software from the ground up like Google did; for most companies it's cheaper and faster to buy NVIDIA hardware.
Let w be the vector of weights and S be the conformable covariance matrix. The portfolio variance is given by w’Sw. So just minimize that with whatever constraints you want. If you just assume the weights sum to one, it is a classic quadratic optimization with linear equality constraints, with well-known solutions.
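For reference, a minimal sketch of that textbook case (sum-to-one constraint only, no short-sale limits; the covariance matrix is made up for illustration). The closed form from the Lagrangian is w* = S^{-1}1 / (1’S^{-1}1):

```python
import numpy as np

# Minimum-variance portfolio: minimize w' S w subject to sum(w) = 1.
# Closed-form solution via Lagrange multipliers: w* = S^{-1} 1 / (1' S^{-1} 1).
S = np.array([[0.040, 0.006, 0.010],
              [0.006, 0.090, 0.012],
              [0.010, 0.012, 0.160]])   # hypothetical covariance matrix

ones = np.ones(S.shape[0])
w = np.linalg.solve(S, ones)   # S^{-1} 1 without forming the inverse explicitly
w /= ones @ w                  # normalize so the weights sum to one

print("weights:", np.round(w, 4))
print("portfolio variance:", w @ S @ w)
```

Adding inequality constraints (e.g. no shorting) takes you from this closed form to a quadratic programming solver, but the objective stays the same.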
The fix for this is for the AI to double-check all links before providing them to the user. I frequently ask ChatGPT to double check that references actually exist when it gives me them. It should be built in!
Gemini will lie to me when I ask it to cite things: it either pulls up relevant sources or just hallucinates them.
IDK how you people go through that experience more than a handful of times before you get pissed off and stop using these tools. I've wasted so much time because of believable lies from these bots.
Sorry, not even lies, just bullshit. The model has no conception of truth so it can't even lie. Just outputs bullshit that happens to be true sometimes.
I have found myself doing the same "citation needed" loop - but with AI this is a dangerous game, as it will now double down on whatever it made up and go looking for citations to justify its answer.
Prompting it up front to cite sources is obviously a better way of going about things.
It's bad when they indiscriminately crawl for training, and it's not ideal (but understandable) to use the Internet to communicate with them (and to have online accounts associated with that, etc.) rather than running them locally.
It's not bad when they use the Internet at generation time to verify the output.
I don't know for certain what you're referring to, but the "bulk downloads" of the Internet that AI companies are executing for training are the problem I've seen cited, and doesn't relate to LLMs checking their sources at query time.
I would distinguish between visual imagination and visuospatial reasoning.
For people like myself with aphantasia, there are often problem-solving strategies that can help when you can’t visualize, like drawing a picture.
And lots of problems don’t really require as much visual imagination as you would think. I’m pretty good at math, programming, and economics. Not top tier, but pretty good.
If there are problems out there that you struggle with compared to others, then that’s the universe telling you that you don’t have a comparative advantage in it. Do something else and hire the people who can more easily solve them if you need it.
It sounds like you have routed around your spatial visualization deficit, but that just proves the importance of alternate cognitive strategies rather than indicate that such an aptitude or deficit doesn’t ceteris paribus impact mathematical achievement.
I took some sort of IQ test when I was a kid and there was an entire section of "if you rotate this object around that axis, it matches which of the following options". Try as I might, I can't picture this in my head (picturing anything other than a sphere or a cube is tough), but I found that I could look at the options and logically exclude them in a very tedious way by inspection.
It's one of the reasons I like computer graphics so much: the computer does the rotation for you! Stereo graphics (using the funny LCD glasses) was a true revelation to me, and learning how to rotate things using matrices was another.
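For anyone curious, the matrix trick is tiny; here is a throwaway NumPy sketch rotating a point 90 degrees about the z-axis:

```python
import numpy as np

# Rotate a 3D point about the z-axis with a standard rotation matrix.
theta = np.radians(90)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0,              0,             1]])

point = np.array([1.0, 0.0, 0.0])
print(Rz @ point)   # ~ [0, 1, 0]: the x-axis unit vector rotated onto the y-axis
```

Chain a few of these (or compose them into one matrix) and the computer does all the mental rotation the IQ test wanted from me.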
You must hate those fancy new style captchas where you rotate the object. I’ve never considered the fairness and discriminatory aspect of captchas until now. I wonder if in the future eternal September will finally end as increasingly complex captchas act as a sort of poll test on posting.
Data science wasn't even a degree you could get 20 years ago. Twenty years ago if you were interested in what is now called data science, you were getting a degree with some kind of exposure to applied statistics. Economics is one of those disciplines (through econometrics).
No, I did stats as part of economics around then, and it's nothing like modern DS. It overlaps a fair bit, but in practice the classical stats student is bringing a knife to a gunfight.
The practice of working with huge datasets manipulated by computers is valuable enough that you need separate training in it.
I don't know what's in a modern stats degree though, I would assume they try to turn it into DS.
Data science is basically a marketing title given to what would have been a joint CS/statistics degree in the past. Maybe a double major, or maybe a major in one and an extensive minor in another. And it's mostly taught by people with a background in CS or statistics.
Like with most other academic fields, there is no clear separation between data science and neighboring fields. Its existence as a field tells more about the organization of undergraduate education in the average university than about the field itself.
The Finnish term for CS translates as "data processing science" or "information processing science". When I was an undergrad ~25 years ago, people in the statistics department were arguing that it would have been a more appropriate name for statistics, but CS took it first. The data science perspective was already mainstream back then, as far as the people in statistics were concerned. But statistics education was still mostly introductory classes in classical statistics offered to people in other fields.
No. Data science is different than statistics, because it is done on computers. It also uses machine learning algorithms instead of statistical algorithms. These advances, and the shedding of generations of restrictive cruft - frees data scientists to craft answers that their bosses want to hear - proving the superiority of data science over statistics.
Yeah, we called that data mining, decision systems, and whatnot... MapReduce was as fresh and hot as Paul Graham's book of essays... folks were using Java over Python, due to some open-source library from around the globe...
Essentially, provided you were in the right place at the right time, you could get a BSc in it.
One of the difficulties with these models would be backtesting investment strategies. You always need to make sure that you are only using data that would have been available at the time to avoid look-ahead bias.
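A minimal sketch of the point-in-time filtering that guards against that, using pandas with a hypothetical `as_of_date` column marking when each value actually became known:

```python
import pandas as pd

def point_in_time(df: pd.DataFrame, decision_date: str) -> pd.DataFrame:
    """Return only the rows that would have been known at decision_date."""
    cutoff = pd.Timestamp(decision_date)
    # 'as_of_date' (publication/availability date) is a made-up column name.
    return df[df["as_of_date"] <= cutoff]

# Example: quarterly earnings are reported with a lag, so the Q4 figure is not
# usable for a decision on 2024-01-15 even though the quarter ended earlier.
data = pd.DataFrame({
    "period_end": pd.to_datetime(["2023-09-30", "2023-12-31"]),
    "as_of_date": pd.to_datetime(["2023-11-01", "2024-02-15"]),
    "earnings":   [1.10, 1.35],
})
print(point_in_time(data, "2024-01-15"))   # only the Q3 row survives
```

The key is to filter on when the data became available, not on the period it describes.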
Linking blog articles that bury the lede behind a paywall makes it impossible to discuss anything.
However, at the core, the US insurance system is the problem, compounded by the government trying to regulate it so that people do not die needlessly but without destroying these profit-seeking enterprises. So what you end up with is a massive mess that leaves everybody cranky.