> It is also important to note that, until recently, the GenAI industry’s focus has largely been on training workloads. In training workloads, CUDA is very important, but when it comes to inference, even reasoning inference, CUDA is not that important, so the chances of expanding the TPU footprint in inference are much higher than those in training (although TPUs do really well in training as well, with Gemini 3 being the prime example).
Does anyone have a sense of why CUDA is more important for training than inference?
NVIDIA chips are more versatile. During training, you might need to schedule things to the SFU (Special Function Unit, which does sin, cos, 1/sqrt(x), etc.), run epilogues, save intermediate computations, save gradients, and so on. When you train, you might need to collect data from various GPUs, so you need to support interconnects, remote SMEM writes, etc.
Once you have trained, you have feed-forward networks consisting of frozen weights that you can just program in and run data over. Those weights can be duplicated across any number of devices and just sit there running inference on new data (rough sketch below).
If this turns out to be the future use case for NNs (it is today), then Google is better positioned.
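To make that concrete, here is a minimal sketch in plain NumPy (the weight shapes are made up): inference over frozen weights is just fixed matrices applied to new inputs, with no gradients, no optimizer state, and nothing to synchronize.

```python
import numpy as np

# Sketch: inference over frozen weights is repeated matrix multiplies with
# fixed parameters; W1, W2 stand in for weights loaded from a checkpoint.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 32))   # hypothetical frozen layer 1
W2 = rng.standard_normal((32, 4))    # hypothetical frozen layer 2

def forward(x):
    # ReLU feed-forward: no gradients, no bookkeeping, just data flowing through.
    return np.maximum(x @ W1, 0) @ W2

print(forward(rng.standard_normal((3, 16))).shape)   # (3, 4)
```

Copy those two arrays to as many devices as you like and each one can serve requests independently.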
Won't the need to train increase as the need for specialized, smaller models increases and we need to train their many variations? Also, what about models that continuously learn/(re)train? Seems to me the need for training will only go up in the future.
This is a very important point - the market for training chips might be a bubble, but the market for inference is much, much larger. At some point we might have good-enough models and the need for new frontier models will cool down. The big power-hungry datacenters we are seeing are mostly geared towards training, while inference-only systems are much simpler and more power efficient.
A real shame, BTW, that all that silicon doesn't do FP32 (very well). Once training is no longer needed at this scale, we could use all that number crunching for climate models and weather prediction.
It's already the case that people are eking out most further gains by layering "reasoning" on top of what existing models can do - in other words, using massive amounts of inference to substitute for increases in model performance. Wherever things plateau, I expect this will still be the case - so inference will ultimately be the end-game market.
It's just more common as a legacy artifact from when Nvidia was basically the only option available. Many shops are designing models and functions, then training and iterating on Nvidia hardware, but once you have a trained model, it's largely fungible. See how Anthropic moved their models from Nvidia hardware to Inferentia to XLA on Google TPUs.
Further, it's worth noting that Ironwood, Google's v7 TPU, supports only up to BF16 (a 16-bit floating-point format with the range of FP32 minus the precision). Many training processes rely on larger types and quantize later, so this breaks a lot of assumptions. Yet Google surprised everyone and actually trained Gemini 3 with just that type, so I think a lot of people are reconsidering their assumptions.
This is not the case for LLMs. FP16/BF16 training precision is standard, with FP8 inference very common. But labs are moving to FP8 training and even FP4.
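For anyone wanting to see the range-vs-precision trade-off concretely, here is a small sketch that derives the maximum normal value and the epsilon (gap between 1.0 and the next representable number) of FP32, BF16, and FP16 from their standard bit layouts. No library types involved, just arithmetic on the exponent and mantissa widths.

```python
# Derive max normal value and epsilon from each format's bit layout.
formats = {
    "FP32": (8, 23),  # (exponent bits, mantissa bits)
    "BF16": (8, 7),   # FP32's exponent range, far fewer mantissa bits
    "FP16": (5, 10),  # more precision than BF16 but a much smaller range
}

for name, (exp_bits, man_bits) in formats.items():
    bias = 2 ** (exp_bits - 1) - 1                    # IEEE-style exponent bias
    max_normal = (2 - 2 ** -man_bits) * 2.0 ** bias   # largest finite value
    epsilon = 2.0 ** -man_bits                        # gap between 1.0 and the next value
    print(f"{name}: max ~ {max_normal:.3e}, epsilon ~ {epsilon:.1e}")
```

BF16 reaches roughly the same ~3.4e38 range as FP32, but its epsilon is about 8e-3 versus FP32's ~1.2e-7: exactly the "range of FP32 minus the precision" trade-off mentioned above. FP16 is the opposite compromise, with finer precision but a maximum of only about 65,504.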
When training a neural network, you usually play around with the architecture and need as much flexibility as possible. You need to support a large set of operations.
Another factor is that training is always done with batches, while inference batching depends on the number of concurrent users. This means training tends to be compute-bound, where supporting the latest data types is critical, whereas inference speed is often bottlenecked by memory, which does not lend itself to product differentiation. If you put the same memory into your chip as your competitor does, the difference is going to be way smaller.
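A rough way to see the compute-bound vs memory-bound split is to compare FLOPs against bytes moved for a single weight matrix at different batch sizes. The sketch below uses a made-up hidden dimension and assumes 2-byte (BF16) weights and activations; real kernels move more data than this, so treat the numbers as illustrative only.

```python
# Sketch: arithmetic intensity (FLOPs per byte) of y = x @ W for a d x d weight
# matrix at different batch sizes, assuming 2-byte weights and activations.
d = 8192            # hypothetical hidden dimension
bytes_per_elem = 2  # BF16

for batch in (1, 8, 64, 512):
    flops = 2 * batch * d * d                                # multiply-accumulates, counted as 2 FLOPs
    bytes_moved = bytes_per_elem * (d * d + 2 * batch * d)   # weights + input and output activations
    print(f"batch={batch:4d}  arithmetic intensity ~ {flops / bytes_moved:.1f} FLOPs/byte")
```

At batch size 1 you get roughly 1 FLOP per byte, far below what any accelerator can sustain out of HBM, which is why small-batch inference is memory-bound; at batch 512 the ratio climbs into the hundreds and the same multiply becomes compute-bound.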
Training is taking an enormous problem, trying to break it into lots of pieces, and managing the data dependencies between those pieces. It's solving one really hard problem. Inference is the opposite: lots of small, independent problems. All of this "we have X many widgets connected to Y many high-bandwidth optical telescopes" is a training problem that they need to solve. Inference is "I have 20 tokens and I want to throw them at these 5,000,000 matrix multiplies, oh, and I don't care about latency".
I think it’s the same reason Windows is important to desktop computers: software was written to depend on it. Same with most of the training software out there today being built around CUDA. Even a version difference of CUDA can break things.
CUDA is just a better dev experience. Lots of training is experiments where developer/researcher productivity matters. Googlers get to use what they're given, others get to choose.
Once you settle on a design, doing ASICs to accelerate it might make sense. But I'm not sure the gap is so big; the article says some things that aren't really true of datacenter GPUs (Nvidia DC GPUs haven't wasted hardware on graphics-related stuff for years).
That quote left me with the same question. Something about a decent amount of RAM on one board, perhaps? That’s advantageous for training but less so for inference?
Inference is often a static, bounded problem solvable by generic compilers. Training requires the mature ecosystem and numerical stability of CUDA to handle mixed-precision operations, unless you rewrite the software from the ground up like Google did; for most companies it's cheaper and faster to buy NVIDIA hardware.
Let w be the vector of weights and S be the conformable covariance matrix. The portfolio variance is given by w’Sw. So just minimize that with whatever constraints you want. If you just assume the weights sum to one, it is a classic quadratic optimization with linear equality constraints, with well-known solutions.
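For reference, a minimal sketch of that textbook case (sum-to-one constraint only, no short-sale limits; the covariance matrix is made up for illustration). The closed form from the Lagrangian is w* = S^{-1}1 / (1’S^{-1}1):

```python
import numpy as np

# Minimum-variance portfolio: minimize w' S w subject to sum(w) = 1.
# Closed-form solution via Lagrange multipliers: w* = S^{-1} 1 / (1' S^{-1} 1).
S = np.array([[0.040, 0.006, 0.010],
              [0.006, 0.090, 0.012],
              [0.010, 0.012, 0.160]])   # hypothetical covariance matrix

ones = np.ones(S.shape[0])
w = np.linalg.solve(S, ones)   # S^{-1} 1 without forming the inverse explicitly
w /= ones @ w                  # normalize so the weights sum to one

print("weights:", np.round(w, 4))
print("portfolio variance:", w @ S @ w)
```

Adding inequality constraints (e.g. no shorting) takes you from this closed form to a quadratic programming solver, but the objective stays the same.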
The fix for this is for the AI to double-check all links before providing them to the user. I frequently ask ChatGPT to double check that references actually exist when it gives me them. It should be built in!
Gemini will lie to me when I ask it to cite things: it either pulls up relevant sources or just hallucinates them.
IDK how you people go through that experience more than a handful of times before you get pissed off and stop using these tools. I've wasted so much time because of believable lies from these bots.
Sorry, not even lies, just bullshit. The model has no conception of truth so it can't even lie. Just outputs bullshit that happens to be true sometimes.
I have found myself doing the same "citation needed" loop - but with AI this is a dangerous game, as it will now double down on whatever it made up and go looking for citations to justify its answer.
Prompting it up front to cite sources is obviously a better way of going about things.
It's bad when they indiscriminately crawl for training, and it's not ideal (but understandable) to use the Internet to communicate with them (and to have online accounts associated with that, etc.) rather than running them locally.
It's not bad when they use the Internet at generation time to verify the output.
I don't know for certain what you're referring to, but the "bulk downloads" of the Internet that AI companies are executing for training are the problem I've seen cited, and doesn't relate to LLMs checking their sources at query time.
I would distinguish between visual imagination and visuospatial reasoning.
For people like myself with aphantasia, there are often problem-solving strategies that can help when you can’t visualize, like drawing a picture.
And lots of problems don’t really require as much visual imagination as you would think. I’m pretty good at math, programming, and economics. Not top tier, but pretty good.
If there are problems out there that you struggle with compared to others, then that’s the universe telling you that you don’t have a comparative advantage in it. Do something else and hire the people who can more easily solve them if you need it.
It sounds like you have routed around your spatial visualization deficit, but that just proves the importance of alternate cognitive strategies rather than indicate that such an aptitude or deficit doesn’t ceteris paribus impact mathematical achievement.
I took some sort of IQ test when I was a kid and there was an entire section of "if you rotate this object around that axis, it matches which of the following options". Try as I might, I can't picture this in my head (picturing anything other than a sphere or a cube is tough), but I found that I could look at the options and logically exclude them in a very tedious way by inspection.
It's one of the reasons I like computer graphics so much: the computer does the rotation for you! Stereo graphics (using the funny LCD glasses) was a true revelation to me, and learning how to rotate things using matrices was another.
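For anyone curious, the matrix trick is tiny; here is a throwaway NumPy sketch rotating a point 90 degrees about the z-axis:

```python
import numpy as np

# Rotate a 3D point about the z-axis with a standard rotation matrix.
theta = np.radians(90)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0,              0,             1]])

point = np.array([1.0, 0.0, 0.0])
print(Rz @ point)   # ~ [0, 1, 0]: the x-axis unit vector rotated onto the y-axis
```

Chain a few of these (or compose them into one matrix) and the computer does all the mental rotation the IQ test wanted from me.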
You must hate those fancy new style captchas where you rotate the object. I’ve never considered the fairness and discriminatory aspect of captchas until now. I wonder if in the future eternal September will finally end as increasingly complex captchas act as a sort of poll test on posting.
Data science wasn't even a degree you could get 20 years ago. Twenty years ago if you were interested in what is now called data science, you were getting a degree with some kind of exposure to applied statistics. Economics is one of those disciplines (through econometrics).
No, I did stats as part of economics around then, and it's nothing like modern DS. It overlaps a fair bit, but in practice the classical stats student is bringing a knife to a gunfight.
The practice of working with huge datasets manipulated by computers is valuable enough that you need separate training in it.
I don't know what's in a modern stats degree though, I would assume they try to turn it into DS.
Data science is basically a marketing title given to what would have been a joint CS/statistics degree in the past. Maybe a double major, or maybe a major in one and an extensive minor in another. And it's mostly taught by people with a background in CS or statistics.
Like with most other academic fields, there is no clear separation between data science and neighboring fields. Its existence as a field tells more about the organization of undergraduate education in the average university than about the field itself.
The Finnish term for CS translates as "data processing science" or "information processing science". When I was an undergrad ~25 years ago, people in the statistics department were arguing that it would have been a more appropriate name for statistics, but CS took it first. The data science perspective was already mainstream back then, as far as the people in statistics were concerned. But statistics education was still mostly introductory classes in classical statistics offered to people in other fields.
No. Data science is different than statistics, because it is done on computers. It also uses machine learning algorithms instead of statistical algorithms. These advances, and the shedding of generations of restrictive cruft - frees data scientists to craft answers that their bosses want to hear - proving the superiority of data science over statistics.
Yeah, we called that data mining, decision systems, and whatnot... MapReduce was as fresh and hot as Paul Graham's book of essays... folks were using Java over Python, due to some open-source library from around the globe...
Essentially, provided you were in the right place at the right time, you could get a BSc in it.
One of the difficulties with these models would be backtesting investment strategies. You always need to make sure that you are only using data that would have been available at the time to avoid look-ahead bias.
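A minimal sketch of the point-in-time filtering that guards against that, using pandas with a hypothetical `as_of_date` column marking when each value actually became known:

```python
import pandas as pd

def point_in_time(df: pd.DataFrame, decision_date: str) -> pd.DataFrame:
    """Return only the rows that would have been known at decision_date."""
    cutoff = pd.Timestamp(decision_date)
    # 'as_of_date' (publication/availability date) is a made-up column name.
    return df[df["as_of_date"] <= cutoff]

# Example: quarterly earnings are reported with a lag, so the Q4 figure is not
# usable for a decision on 2024-01-15 even though the quarter ended earlier.
data = pd.DataFrame({
    "period_end": pd.to_datetime(["2023-09-30", "2023-12-31"]),
    "as_of_date": pd.to_datetime(["2023-11-01", "2024-02-15"]),
    "earnings":   [1.10, 1.35],
})
print(point_in_time(data, "2024-01-15"))   # only the Q3 row survives
```

The key is to filter on when the data became available, not on the period it describes.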
Linking blog articles that bury the lede behind a paywall makes it impossible to discuss anything.
However, at the core, the US insurance system is the problem, compounded by the government trying to regulate it so that people do not die needlessly but without destroying these profit-seeking enterprises. So what you end up with is a massive mess that leaves everybody cranky.