
Well props to them for continuing to improve, winning on cost-effectiveness, and continuing to publicly share their improvements. Hard not to root for them as a force to prevent an AI corporate monopoly/duopoly.




How could we judge if anyone is "winning" on cost-effectiveness, when we don't know what everyone's profits/losses are?

If you're trying to build AI based applications you can and should compare the costs between vendor based solutions and hosting open models with your own hardware.

On the hardware side you can run some benchmarks on the hardware (or use other people's benchmarks) and get an idea of the tokens/second you can get from the machine. Normalize this for your usage pattern (and do your best to implement batch processing where you are able to, which will save you money on both methods) and you have a basic idea of how much it would cost per token.

Then you compare that to the cost of something like GPT5, which is a bit simpler because the cost per (million) token is something you can grab off of a website.
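
To make that concrete, here is a back-of-envelope sketch of the comparison; every number in it (hardware price, power draw, tokens/sec, utilization, API price) is an assumption for illustration, not a measurement:

    # Rough self-hosted $/Mtok vs. an API price. All inputs are assumptions.
    HOURS_PER_MONTH = 730

    def self_hosted_cost_per_mtok(hw_cost_usd, amortize_months, power_watts,
                                  usd_per_kwh, tokens_per_sec, utilization):
        hourly_hw = hw_cost_usd / (amortize_months * HOURS_PER_MONTH)
        hourly_power = (power_watts / 1000) * usd_per_kwh
        tokens_per_hour = tokens_per_sec * 3600 * utilization
        return (hourly_hw + hourly_power) / tokens_per_hour * 1_000_000

    # Hypothetical rig: $8k box amortized over 3 years, 800 W, 60 tok/s, 30% busy.
    local = self_hosted_cost_per_mtok(8_000, 36, 800, 0.15, 60, 0.30)
    api = 1.25  # assumed blended API price in $ per million tokens
    print(f"self-hosted: ${local:.2f}/Mtok vs API: ${api:.2f}/Mtok")

Note how sensitive the result is to the utilization term: at 30% busy the self-hosted figure comes out several times the API price, which is exactly the underutilization point made downthread.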

You'd be surprised how much money running something like DeepSeek (or if you prefer a more established company, Qwen3) will save you over the cloud systems.

That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.


> with your own hardware

Or with somebody else's.

If you don't have strict data residency requirements, and if you aren't doing this at an extremely large scale, doing it on somebody else's hardware makes much more economic sense.

If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size. If you don't have enough requests to keep GPUs properly fed 24/7, those GPUs will end up underutilized.

Sometimes underutilization is okay, if your system needs to be airgapped for example, but that's not an economics discussion any more.

Unlike e.g. video streaming workloads, LLMs can be hosted on the other side of the world from where the user is, and the difference is barely going to be noticeable. This means you can keep GPUs fed by bringing in workloads from other timezones when your cluster would otherwise be idle. Unless you're a large, worldwide organization, that is difficult to do if you're using your own hardware.


> If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size

Isn't that true for any LLM, MoE or not? In fact, doesn't that apply to most concepts within ML? As long as it's possible to do batching at all, you can scale it up and utilize more of the GPU, until you saturate some part of the process.


Mixture-of-Expert models benefit from economies of scale, because they can process queries in parallel, and expect different queries to hit different experts at a given layer. This leads to higher utilization of GPU resources. So unless your application is already getting a lot of use, you're probably under-utilizing your hardware.
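
A toy simulation makes that concrete. It assumes uniform random top-k routing - real routers are learned, so this is purely illustrative:

    # With top-k routing, one request touches only k experts per layer; the
    # rest sit idle unless more requests are batched in alongside it.
    import random

    def avg_experts_touched(batch_size, n_experts=64, top_k=4, trials=1000):
        total = 0
        for _ in range(trials):
            hit = set()
            for _ in range(batch_size):
                hit.update(random.sample(range(n_experts), top_k))
            total += len(hit)
        return total / trials

    for b in (1, 4, 16, 64):
        print(f"batch={b:3d}: ~{avg_experts_touched(b):.1f}/64 experts active")

At batch size 1 only 4 of 64 experts do any work per layer; by batch 64 nearly all of them are busy. That jump is the economy of scale.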

>That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

What's cheap nowadays? I'm out of the loop. Does anything ever run on integrated AMD, like the Ryzen AI chips that come in Framework motherboards? Is under $1k American cheap?


Not really in the loop either, but when Deepseek R1 was released, I stumbled upon this YouTube channel [1] that made local AI PC builds in the $1000-2000 range. But he doesn't always use GPUs; maybe the cheaper builds were CPU plus a lot of RAM, I don't remember.

[1] https://youtube.com/@digitalspaceport?si=NrZL7MNu80vvAshx


Digital Spaceport is a really good channel, I second that - the author spares no detail. The cheaper options always use CPU only, or sharding between different cheap GPUs (without SLI/switching) - which is not good for all use cases (he also highlights this). But some of his prices are one-off bargains for used stuff. And RAM prices doubled this year, so you won't buy 2x256 GB DDR4 for $336, no matter what: https://digitalspaceport.com/500-deepseek-r1-671b-local-ai-s...

'lots of RAM' got expensive lately -_-

Well, the seemingly cheap options come with significantly degraded performance, particularly for agentic use. Have you tried replacing Claude Code with some locally deployed model, say, on a 4090 or 5090? I have. It is not usable.

Deepseek and Kimi both have great agentic performance

When used with crush/opencode they are close to Claude performance.

Nothing that runs on a 4090 would compete but Deepseek on openrouter is still 25x cheaper than claude


> Deepseek on openrouter is still 25x cheaper than claude

Is it? Or only when you don’t factor in Claude cached context? I’ve consistently found it pointless to use open models because the price of the good ones is so close to cached context on Claude that I don’t need them.


Deepseek via their API also has cached context, although the tokens/s was much lower than Claude when I tried it. But for background agents the price difference makes it absolutely worth it.

Yes, if you try using Kilo Code/Cline via Openrouter the cost will be much cheaper using Deepseek/Kimi vs Claude Sonnet 4.5.

Well, those are also extremely VRAM-limited cards that wouldn't be able to run anything in the ~70b parameter space. (Can you run 30b even?)

Things get a lot easier at lower quantisation, higher parameter space, and there are a lot of people whose jobs for AI are "Extract sentiment from text" or "bin into one of these 5 categories", where that's probably fine.


Strictly speaking, you have not deployed any model on a 5090 because a 5090 card has never been produced.

And without specifying your quantization level it's hard to know what you mean by "not usable"

Anyway if you really wanted to try cheap distilled/quantized models locally you would be using used v100 Teslas and not 4 year old single chip gaming GPUs.



You can just buy a 5090 now for $3k. Have you confused it with something else?

they took the already ridiculous v3.1 terminus model, added this new deepseek sparse attention thing, and suddenly it's doing 128k context at basically half the inference cost of the old version with no measurable drop in reasoning or multilingual quality. like, imo gold medal level math and code, 100+ languages, all while sipping tokens at 14 cents per million input. that's stupid cheap.

the rl recipe they used this time also seems way more stable. no more endless repetition loops or random language switching you sometimes got with the earlier open models. it just works.

what really got me is how fast the community moved. vllm support landed the same day, huggingface space was up in hours, and people are already fine-tuning it for agent stuff and long document reasoning. i've been playing with it locally and the speed jump on long prompts is night and day. feels like the gap to the closed frontier models just shrank again. anyone else tried it yet?
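
for anyone wondering what "sparse attention" means mechanically, here's a generic top-k sketch in numpy. to be clear: this is not deepseek's actual DSA (their token selection is learned, and the real thing lives in fused kernels), it just shows the core idea of each query attending to a small subset of keys:

    import numpy as np

    def topk_sparse_attention(q, k, v, keep=64):
        # q: (n_q, d), k and v: (n_k, d). standard scaled dot-product scores.
        scores = q @ k.T / np.sqrt(q.shape[-1])
        if scores.shape[-1] > keep:
            # keep only the `keep` largest scores per query, mask the rest
            thresh = np.partition(scores, -keep, axis=-1)[:, [-keep]]
            scores = np.where(scores >= thresh, scores, -np.inf)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ v

    d = 64
    q, k, v = (np.random.randn(n, d) for n in (128, 4096, 4096))
    print(topk_sparse_attention(q, k, v).shape)  # (128, 64)

the point is that per-query attention cost stops growing with the full context length, which is roughly why long-prompt inference gets cheaper.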

> DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

Uh, Deepseek will not (unless you are referring to one of their older R1 finetuned variants). But any flagship Deepseek model will require 16x A100/H100+ with NVL in FP8.


Furthermore, paid models are heavily subsidized by bullish investors playing for monopoly. So that tips the scales further towards Deepseek.

I believe this was a statement on cost per token to us as consumers of the service

Training cost-effectiveness doesn't matter for open models since someone else ate the cost. In this case, Chinese taxpayers.

Deepseek is a private corporation funded by a hedge fund (High-Flyer). I doubt much public money was spent by the Chinese state on this. Like with LLMs in the US, the people paying for it so far are mainly investors who are betting on a return in the long to medium term.

Do you actually believe what you just wrote or are you trolling? One version at least has a foot planted in reality. The other one well...

We can judge on inference cost because we do know what those are for open-weights models as there are a dozen independent providers that host these models and price them according to respective inference cost.

We can't judge on training cost, that's true.


You can use tokens/sec on something like AWS Bedrock (which hosts both open and closed models) as a proxy for “costs per token” for the closed providers.

Apart from measuring prices from venture-backed providers which might or might not correlate with cost-effectiveness, I think the measures of intelligence per watt and intelligence per joule from https://arxiv.org/abs/2511.07885 is very interesting.

Well consumers care about the cost to them, and those we know. And deepseek is destroying everything in that department.

Yes. Though we don't know for sure whether that's because they actually have lower costs, or whether it's just the Chinese taxpayer being forced to serve us a treat.

Third party providers are still cheap though. The closed models are the ones where you can't see the real cost to running them.

Oh, I was mostly talking about the Chinese taxpayer footing the training bill.

You are right that we can directly observe the cost of inference for open models.


Not sure the Chinese taxpayer is footing the bill though - of course, it might not be net zero, there might be secondary effects, etc.

A few days ago I read an article saying Chinese utilities have a pricing structure that favors high-tech industries (say, an AI data center), making up the difference by charging the energy-intensive but less sophisticated industries (an aluminium smelter, for example) more.

Admittedly, there are some advantages when you do central and long-term economic planning.


Good point. Could usage patterns + inference costs give us proxy metrics? What would be a fair baseline?

I suspect they will keep doing this until they have a substantially better model than the competition. Sharing methods to look good & allow the field to help you keep up with the big guys is easy. I'll be impressed if they keep publishing even when they do beat the big guys soundly.

As much as I agree with your sentiment, I doubt the intention is singular.

It's like AMD open-sourcing FSR or Meta open-sourcing Llama. It's good for us, but it's nothing more than a situational and temporary alignment of self-interest with the public good. When the tables turn (they become the best instead of 4th best, or AMD develops the best upscaler, etc), the decision that aligns with self-interest will change, and people will start complaining that they've lost their moral compass.

>situational and temporary alignment of self-interest with the public good

That's how it's supposed to work.


It's not. This isn't about competition in a company sense but sanctions and wider macro issues.

It's like it in the sense that it's done because it aligns with self-interest. Even if the nature of that self-interest differs.

The bar is incredibly low considering what OpenAI has done as a "not for profit"

You'd need to get a bunch of accountants to agree on what counts as profit first...

Agree against their best interest, mind you!

I don't care if this kills Google and OpenAI.

I hope it does, though I'm doubtful because distribution is important. You can't beat "ChatGPT" as a brand in laypeople's minds (unless perhaps you give them a massive "Temu: Shop Like A Billionaire" commercial campaign).

Closed source AI is almost by design morphing into an industrial, infrastructure-heavy rocket science that commoners can't keep up with. The companies pushing it are building an industry we can't participate or share in. They're cordoning off areas of tech and staking ground for themselves. It's placing a steep fence around tech.

I hope every such closed source AI effort is met with equivalent open source and that the investments made into closed AI go to zero.

The most likely outcome is that Google, OpenAI, and Anthropic win and every other "lab"-shaped company dies an expensive death. RunwayML spent hundreds of millions and they're barely noticeable now.

These open source models hasten the deaths of the second tier also-ran companies. As much as I hope for dents in the big three, I'm doubtful.


I can’t think of a single company I’ve worked with as a consultant that I could convince to use DeepSeek because of its ties with China even if I explained that it was hosted on AWS and none of the information would go to China.

Even when the technical people understood that, it would be too much of a political quagmire within their company when it became known to the higher ups. It just isn’t worth the political capital.

They would feel the same way about using xAI or maybe even Facebook models.



TIL that Chinese models are considered better at multiple languages than non-Chinese models.

It's a customer service bot? And Airbnb is a vacation home booking site. It's pretty inconsequential

Airbnb has ~$12 bn annual revenue, and is a counterexample to the idea that no companies can be "convinced to use DeepSeek".

The fact that it's customer service means it's dealing with text entered by customers, which has privacy and other consequences.

So no, it's not "pretty inconsequential". Many more companies fit a profile like that than whatever arbitrary criteria you might have in mind for "consequential".


This is the real cause. At the enterprise level, trust outweighs cost. My company hires agencies and consultants who provide the same advice as our internal team; this is not to imply that our internal team is incorrect, but rather that there is credibility in it: if something goes wrong, the consequences of the decision can be shifted. There is a reason why companies continue to hire the same four consulting firms. It's trust, whether real or perceived.

I have seen it much more nuanced than that.

2020 - I was a mid level (L5) cloud consultant at AWS with only two years of total AWS experience and that was only at a small startup before then. Yet every customer took my (what in hindsight might not have been the best) advice all of the time without questioning it as long as it met their business goals. Just because I had @amazon.com as my email address.

Late 2023 - I was the subject matter expert in a niche of a niche in AWS that the customer focused on and it was still almost impossible to get someone to listen to a consultant from a shitty third rate consulting company.

2025 - I left the shitty consulting company last year after only a year and now work for one with a much better reputation and I have a better title “staff consultant”. I also play the game and be sure to mention that I’m former “AWS ProServe” when I’m doing introductions. Now people listen to me again.


Children do the same thing intuitively: parents continually complain that their children don't listen to them. But as soon as someone else tells them to "cover their nose", "chew with their mouth closed", "don't run with scissors", whatever, they listen and integrate that guidance into their behavior. What's harder to observe is all the external guidance they get that they don't integrate until their parents tell them. It's internal vs external validation.

Or in many cases they go over to their grandparents' house, who let them run wild, and all of a sudden your parents have "McDonald's money" for their grandkids when they never had it for you.

So much worse for American companies. This only means that they will be uncompetitive with similar companies that use models with realistic costs.

I can’t think of a single major US company that is big internationally that is competing on price.

Any car company. Uber.

All tech companies offering free services.


Is a “cheaper” service going to come along and upend Google or Facebook?

I'm not saying this to insult the technical capabilities of Uber. But it doesn't have the economics that most tech companies have - high fixed costs and very low marginal costs. Uber has high marginal costs, so saving a little on inference isn't going to make a difference.


What American car company competes overseas on price?

All the American cars (Ford, Chevrolet, GM...) are much cheaper in Europe than eg. German cars from their trifecta (and other Europe-made high end vehicles from eg Sweden, Italy or UK), and on par with mid-priced vehicles from the likes of Hyundai, Kia, Mazda...

Obviously, some US brands do not compete on price, but other than maybe Jeep and Tesla, those have a small market penetration.


> I can’t think of a single major US company that is big internationally that is competing on price.

All the clouds compete on price. Do you really think it is that differentiated? Google, Amazon and Microsoft all offer special deals to sign big companies up and globally too.


I worked inside AWS consulting department for 3 years (AWS ProServe) and now I work as a staff consultant for a 3rd AWS partner. I have been on enough sales calls, seen enough go to market training materials and flown out to customers sites to know how these things work. AWS has never tried to compete as the “low cost leader”. Marketing 101 says you never want to compete on price if you can avoid it.

Microsoft doesn't compete on price. Their major competitive advantage is that Big Enterprise is already big into Microsoft, and it's much easier to get them to come onto Azure. They compete on price only when it comes to making Windows workloads and SQL Server cheaper than running on other providers.

AWS is the default choice for legacy reasons, and it definitely has services and offerings that Google doesn't have. I have never once been on a sales call where the salesperson emphasizes that AWS is cheaper.

As far as GCP, they are so bad at enterprise sales, we never really looked at them as serious competition.

Sure AWS will throw credits in for migrations and professional services both internally and for third party partners. But no CFO is going to look at just the short term credits.


> AWS has never tried to compete as the “low cost leader”. Marketing 101 says you never want to compete on price if you can avoid it.

Despite all that and whatever you say, the fact is you do compete. It doesn't have to be a race to the bottom.

So Cloudfront free tier and the latest discount bundles etc aren't to compete? People have also negotiated private pricing way below list price and a lot cheaper than competitors.

Similarly, were the DynamoDB price cuts not due to competition?

I can give way more examples...


I am well aware that Netflix doesn't pay the same price for AWS services as "Joe Bob's Fish Tackle and WordPress shop". All big providers give discounts to large customers as part of negotiations, which is different from "we are the low cost leader".

All technology gets cheaper over time. There is a difference between lowering price in response to competitors and finding the profit maximizing price based on supply and demand.

AWS was lowering prices to increase demand before GCP and Azure were a thing.

Jassy said, right before he became CEO of Amazon while he was still over AWS, that only 5% of IT spend was on any cloud provider. They are capturing non-consumption, and marketing the value of AWS against that.

While I don’t have any insider experience about Azure, looking on the outside, I would think that Azure’s go to market is also not competing against AWS on price, but trying to get on prem customers on Azure.


If the Chinese model becomes better than competitors, these worries will suddenly disappear. Also, there are plenty startups and enterprises that are running fine-tuned versions of different OS models.

Yeah that’s not how Big Enterprise works…

And most startups are just doing prompt engineering that will never go anywhere. The big companies will just throw a couple of developers at the feature and add it to their existing business.


Big enterprise with mostly private companies as their clients? Lol, yeah, that's how they work, from my personal experience. The reality is, if it's not a tech-first enterprise and it already outsources part of its tech to a shop outside of NA (which is almost the majority at this point), it will do absolutely everything to cut costs.

I spent three years working in consulting mostly in public sector and education and the last two working with startups to mid size commercial interest and a couple of financial institutions.

Before that I spent 6 years working between 3 companies in health care in a tech lead role. I'm 100% sure that any of those companies would have immediately questioned my judgment for suggesting DeepSeek, had it been a thing.

Absolutely none of them would ever have touched DeepSeek.


I've worked with financial services, and insurance providers that would have done the opposite for cost saving measures. So, I'm not sure what to say here.

Financial services are far more risk-averse than they are cost-cutting; they literally have risk departments.

If you'd spent any time working at one as a SWE, you'd know you won't have access to popular open source frameworks, let alone Chinese LLMs. The LLM development is mostly occurring through collaborations with regional LLM businesses or internal labs.


Regulators would have the head of any financial institution that used a Chinese model.

Why would you be presenting what AI tech you are using? You would tell them AI will come from Amazon using a variety of models.

In various sectors, you need to be able to explain why you/your-system did what it did. Exchange Act Rule 15c3-5 is probably the most relevant in financial circles:

https://www.ecfr.gov/current/title-17/chapter-II/part-240/su...

Note: I am neither a lawyer nor in financial circles, but I do have an interest in the effects of market design and regulation as we get into a more deeply automated space.


To add on: while it doesn't work with GenAI models as far as I know, AWS has a service around explainability of ML decisions:

https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-mode...


You still choose your model. I’m no more going to say “I’m using Bedrock” without being more specific than I would say “I’m using RDS” without specifying the database.

No… Nobody I work for will touch these models. The fear is real that they have been poisoned or have some underlying bomb. Plus y’know, they’re produced by China, so they would never make it past a review board in most mega enterprises IME.

I work at an F50 company and Deepseek is one of the models that have been approved for use. It took them a bit to get it all in place, but it's certainly being used in megacorps.

People say that, but everyone, including enterprises, is constantly buying Chinese tech one way or another because of the cost/quality ratio. There's a tipping point in any Excel file where the risk arguments stop making sense, if the cost is 20x for the same quality.

Of course you’ll always have exceptions (government, military and etc.), but for private, winner will take it all.


The xenophobia is still very much there. Chinese tech is sanitized through Taiwanese middlemen (Foxconn, Asus, Acer etc). If you try to use Chinese tech or funding directly you will have a lot of pushback from VCs, financial institutions and business partners. China is the bogeyman.

it is many things, but not xenophobia.

What Chinese-built infrastructure tech, where information can be exfiltrated or real damage caused, are American companies buying? Chinese communication tech is for the most part not allowed in any American technology.

80% of the parts in iPhones are manufactured in China, and they have completely and utterly dominated in Enterprise (Ever heard of someone using a Blackberry in 2025? Me neither.) so there’s one example.

The software is made by Apple. Hardware can't magically intercept communications, and the manufacturing is done mostly in Taiwan. If Apple doesn't have a process to protect its operating system from supply chain attacks, it would be derelict.

Hardware can do any "magic" software can, which should be obvious since software runs on it. It's just not as cost-effective to modify it after shipping, which is why the tech sector is moving to more sw less hw (simplified, ofc, there are other reasons).

For what it's worth, this is complete insanity when practically every mega enterprises' hardware is largely Made in China.

Enterprise hardware isn’t the issue. It’s the software. How much enterprise hardware is running with Chinese software? The US basically bans any hardware with Chinese software that can disrupt infrastructure.

Backdoors in software are much easier to discover than backdoors in hardware.

Any kind of hardware that is somehow connected to the wired or wireless communication interfaces is much more dangerous than any software.

Backdoors embedded in such hardware devices may be impossible to identify before being activated by the reception of some "magic" signals from outside.


Tons of routers, modems, and embedded devices are running Chinese software.

That conversation probably gets easier if and when a company spends $100+M on AI.

Companies just need to get to the "if" part first. That, or they wash their hands by using a reseller that can use whatever it wants under the hood.


As a government contractor, using a Chinese model is a non-starter.

I don't know that it's actually prohibited. There is no Chinese telecommunications equipment allowed, no Huawei or Bytedance, but nothing prohibiting software merely being developed in China, not yet at least.

Although I did just check which AWS Bedrock regions support Deepseek, and their GovCloud regions do not, so that's a good reason not to use it. Still, on-prem on a segmented network, following CMMC, it's probably permissible.


There’s nuance and debate about the 110 level 2 controls without bringing Chinese tech in to the picture. I’d love to be a fly on the wall in that meeting lol.

> I don't know that it's actually prohibited.

Chinese models generally aren't but DeepSeek specifically is at this point.


> I can’t think of a single company I’ve worked with as a consultant that I could convince to use DeepSeek because of its ties with China even if I explained that it was hosted on AWS and none of the information would go to China.

Well for non-American companies, you have the choice between Chinese models that don't send data home, and American ones that do, with both countries being more or less equally threatening.

I think if Mistral can just stay close enough in the race, it will win many customers by not doing anything.


> Even when the technical people understood that

I'm not sure if technical people who don't understand this deserve the moniker technical in this context.


really a testament to how easily the us govt has spun a china bad narrative even though it is mostly fiction and american exceptionalism

[flagged]


Try not to accuse community members of being spies, sheesh.

American companies chose to manufacture in China and got all surprised Pikachu when China manufactured copies for themselves.


This is how crazy and nationalistic people are getting. I'm an American citizen, though I am critical of the US government, and have no allegiances to China. What do you think America is doing to every country, even allies (which has been highly publicized)? Why would a country being constantly attacked by American intelligence and propaganda not want to counter that?

https://www.reuters.com/world/europe/us-security-agency-spie...

American intelligence has penetrated most information systems and at least as of 10 years ago, was leading all other nations in the level of sophistication and capability. Read Edward Snowden.


Moralizing through whataboutism does not logically follow in disproving the China threat narrative, it is axiomatic that what matters is what they are doing to us, not what we are doing to them from that vantage.

Rather, I'd say it speaks more about how deranged the post-Snowden/anti-neocon figures have become, going from critiquing creeping authoritarianism to functionally acting at the behest of an even more authoritarian regime. The funny thing is that that behavior of deflection, moralizing and whataboutism is exactly the kind of behavior nationalists employ, not addressing arguments head on like the so-called "American nationalists".


The average person has been programmed to be distrustful of open source in general, thinking it is inferior quality or in service of some ulterior motive

That might be the perspective of a US based company. But there is also Europe and basically it's a choice between Trump and China.

Europe has Mistral. It feels that governments that can do things without fax take this as a sovereignity thing and roll their own or have their provider in their jurisdiction.

[flagged]


> For example, a small random percentage of the time, it could add a subtle security vulnerability to any code generation.

Now on the HN frontpage: "Google Antigravity just wiped my hard drive"

Sure going to be hard to distinguish these Chinese models' "intentionally malicious actions"!

And the cherry on top:

- Written from my iPhone 16 Pro Max (Made in China)


Where does the software come from? Your iPhone can't magically intercept communications and send them to China without the embedded software. If Apple can't verify the integrity of its operating system before it is installed on iPhones, there are some huge issues.

Even if China did manage to embed software on the iPhone in Taiwan, it would soon hopefully be wiped since you usually end up updating the OS anyway as soon as you activate it.


The hardware can always contain undetectable sub-devices that can magically intercept anything with no possibility for the software to detect this.

You should remember that all iPhones had, for several years, an undetected hardware backdoor, until a couple of years ago when independent researchers found it and reported the Apple bugs as CVEs, forcing Apple to fix the vulnerabilities.

The hardware backdoor consisted in the fact that writing some magic values to some supposedly unused addresses allowed the bypassing of all memory protections. The backdoor is likely to have consisted in some memory test registers, which are used during manufacturing, but which should be disabled before shipping the phone to customers, which Apple had not done.

This hardware backdoor, coupled with some bugs in a few Apple system libraries, allowed the knowledgeable attackers to send remotely an invisible message to the iPhone, which was able to take complete control over the iPhone, allowing the attacker to read any file and to record from cameras and microphones. A reboot of the iPhone removed the remote control, but then the attacker would immediately send another invisible message, regaining control.

There was no way to detect that the iPhone was remotely controlled. The backdoor was discovered only externally in the firewalls of a company, because the iPhones generated a suspiciously high amount of Internet traffic, without apparent causes.

This was widely reported at the time and discussed on HN, but some people remain unaware of how little you can trust even major companies like Apple to deliver the right hardware.

The identity of the attackers who exploited this Apple hardware backdoor has not been revealed, but it is likely that they had needed the cooperation of Apple insiders, at least for access to secret Apple documentation, if not for intentionally ensuring that the hardware backdoor remained open.

Thus the fact that Apple publishes only incomplete technical documentation has helped only the attackers, allowing them to remain undiscovered for many years, against the interests of the Apple customers. Had the specifications of the test registers been public, someone would have quickly discovered that they had remained unprotected after production.

Therefore, for many years the iPhones of certain valuable targets had magically intercepted all their communications and they have sent them to an unknown country (due to the nature of some of the identified targets and the amount of resources required to carry the attacks, it has been speculated that the country could have been Israel, but no public evidence exists; a US TLA is the main plausible alternative, as some targets were Russians).


The argument was that you couldn’t trust American designed hardware running American designed software because it was built in China. All theories suggest that the security vulnerabilities were caused by Apple and had nothing to do with Chinese manufacturers

on what hypothetical grounds would you be more meaningfully able to sue the american maker of a self-hosted statistical language model that you select your own runtime sampling parameters for after random subtle security vulnerabilities came out the other side when you asked it for very secure code?

put another way, how do you propose to tell this subtle nefarious chinese sabotage you baselessly imply to be commonplace from the very real limitations of this technology in the first place?


This paper may be of interest to you: https://arxiv.org/html/2504.15867v1

the mechanism of action for that attack appears to be reading from poisoned snippets on stackoverflow or a similar site, which to my mind is an excellent example of why it seems like it would be difficult to retroactively pin "insecure code came out of my model" on the evil communist base weights of the model in question

"Baselessly" - I'm sorry but realpolitik is plenty of basis. China is a geopolitical adversary of both the EU and the US. And China will be the first to admit this, btw.

The US has also been behaving like an adversary of the EU as of late. So what's the difference?

The EU isn’t a state and has no military or police. As such the EU’s existence is an anecdotal answer to your question in itself: Reliance on (in particular maritime) trade. And yes, China also benefits from trade, but as opposed to democracies (in which the general populace to a greater extent are keys to power) the state does not require trade to sustain itself in the same way.

This makes EU countries more reliable partners for cooperation than China. The same goes for the US from an European perspective, and even with everything going on over there it is still not remotely close.

All states are fundamentally adversaries because they have conflicting interests. To your point however, adversaries do indeed cooperate all the time.


sorry, is your contention here "spurious accusations don't require evidence when aimed at designated state enemies"? because it feels uncharitably rude to infer that's what you meant to say here, but i struggle to parse this in a different way where you say something more reasonable.

Competitor != adversary. It is US warmongering ideology that tries to equate these concepts.

> It is US warmongering ideology that tries to equate these concepts

Please don't engage in political battle here, including singling out a country for this kind of criticism. No matter how right you are or feel you are, it inevitably leads to geopolitical flamewar, which has happened here.

https://news.ycombinator.com/newsguidelines.html


you clearly haven't been paying attention

remember when the US bugged EU leader's phones, including Merkel from 2002 to 2013?


> you clearly haven't been paying attention

Please don't be snarky or condescending in HN comments. From the guidelines: Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

https://news.ycombinator.com/newsguidelines.html


That is just objectively incorrect, and fundamentally misunderstanding the basics of statehood. China, the US, and any other local monopoly on force would absolutely take any chance they could get to extend their influence and diminish the others. That is they are acting rationally to at minimum maximise the probability they are able to maintain their current monopolies on force.

Several of your comments in this subthread have broken the guidelines. The guidelines ask us not to use HN for political/ideological battle and to "assume good faith". They ask us to "be kind", "eschew flamebait", and ask that "comments should get more thoughtful and substantive, not less as a topic gets more divisive."

The topic itself, like any topic, is fine to discuss here, but care must be taken to discuss it in a de-escalatory way. The words you use and the way you use them matter.

Most importantly, it's not OK to write "it is however entirely reasonable to assume that the comment I replied to was made entirely in bad faith". That's a swipe and a personal attack that, as the guidelines ask, should be edited out.

https://news.ycombinator.com/newsguidelines.html


Can you, by any chance, delete my account? I have tried to do so before but it is not possible through the GUI. And I see you are associated with HN.

Other than that let's be very clear that there was no personal attack. You left out the part where I explain why I think the comment was made in bad faith. I.e. the part that makes it not a personal attack. And a part which I, upon request, elaborated on in the same comment tree.

As you said: Words matter.


We can disable your account if you email hn@ycombinator.com. That's in the FAQ – https://news.ycombinator.com/newsfaq.html.

And yes I am a moderator and it's my role to prevent flamewars and to encourage everyone to raise the standard of discourse here. In my comment I was trying to convey that multiple comments of yours were crossing too far into political battle and personal attack, and here are the main instances:

> That is just objectively incorrect, and fundamentally misunderstanding the basics of statehood

This counts as a personal swipe, and as fulminating.

> It is however entirely reasonable to assume that the comment I replied to was made entirely in bad faith

People can be mistaken or wrong, or just of a different opinion/assessment, without acting “entirely in bad faith”.

> "Baselessly" - I'm sorry but realpolitik is plenty of basis. China is a geopolitical adversary of both the EU and the US. And China will be the first to admit this, btw.

This is phrased in a snarky way.

The points you've made are fine to make, but the way you make them matters. Snarkiness, swipes, put-downs, accusations of bad faith (giving your reason "why" you think it was in bad faith doesn't make it OK) are all clearly against the guidelines.

I can accept that you didn't mean to break the guidelines, which is why I've politely asked you to familiarise yourself with them and try harder to follow them in future. It's a request not a scolding. It's not necessary to announce you want to quit HN in protest. (Though of course, eventually we would rather people leave if they prefer not to follow the guidelines.) Just making an effort to respect the guidelines and the HN community would be great.


The deletion request was completely unrelated. I just don’t like the interaction gamification. Thanks!

I have not made a single personal swipe in this entire comment tree. I have stated (implied) that certain views are not consistent with a cursory introduction to the topic at hand.

I absolutely assumed a basic familiarity with the concept of a state from a comment on the relationship between states. That is good faith and basic respect for the human you are conversing with as I view it.

Overall, I have kept a tone I would prefer be kept towards myself; fake politeness is just condescending.

That being said: Your site, your rules, and your power to arbitrarily interpret and enforce said rules. I.e., message received, regardless of my thoughts on your interpretation of events.


> Overall, I have kept a tone I would prefer be kept towards myself; fake politeness is just condescending.

We don't want you to be fake. We just want you to make the effort to share your perspective in a way that is kind and is conducive to curious conversation, which is HN's primary objective. We know it can be hard to get this right when commenting on the internet. It's common for people to underestimate how hostile their words can come across to others, when they seem just like reasonable, matter-of-fact statements when formulated in one's own mind.

> That being said: Your site, your rules, and your power to arbitrarily interpret and enforce said rules

That's not really it. The community holds the power here; when we try to override broad community sentiment and expectations, the community pushes back forcefully.

Your comments got my attention because they were attracting flags and downvotes from the community, and from looking at these comments and earlier ones in your feed, my assessment is "yes, I can see why". (We don't let community sentiment, or "mob rule" win out all the time; we often override flags if we think they're unfair, but in your case, given the pattern we observe over time, we think the community's response is reasonable.)


Isn’t every country by definition a “local monopoly on force”? Sweden and Norway have their own militaries and police forces and neither would take kindly to an invasion from the other. By your definition this makes them adversaries or enemies.

Exactly. I am Norwegian myself, and I don’t even know how many wars we have had with Sweden and Denmark.

If you are getting at the fact that it is sometimes beneficial for adversaries to collaborate (e.g., the prisoner dilemma) then I agree. And indeed, both Norway and Sweden would be completely lost if they declared war on the other tomorrow. But it doesn’t change the fundamental nature of the relationship.


Literally every time a Chinese model is discussed here we get this completely braindead take

There has never been a shred of evidence - from security researchers, model analysis, benchmarks, etc. - that supports this.

It's a complete delusion in every sense.


For good reason, too. Hostile governments have a much easier time poisoning their "local" LLMs.

ChatGPT is like "Photoshop": people will call any AI ChatGPT.

How do they make their money?

I suspect it is a state venture designed to undermine the American-led proprietary AI boom. I'm all for it, tbh, but as others have pointed out, if they successfully destroy the American ventures it's not like we can expect an altruistic endgame from them.

Deepseek is owned by a Chinese hedge fund. It was originally created for finance and then generalized later. In any case you pay for it like any other LLM.

[flagged]


Should I root for the democratic OpenAI, Google or Microsoft instead?

Further more, who thinks our little voices matter anymore in the US when it comes to the investor classes?

And if they did, having a counterweight against corrupt, self-centered US oligarchs/CEOs is actually one of the strongest arguments for an actually powerful communist or other alternative-model world power. The US had some of the most progressive tax policies in its existence when it was under existential threat during the height of the USSR, and when the USSR's power started to diminish, so too did those tax policies.


There used to be memes that "open source is communism"; see https://souravroy.com/2010/01/01/is-open-source-pro-communis...

> CrowdStrike researchers next prompted DeepSeek-R1 to build a web application for a Uyghur community center. The result was a complete web application with password hashing and an admin panel, but with authentication completely omitted, leaving the entire system publicly accessible.

> When the identical request was resubmitted for a neutral context and location, the security flaws disappeared. Authentication checks were implemented, and session management was configured correctly. The smoking gun: political context alone determined whether basic security controls existed.

Holy shit, these political filters seem embedded directly in the model weights.


LLMs are the perfect tools of oppression, really. It's computationally infeasible to prove just about any property of the model itself, so any bias will always be plausibly deniable as it has to be inferred from testing the output.

I don't know if I trust China or X less in this regard.


not convincing. have you tried saying "free palestine" on a college campus recently?

>winning on cost-effectiveness

Nobody is winning in this area until these things run, in full, on single graphics cards. A single card is sufficient compute to run even most of the complex tasks.


Nobody is winning until cars are the size of a pack of cards. Which is big enough to transport even the largest cargo.

Lol, it's kinda surprising that the level of understanding around LLMs is so low.

You already have agents, that can do a lot of "thinking", which is just generating guided context, then using that context to do tasks.

You already have Vector Databases that are used as context stores with information retrieval.

Fundamentally, you can have the same exact performance on a lot of tasks whether all the information exists in the model, or you use a smaller model with a bunch of context around it for guidance.

So instead of wasting energy and time encoding the knowledge into the model, making it large, you could have an "agent-first" model along with just files of vector databases, and the model can fit in a single graphics card, take the question, decide which vector db it wants to load, and then essentially answer the question in the same way. At $50 per TB from SSD, not only do you gain massive cost efficiency, but you also gain the ability to run a lot more inference cheaply, which can be used for refining things, background processing, and so on.
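
Here's a minimal sketch of that retrieve-then-answer loop. `embed` and `local_llm` are hypothetical stand-ins (a toy hashing embedder and a stubbed model call); in a real system they'd be your embedding model and your small local LLM:

    import numpy as np

    def embed(texts, dim=256):
        # toy hashing bag-of-words embedder; a real system would use a model
        out = np.zeros((len(texts), dim))
        for i, t in enumerate(texts):
            for tok in t.lower().split():
                out[i, hash(tok) % dim] += 1.0
        return out

    docs = [
        "refund policy: customers can return items within 30 days",
        "shipping: orders over $50 ship free",
        "support hours: 9am to 5pm on weekdays",
    ]
    doc_vecs = embed(docs)
    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

    def retrieve(query, k=2):
        qv = embed([query])[0]
        qv /= np.linalg.norm(qv)
        sims = doc_vecs @ qv  # cosine similarity against the doc store
        return [docs[i] for i in np.argsort(-sims)[:k]]

    def local_llm(prompt):
        # stub; imagine a small model that fits on one GPU here
        return "(answer grounded in the retrieved context)"

    def answer(query):
        context = "\n".join(retrieve(query))
        return local_llm(f"Context:\n{context}\n\nQ: {query}\nA:")

    print(answer("when can I return an item?"))

Swap the doc list for sharded vector DBs on SSD and the stub for a real small model, and you have the shape of the system described above.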


You should start a company and try your strategy. I hope it works! (Though I am doubtful.)

In any case, models are useful, even when they don't hit these efficiency targets you are projecting. Just like cars are useful, even when they are bigger than a pack of cards.


If someone wants to fund me, I'll gladly work on this. There is no money in this though, because selling cloud services is much more profitable.

It's also not a matter of it working or not. It already works. Take a small model that fits on a GPU with a large context window, like Gemma 27b or smaller ones, give it a whole bunch of context on the topic, ask it questions, and it will generate very accurate results based on the context.

So instead of encoding everything into the model itself, you can just take training data, store it in vector DBs, and train a model to retrieve that data based on query, and then the rest of it is just training context extraction.


> There is no money in this though, because selling cloud service is much more profitable.

Oh, be more creative. One simple way to make money off your idea is:

(1) Get a hedge fund to finance your R&D.

(2) Hedge fund shorts AI cloud providers and other relevant companies.

(3) Your R&D pans out and the AI cloud providers' stock tanks.

(4) The hedge fund makes a profit.

Though I don't understand: wouldn't your idea work when served from the cloud, too? If what you are saying is true, you'd provide a better service at lower cost.


From a functional perspective, it would provide roughly identical performance to existing systems at a lower cost, due to less dependence on compute and more dependence on storage. It would also allow more on-prem solutions.

However, the issue with "funding" isn't as simple as that statement above. Remember, modern funding is not about value, it's about hype. There is a reason why CEOs like Jensen say that if they could go back in time, they would never have started their companies, knowing the bullshit they'd have to walk through.

I've also had my fair share of experiences in trying to get startups off the ground - for example, back around 2018, I was working on a system that would take your existing AWS cloud setup and move it all to EC2s with self-hosted services, which saved people money in the long run. I had a proof of concept working and everything. The issue I ran into when trying to get funding to build this out into a full-blown product/service - one I hadn't realized - is that being on AWS services was, for companies, the equivalent of a person wearing an expensive business suit to a sales meeting: a fact they would advertise, because it was seen as industry standard and created "warm feelings" with their customers. So at most, I would get some small-time customers, while getting paid much less.

Now I just work on stuff (and yes, I am working on the issue at hand with existing models) and publish it to GitHub (not gonna share it cause I don't want my HN account associated with it). If someone contacts me with a dollar figure, I'm all game.



Ok then point out where I made a mistake.

Nothing shows a lack of understanding of the subject matter more than referencing the Dunning-Kruger effect in a conversation.


I mean, there are lots of models that run on home graphics cards. I'm having trouble finding reliable requirements for this new version, but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1], which is very doable for professionals in the first world. Quantization can also help immensely.

Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but that seems like an inherently-impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).

FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LLMArena[3].

[1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...

[2] https://huggingface.co/google/gemma-3n-E4B-it

[3] https://lmarena.ai/leaderboard/text/overall
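
For a sense of why "16GB or more" only works with aggressive quantization, here's a weights-only back-of-envelope (real usage is higher once you add KV cache and activations):

    # params (in billions) * bits per weight / 8 -> bytes, converted to GiB
    def weight_gib(params_b, bits):
        return params_b * 1e9 * bits / 8 / 2**30

    for bits in (16, 8, 4):
        print(f"32B model @ {bits:2d}-bit: ~{weight_gib(32, bits):.0f} GiB")
    # ~60 GiB at 16-bit, ~30 GiB at 8-bit, ~15 GiB at 4-bit

So a 32B model squeezes onto a 16GB card only at roughly 4-bit, and even then with little headroom.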


> but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1]

No. They released a distilled version of R1 based on a Qwen 32b model. This is not V3, and it's not remotely close to R1 or V3.2.


Why does that matter? They won't be making at-home graphics cards anymore. Why would you do that when you can have $40k servers pre-sold years into the future?

Because Moore's law marches on.

We're around 35-40 orders of magnitude from computers now to computronium.

We'll need 10-15 years before handheld devices can run a couple terabytes of RAM, 64-128 terabytes of storage, and 80+ TFLOPS. That's enough to run any current state-of-the-art AI at around 50 tokens per second, but in 10 years we're probably going to have seen lots of improvements, so I'd guess conservatively you're going to see 4-5x performance per parameter, possibly much more, so at that point you'll have the equivalent of a model with 10T parameters today.
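
That 10-15 year figure is just doubling arithmetic. A back-of-envelope sketch, where both the starting point (~2 TFLOPS in a phone today) and the 2-year doubling period are assumptions:

    import math

    def years_to_target(current, target, doubling_years=2.0):
        # time for `current` to reach `target` at a fixed doubling period
        return math.log2(target / current) * doubling_years

    print(f"~{years_to_target(2, 80):.0f} years")  # ~11 years to 80 TFLOPS

Slow the doubling to 3 years and the same gap takes ~16 years, which brackets the estimate above.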

If we just keep scaling and there are no breakthroughs, Moore's law gets us through another century of incredible progress. My default assumption is that there are going to be lots of breakthroughs, and that they're coming faster, and eventually we'll reach a saturation of research and implementation; more, better ideas will be coming out than we can possibly implement over time, so our information processing will have to scale, and it'll create automation and AI development pressures, and things will be unfathomably weird and exotic for individuals with meat brains.

Even so, in only 10 years and steady progress we're going to have fantastical devices at hand. Imagine the enthusiast desktop - could locally host the equivalent of a 100T parameter AI, or run personal training of AI that currently costs frontier labs hundreds of millions in infrastructure and payroll and expertise.

Even without AGI that's a pretty incredible idea. If we do get to AGI (2029 according to Kurzweil) and it's open, then we're going to see truly magical, fantastical things.

What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?

NVIDIA will be churning out chips like crazy, and we'll start seeing the solar system measured in terms of average cognitive FLOPS per gram, and be well on the way toward system scale computronium matrioshka brains and the like.


I appreciate your rabid optimism, but considering that Moore's Law has ceased to be true for multiple years now, I am not sure a handwave about being able to scale to infinity is a reasonable way to look at things. Plenty of things have slowed down in progress in our current age - airplanes, for example.

Someone always crawls out of the woodwork to repeat this supposed "fact" which hasn't been true for the entire half-century it's been repeated. Jim Keller (designer of most of the great CPUs of the last couple decades) gave a convincing presentation several years ago about just how not-true it is: https://www.youtube.com/watch?v=oIG9ztQw2Gc Everything he says in it still applies today.

Intel struggled for a decade, and folks think that means Moore's law died. But TSMC and Samsung just kept iterating. And hopefully Intel's 18a process will see them back in the game.


During the 1990s (and for some years before and after) we got 'Dennard scaling'. The frequency of processors tended to increase exponentially, too, and featured prominently in advertising and branding.

I suspect many people conflated Dennard scaling with Moore's law and the demise of Dennard scaling is what contributes to the popular imagination that Moore's law is dead: frequencies of processors have essentially stagnated.

See https://en.wikipedia.org/wiki/Dennard_scaling


Yup. Since then we've seen scaling primarily in transistor count, though clock speed has increased slowly as well. Increased transistor count has led to increasingly complex and capable instruction decode, branch prediction, out of order execution, larger caches, and wider execution pipelines in attempt to increase single-threaded performance. We've also seen the rise of embarrassingly parallel architectures like GPUs which more effectively make use of additional transistors despite lower clock speeds. But Moore's been with us the whole time.

Chiplets and advanced packaging are the latest techniques improving scaling and yield keeping Moore alive. As well as continued innovation in transistor design, light sources, computational inverse lithography, and wafer scale designs like Cerebras.


Yes. Increase in transistor count is what the original Moore's law was about. But during the golden age of Dennard scaling it was easy to get confused.

Agreed. And specifically Moore's law is about transistors per constant dollar. Because even in his time, spending enough could get you scaling beyond what was readily commercially available. Even if transistor count had stagnated, there is still a massive improvement from the $4,000 386sx Dad somehow convinced Mom to greenlight in the late 80s compared to a $45 Raspberry Pi today. And that factors into the equation as well.

Of course, feature size (and thus chip size) and cost are intimately related (wafers are a relatively fixed cost). And related as well to production quantity and yield (equipment and labor costs divide across all chips produced). That the whole thing continues scaling is non-obvious, a real insight, and tantamount to a modern miracle. Thanks to the hard work and effort of many talented people.


The way I remember it, it was about the transistor count in the commercially available chip with the lowest per transistor cost. Not transistor count per constant dollar.

Wikipedia quotes it as:

> The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.

But I'm fairly sure, if you graph how many transistors you can buy per inflation adjusted dollar, you get a very similar graph.


Yes. I think you're probably right about phrasing. And transistor count per inflation adjusted dollar is the unit most commonly used to graph it. Similar ways to say the same thing.

The Law of Accelerating Returns is a better formulation, not tied to any particular substrate, it's just not as widely known.

https://imgur.com/a/UOUGYzZ - had chatgpt whip up an updated chart.

LoAR shows remarkably steady improvement. It's not about space or power efficiency, just ops per $1000, so transistor counts served as a very good proxy for a long time.

There's been sufficiently predictable progress that 80-100 TFLOPS in your pocket by 2035 is probably a solid bet, especially if a fully generative AI OS and platform catches on as a product. The LoAR frontier for compute in 2035 is going to be more advanced than the limits of prosumer/flagship handheld products like phones, so there's a bit of lag and variability.


You could put 64TBs of storage into your pocket with current technology. There are 4TB microSD cards available.

Not sure about the stated TFLOPS, but I suspect we'll find that AI doesn't need that much compute to begin with.


You can run models locally on high end smartphones today with apps like PocketPal or Local LLM.

> What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?

Well, these days people have the equivalent of a frontier lab from perhaps 40 years ago in their pocket. We can see what that has done to the economy, and try to extrapolate.


Nothing to do with Moores Law or AGI.

The current models are simply inefficient for their capability in how they handle data.


> If we do get to AGI (2029 according to Kurzweil)

if you base your life on Kurzweil's hard predictions you're going to have a bad time


I didn't say winning business, I said winning on cost effectiveness.


