Hacker News

This kinda does support the 'DeepSeek is the side project of a bunch of quants' angle.

Seems like the kind of mistake you would make if you are not used to deploying external client facing applications.



That's pretty much the same mistake as in VW's recent "We know where you parked" hack. [0] So while I don't really want to say anything nice about VW, the mistake is not something that only happens to side projects.

[0] https://www.spiegel.de/international/business/we-know-where-...


This is also something that keeps affecting "smart" software engineers' projects: they don't realise they've got misconfigured S3 buckets, or Firebase or MongoDB instances wide open to the world. We've seen so many companies that absolutely should know better end up in this situation.
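A lot of these incidents reduce to one mistake: a datastore bound to every interface instead of loopback or a private network. As a rough illustration (my own sketch, not taken from any of the incidents above), the stdlib `ipaddress` module is enough to flag the obviously risky bind addresses:

```python
import ipaddress

def risky_bind(bind_addr: str) -> bool:
    """True if a service bound to this address may be reachable
    from the public internet."""
    addr = ipaddress.ip_address(bind_addr)
    if addr.is_unspecified:      # 0.0.0.0 / :: means "listen on every interface"
        return True
    return addr.is_global        # a globally routable address is reachable by anyone

# Loopback and RFC 1918 addresses are the safe defaults:
print(risky_bind("0.0.0.0"))    # True
print(risky_bind("127.0.0.1"))  # False
print(risky_bind("10.0.0.5"))   # False
```

It's only a heuristic, of course: a privately bound service can still leak through a bad firewall or proxy rule, and a "private" address means little on a compromised network.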


The reality is that cloud providers make it easy to deploy infrastructure without much thought. You need skilled domain specific IT Architects working together to ensure that an organization's cloud presence is efficient and secure. That discipline and rigor is often dismissed or underappreciated because it forces you to slow down and decreases agility.

Some organizations have some form of Enterprise Architecture group that governs technology and ensures that there is discipline, though the maturity and scope vary. I would say most organizations are completely devoid of that type of supervision and oversight.


> I would say most organizations are completely devoid of that type of supervision and oversight.

It's unfortunately far too counter to the "move fast and break stuff" ethos that the startup space tends to be enamored of, because such groups tend to want you to do things safely and try to avoid a "front page of the New York Times" type of security event.


Has being on the front page of the NYT for a software bug or leak killed any companies?

I think they correctly believe security failures are at most a short term PR problem as far as the market is concerned.


Sure wish it meant more than it does. Sorry that "Front page of the NYT" phrase is one I've been using since back when everyone would have expected it to be the death of a company!


Software is unfortunately a side-project for most auto makers :)


With the amount of complexity found in modern cars' pre-packaged software, I'd not be so sure.


No, he's right: hardware manufacturers treat software as a line item, just another part of the BOM, typically contracted out (although some are trying to change that). That's why it's typically mediocre from companies outside of SV.

You need a software-first, agile mentality from the leadership of the company on downwards, and these legacy companies just don't have it.


VW realized that software was important years ago and founded a dedicated software-only company called Cariad to specialize in it. They went ham recruiting traditional software folks for high salaries (in European terms). I know a few people who moved Bay area -> Europe to work for them and they have a couple west coast offices where you'd expect for the people who don't want to move.

It's been an absolute disaster, with billions of dollars spent to produce delayed, buggy software.


Oh no they did write that one piece of software, which was successful until it wasn't.

https://venturebeat.com/offbeat/how-volkswagen-used-software...


Damn that last sentence was a heck of a plot twist.


As someone who worked in a software startup founded by a hardware company, it was very predictable.


Agile workflow for making cars? No.

Agile workflow for frequently updating non-critical software in devices that happen to be cars? Sure.


The problem with hardware companies is they’re bad at software because the disciplines are so different that what works for one doesn’t work for the other.

The problem with software companies is they’re bad at hardware for the same reason.

User experience companies can be good at both. Maybe not as good at hardware as a hardware company, maybe not as good at software as software companies.

Apple's the obvious example, but Google, Garmin, heck even Starbucks are also good examples. Start with the user experience, build hardware or software or whatever else is needed. Specializing in a tool has value, but limits you to that tool.


OK, I'll bite. How is selling shitty coffee in large quantities a good example of either software or hardware excellence?


It’s neither. But they’re successful because of the user experience — consistency, the preloaded cards, the mobile ordering with notifications when your order is ready.

They build whatever hardware (in store) or software (mobile / back end) is necessary to give the user experience they want.

But you’re absolutely right — we can lump their mediocre coffee into hardware, or call it “goods” as a third category that you also don’t have to be the best at if you’re a UX company.


> software first agile mentality

I can release a website with a list of known bugs. Does any government allow the release of cars with known bugs?


There's a wide spectrum of possible bugs. I would hazard that every car ever sold was sold with known bugs.

As long as you're reasonably confident that the bugs don't pose a safety issue I don't see the problem.


Having been in automotive software development and testing for over a decade now, I assure you, it's so very much worse than even that.


The complexity is a symptom of it being a side project, not evidence that it isn't. As a reminder, today's cars are still vulnerable to remote takeover via malformed songs on the radio, thanks to shoddy CAN-bus practices combined with buffer overflows in those side projects.

Safety-critical firmware is scrutinized fairly well (not because it's not a side project, but because of regulatory constraints combined with the small scope allowing the car manufacturers to treat it as a fungible good), but other software is not, even broken feedback loops interacting with that firmware.


Automotive software is worse than you can possibly imagine. It is literally some of the most broken code I have seen in my entire career and that is the industry norm. Shockingly poor. In fairness, the constraints placed on automotive software production ensure this outcome. There is no room for good practice.

If I could walk everywhere the rest of my life, I would.


There are so many examples of experienced teams doing stupid things like exposing databases that I don't really think this is a valid conclusion to draw.


Right, just about 4 months ago Meta was fined for storing passwords in plain text:

https://news.ycombinator.com/item?id=41678840

The joke is that these companies build systems that can tell them how to implement better security; they simply don't care.


Clearly it could never be enough to draw that conclusion, but it might be very weak evidence in one direction.


If something is an intelligence operation, they aren't going to screw up basic database security.


'DeepSeek is the side project of a bunch of quants'

I very much doubt that it was only that, and not massively backed by the Chinese state in general.

As with OpenAI, much of this has to do with hype-based speculation.

In OpenAI's case, they played with the speculation that they might already have AGI locked up in their labs, and fueled it. The result: massive investment (now in danger).

And China and the US are playing a game of global hegemony. I just read articles with the essence of: see, China is so great that a small side project there can take down the leading players from the West! Come join them.

It is mere propaganda to me.

Now, DeepSeek in the open is a good thing, but I believe the Chinese state is backing it massively to help with that success and to help shake off Western dominance. I would also assume the Chinese intelligence services helped directly, with intel straight out of the labs of OpenAI and co.

This is about real power.

Many states are about to decide which side they should take, if they have to choose between West and East. Stuff like this heavily influences those decisions.

(But btw, most don't want to have to choose.)


I don't buy this, simply because if the Chinese government were to back an effort, it wouldn't be DeepSeek.

Alibaba has Qwen. Baidu, Huawei, Tencent, etc. all have their own AI models. The Chinese government would most likely push one of these forward with its backing, not an unknown small company.


Unless of course, they want to sell the "small underdog" story.

I don't claim it is all staged. The researchers seem genuine. But they can be good researchers and still have said yes at some point to big government help, if smart Chinese government employees recognized their potential.


To corroborate the side-project angle, their SDKs are quite literally taken from OpenAI:

  # Please install OpenAI SDK first: `pip3 install openai`
  from openai import OpenAI
  client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")


DeepSeek isn’t a side project or just a bunch of quants - these are part of the marketing that people keep repeating blindly for some reason. To build DeepSeek probably requires at least a $1B+ budget. Between their alleged 50,000 H100 GPUs, expensive (and talented) staff, and the sheer cost of iterating across numerous training runs - it all adds up to far, far more than their highly dubious claim of $5.5M. Anyone spending that amount of money isn’t just doing a side project.

The client-facing aspect isn't the problem here. The linked article is talking about the backend having vulnerabilities, not the client-facing application: a database that was accessible from the internet, with no authentication, with unencrypted data sitting in it. High Flyer, the parent company of DeepSeek, already has a lot of backend experience, since that is a core part of the technology they've built to operate the fund. If you're a quantitative hedge fund, you aren't just going to be lazy about your backend systems and data security. They have plenty of experience and capability to manage those backend systems just fine.

I’m not saying other companies are perfect either. There’s a long list of American companies that violate user privacy, or have bad security that then gets exploited by (often Chinese or Russian) hackers. But encrypting data in a database seems really basic, and requiring authentication on a database also seems really basic. It would be one thing if exposure of sensitive info required some complicated approach. But this degree of failure raises lots of questions whether such companies can ever be trusted.


> Anyone spending that amount of money isn’t just doing a side project.

You're reciting a bunch of absolute numbers without any sort of context at all. $5M isn't the same for every company. For example, in 2020 High Flyer apparently spent a casual $27M on a supercomputer, which they later replaced with a new $138M one. $5.5M sounds like something that could be like a side-project for a company like that, whose blood and sweat is literally money.

> But this degree of failure raises lots of questions whether such companies can ever be trusted.

This, I agree with, though. I wouldn't trust sending my data over to them. Using their LLMs on my own hardware, though? Don't mind if I do; as long as it's better, I don't really mind what country it was imported from.


> that could be like a side-project for a company like that, whose blood and sweat is literally money.

From the mouth of Liang Wenfeng, co-founder of both High Flyer and DeepSeek, 18 months ago:

"Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this."

https://www.chinatalk.media/p/deepseek-from-hedge-fund-to-fr...


It's a side project called DeepSeek.


The $5.5M figure is for a single training run (the latest), assuming they rented GPUs in the cloud, as stated in the paper.


> To build DeepSeek probably requires at least a $1B+ budget. Between their alleged 50,000 H100 GPUs, expensive (and talented) staff, and the sheer cost of iterating across numerous training runs - it all adds up to far, far more than their highly dubious claim of $5.5M.

This is not fair. Is OpenAI, for example, including the CEO's paycheck in the model training costs?


There's a sliding scale. On one end is "Include the CEO's paycheck"; on the other is "include nothing except the price tag on the final, successful training run".

Neither end is terribly useful. Unfortunately, the $5.5M number is for the latter.


>To build DeepSeek probably requires at least a $1B+ budget.

Zero evidence that the above statement is true, and weak evidence (the authors' claims) that it is false. Have you even read their papers?

https://arxiv.org/html/2412.19437v1#abstract https://arxiv.org/pdf/2501.12948


Parent is (I assume) talking about the entire budget to get to DeepSeek V3, not the cost of the final training run.

This includes salary for ~130 ML people plus the rest of the staff; the company is two years old. They trained DeepSeek V1, V2, R1, and R1-Zero before finally training V3, as well as a bunch of other less-known models.

The final run of V3 is ~$6M (at least officially... [1]), but that does not factor in the cost of all the other failed runs, ablations, etc. that always happen when developing a new model.

You also can't get clusters of this size with a 3-week commitment just to do your training and then stop paying; there is always a multi-month (if not 1-year) commitment because of demand/supply. Or, if it's a private cluster they own, it's already a $200M-300M+ investment just for the advertised 2000 GPUs for that run.

I don't know if it really is $1B, but it certainly isn't below $100M.

[1] I personally believe they used more GPUs than stated, but simply can't be forthcoming about this for obvious reasons. I have, of course, no proof of that; my belief is just based on the scaling laws we have seen so far, plus where the incentives are in stating the # of GPUs. But even if the 2k-GPU figure is accurate, it's still $100M+.


H100s can cost about $30k each. There was an interview with a CEO in the space speculating that they have about 50,000 H100s. That's $1.5bn. Presumably they got volume discounts, though given the export bans they might have had to pay a premium over that discount to buy them secondhand. If they were H800s, that would be roughly half the price, which is still high hundreds of millions for the chips alone.

Is that true? No idea. But there isn't zero evidence.
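For what it's worth, the arithmetic is easy to check; a back-of-envelope in Python, using the interview's own (speculative) figures for unit price and fleet size:

```python
h100_unit_price = 30_000     # ~$30k per H100 (speculated unit price)
alleged_fleet = 50_000       # GPU count claimed in the interview (unverified)

hardware_cost = h100_unit_price * alleged_fleet
print(f"${hardware_cost / 1e9:.1f}B")      # $1.5B for H100s
print(f"${hardware_cost / 2 / 1e9:.2f}B")  # roughly half if H800s: $0.75B
```

Volume discounts, resale premiums from the export bans, and networking/datacenter costs all push this number around, but not by an order of magnitude.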


> Between their alleged 50,000 H100 GPUs

I'm sure you were just misled by all the people, including Anthropic's Dario, parroting this claim, but even Dario has since said he was wrong to say that, and SemiAnalysis already clarified it was a misunderstanding of their claim, which was 50,000 H-series GPUs, not 50,000 H100s.


H800s, right?


Some academic projects have a lot of funding and what they are researching is some top tier stuff.

But the software? Absolute disaster.

When people say DeepSeek is a side project, this is what I assume they mean. It's different when a bunch of software engineers make something with terrible security, because it's their main job. With a bunch of academics (and no offense to academics), software is not their main job. You could be working on teaching them how to use version control.


You think they deliberately left their DB open to the internet, without a password? Why?


No, I did not claim that it was purposeful. But they did leave their DB open to the internet without a password. And that seems really negligent.


For an ops person, yes; for an ML engineer (basically an academic), I'd honestly be more surprised if it was secured.


It doesn't even need to be a side project, or by a bunch of quants. A bunch of AI researchers working on this as their primary job would still have no real idea what it takes to secure a large-scale, world-usable internet service.


> This kinda does support the 'DeepSeek is the side project of a bunch of quants' angle

Can we stop with this nonsense?

The list of authors of the paper is public; you can just go look it up. There are ~130 people on the ML team, and they have regular ML backgrounds, just like you would find at any other large ML lab.

Their infra costs multiple millions of dollars per month to run, and the salary of such a big team is somewhere in the $20-50M per year range (not very au fait with market rates in China, hence the spread).

This is not a side project.

Edit: Apparently my comment is confusing some people. I am not arguing that ML people are good at security, just that DeepSeek is not the side project of a bunch of quant bros.


A bunch of ML researchers who were initially hired to do quant work published their first ever user facing project.

So maybe not a side project, but if you have ever worked with ML researchers before, the lack of engineering/security chops shouldn't be that surprising to you.


> A bunch of ML researchers who were initially hired to do quant work

Very interesting! I'm sure you have a source for this claim?

This myth of DeepSeek being a side project literally started from one tweet. DeepSeek the company is funded by a company whose main business is being a hedge fund, but DeepSeek itself from day 1 has been all about building LLMs to reach AGI, completely independently.

This is like saying SpaceX is the side project of a few carmaking bros, just because Elon funded and manages both. They are unrelated.

Again, you can easily google the names of the authors and look at their backgrounds; you will find people with PhDs in LLM/multimodal models, internships at Microsoft Research, etc. No trace of a background in quant work or time-series prediction or any of that.

From the mouth of the CEO himself 2 years ago: "Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this." [0]

It's really interesting to see how, after 10 years debating the mythical 10x engineer, we have now overnight created the mythical 100x Chinese quant-bro researcher, who can build models 50x better than those of the best U.S. people, after 6pm, while working on his side project.

[0]: https://www.chinatalk.media/p/deepseek-from-hedge-fund-to-fr...


See this earlier interview from 2020.

https://www.pekingnology.com/p/ceo-of-deepseeks-parent-high-...

TL;DR: High-Flyer started very much as an exclusively ML/AI-focused quant investment firm, with a lot of compute for finance AI and mining. Then the CCP cracked down on mining... then finance, so Liang probably decided to pivot to LLMs/AGI, which likely started as a side project, but probably isn't one anymore now that DeepSeek has taken off and Liang just met with the PRC premier a few days ago. DeepSeek being an independent company doesn't mean it isn't Liang's side project, using compute bought with hedge fund money that is primarily used for hedge fund work, cushioned/allowed to get by with low margins thanks to hedge fund profits.


Yes, see my analogy with Elon.

The point is, the team actually doing the DeepSeek work are working on this as their exclusive project, have been hired exclusively for this etc.

They aren't doing this on the side of their main quant job, destroying U.S. researchers just as a hobby, as the myth would have us believe.


That's a fair distinction. IMO it should still be categorized as a side project in the sense that it's Liang's pet project, the same way Jeff Bezos spends $$$ on his forever clock with a separate org but ultimately with Amazon resources. DeepSeek / Liang are fixated on AGI, not profit-making, or even loss-making, since hardware/capex depreciation is likely eaten by the High Flyer / quant side. There's no reason to believe DeepSeek spent $100Ms to build out another compute chain separate from High Flyer's. The myth of seasoned finance quants using 20% time to crush US researchers is false, but the reality/narrative that a bunch of fresh-out-of-school Gen-Z kids from tier-1 PRC universities are destroying US researchers is kind of just as embarrassing.


Just to be pedantic, SpaceX predates Tesla.


The carmaking bro predates SpaceX. He had a BMW in college and got a supercar in 1997. While he wasn't a carmaker yet, he got started with cars earlier.


A valid response to my initial comment which was a bit tongue in cheek.

However, I'm not sure that them being LLM researchers rather than quant researchers changes the dynamic of their relaxed security posture.


> However, i'm not sure that them being LLM researchers compared to quant researchers changes the dynamic of their relaxed security posture.

It does not indeed, but that's not the part I was commenting on.


First ever? Their math, coding, and other models have been making a splash since 2023.

The mythologizing around DeepSeek is just absurd.

"Deepseek is the tale of one lowly hedgefund manager overcoming the wicked American AI devils". Every day I hear variations of this, and the vast majority of it is based entirely in "vibes" emanating from some unknown place.


What I find amusing is that this closely mirrors the breakout moment OpenAI had with ChatGPT. They had been releasing models for quite some time before slapping the chatbot interface on it, and then it blew up within a few days.

It's fascinating that a couple of years and a few competitors in, the DeepSeek moment parallels it so closely.


Models and security are very different uses of our synapses. Publishing any number of models is no proof of anything beyond models, talented mathematicians and programmers though they may be.


Well, security isn't their job to begin with.


> This is not a side project.

OP means that the public API and app were a side project, which they likely were; the skills required to do ML have little overlap with the skills required to run large, complex workloads securely and at scale for a public-facing app with, presumably, millions of users.

The latter role also typically requires experience, not just knowledge, to do well, which is why experienced SREs have very good salaries.


None of that has anything to do with "deploying external client facing applications"


You're right. It has nothing to do with the second sentence of the two sentence post it replies to.


?? The point is, the ML researchers aren’t experts at deploying secure infrastructure.


??????

This wasn't narrow minded folks doing this. Shit happens.


It doesn't say much.

Data breaches from unsecured or accidentally-public servers/databases are not unusual among much larger entities than DeepSeek.


how many people in the world are used to deploying external client facing applications?


Hundreds of thousands. My employer alone probably has 1000.


No, I don't think so. I think if you took many engineers and sat them at a computer and asked them to stand up a whole dev/staging/prod system, they wouldn't be able to do it.

I certainly would not, or it would take me a significant amount of time to do properly, and I have been a full-stack dev for 10 years. Now take that one step further, to someone whose only interaction with development is numpy, pandas, julia, etc...

You are, in typical HN style, minimising the problem into insignificance.

This is /not/ a "stick it behind an AWS load balancer and one of their abstracted services that does 99% of the work for you" situation; that would be less difficult.

E: love how this is getting ratioed, no doubt by egotistical self-confessed 10x engineers. Some self-reflection is needed on your behalf. Just because /you/ think you would be capable does not mean that the plethora of others would be.

What likely happened here is an ingress rule set up wrongly in iptables or equivalent, something many of your fellow engineers would have no clue about. An open dev database is rather normal if you want something out of the door quickly; why would you worry about the security of an internal-only tool if you trust your 10 or so staff? Have a think about the startups you have worked in (everyone here is a startup pro, just like you are, remember!) and what dire situation your MVP was in behind its smoke-and-mirrors PowerPoint slide deck.

Yes, this was disastrous for PR. No, it is not a problem solved in its entirety by learned engineering experts like yourself.

Oh, here: a comment from ClickHouse explaining that there is a legitimate reason this may have been configured this way and happened: https://news.ycombinator.com/item?id=42873446
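For context on why this configuration happens: ClickHouse ships with a `default` user that has an empty password, which is harmless while the server only listens on localhost, and silently catastrophic once `listen_host` is widened. A hedged sketch of the corresponding `users.xml` hardening (element names follow ClickHouse's documented config format; the hash and CIDR here are placeholders):

```xml
<clickhouse>
    <users>
        <default>
            <!-- replace the default empty password with a SHA-256 hash -->
            <password_sha256_hex>PLACEHOLDER_HASH</password_sha256_hex>
            <!-- and only accept connections from the internal network -->
            <networks>
                <ip>10.0.0.0/8</ip>
            </networks>
        </default>
    </users>
</clickhouse>
```

Restricting `<networks>` matters even with a password set, since the HTTP interface will otherwise accept login attempts from anywhere.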


I would consider it table stakes for an intermediate level engineer at a big company (which would have well defined processes for doing this safely) or a senior at any other company (on the assumption some of that infra has to be set up from scratch). If 10 years of experience hadn’t taught me this yet, I would personally be concerned how I’m spending my energy. I am roughly at the 10y mark, and I would estimate I have been competent enough to build a public facing application without embarrassing public access issues on my own for at least 4 years. Even before that, I would have known what to be scared of / seek help on for at least 7 years. I guess I could be more unusual than I think, but the idea that at 10 years anyone would be ok not knowing how to approach such a routine task is baffling to me.


HN is a bubble. The expectation that your colleagues are /experts like you/ is unrealistic. Standing something like this up, entirely on bare metal, is a task many would find challenging if they were entirely honest with themselves and put their egos to the side. Your typical SWE thinks that nothing is impossible.

There was a recent comment along the lines of "I used to watch figure skating, seeing them race around and spin, and think no big deal. It was only when I went on the ice that I realised how difficult and impressive what they were doing was." This is exactly the trap SWEs are most guilty of; /this/ is what you learn at staff level.


You are talking to the ice skaters. They expect you to do up your laces. Setting a password on a database is something I would expect of any company capable of asking for a credit card.


Everything you say is true, but I don't think any of it actually applies to being able to safely deploy user-facing systems. I would certainly not trust myself to do every possible aspect of setting up a user-facing system completely from scratch (i.e. nothing but a libc on Linux or whatever); I would not trust myself to write correct crypto, for example. But I have a good sense of what I can trust myself to build relatively safely. And of course I'm not claiming that "knowledge of where to trust myself" is by any means flawless.

Even in college I made applications for people that were exposed to the public internet. But I was very aware of what I felt I could trust myself to do and what I needed to rely on some other system for. In my case I delegated auth to "sign in with Google" and relied on several other services for data storage. There were features that I didn't ship because I didn't trust myself to build them safely, and I was working alone. Now, I would not necessarily expect every CS student to be able to do this safely, but a healthy understanding of one's own current limitations, and a willingness to engineer around that as a constraint, is pretty achievable and can get you very far.


Depending on your perspective, that's either very concerning or a great business opportunity for this decade's Heroku to enter the fray.


This is definitely not something hosted on a PaaS/SaaS.


Yes, I'm aware that most devs can't do it. I'd guess 1 in 10 can.

>An open dev database is rather normal

Not open to the internet, it's not! Internal network, perhaps.

>someone whose only interaction with a development is numpy, pandas, julia, etc

This person should be aware of their limitations and give the task to someone who knows what they're doing.


> I think if you took many engineers and sat them at a computer and asked them to ...

There are many in the software engineering field who could not satisfy a request of this nature, for any reasonable form of "asked them to".


It sorta sounds like their AI would've done it better, yeah...


I don't understand this comment? Is it unusual to request something like this? OP's comment was saying that all 1000 or so of his colleagues (and hundreds of thousands of others) would be able to do this if asked.

I don’t know if you are in agreement with me or not


I am agreeing with your premise that asking a random s/w technician to deploy an app fairly securely would be problematic, and then generalizing it to include many tasks related to s/w engineering.

So we're good. :-)


How many people in the world drink coffee? I don't understand your question.


The subtext was probably "Even among professional programmers, few know what it takes to safely expose a new system to the public internet."


Right, and DeepSeek doesn't employ any because they're a bunch of quants who are used to building internal systems. I don't see how this responds to OP's point.


A lot? They can go scoop up people from any number of SaaS startups or hire an external 3rd party to do a security audit.

We're not talking some poor college students here.


That's not a matter of battle-hardened experience. Publicly exposing database management endpoints that allow arbitrary execution is a *massive* no-no that even a junior developer with zero experience should be able to sense is a bad idea.
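To make "arbitrary execution" concrete: ClickHouse's HTTP interface (port 8123 by default) runs whatever SQL arrives in the `query` parameter, so an unauthenticated endpoint is one GET request away from a data dump. A minimal stdlib-only probe sketch (the hostname and ports here are placeholders, not the actual exposed instance):

```python
import urllib.parse
import urllib.request

def clickhouse_wide_open(host: str, port: int = 8123, timeout: float = 3.0) -> bool:
    """Return True if the host answers a trivial unauthenticated query,
    i.e. the database is readable by anyone who finds it."""
    params = urllib.parse.urlencode({"query": "SELECT 1"})
    url = f"http://{host}:{port}/?{params}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().strip() == b"1"
    except OSError:
        return False  # connection refused, timed out, or auth rejected

# With nothing listening locally, the probe reports "not exposed":
print(clickhouse_wide_open("127.0.0.1", port=9))  # False
```

This is essentially what internet-wide scanners do continuously, which is why an open port like this tends to get found within hours, not months.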


A million or more, be serious.


I am and I'm quite sure I'm not that big of a deal



