That's pretty much the same mistake as in VW's recent "We know where you parked" hack. [0] So while I don't really want to say anything nice about VW, the mistake is not something that only happens to side projects.
This is also something that keeps affecting "smart" software engineers' projects: they don't realise they've got misconfigured S3 buckets, or have Firebase or MongoDB etc. wide open to the world. We've seen so many companies that absolutely should know better end up in this situation.
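For the S3 case at least, closing the hole is a single API call; here's a minimal boto3 sketch (the bucket name is just a placeholder):

    import boto3  # AWS SDK for Python

    s3 = boto3.client("s3")

    # Block every form of public access on the bucket, whether it comes from
    # ACLs or from a bucket policy. "my-app-data" is a placeholder name.
    s3.put_public_access_block(
        Bucket="my-app-data",
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )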
The reality is that cloud providers make it easy to deploy infrastructure without much thought. You need skilled, domain-specific IT Architects working together to ensure that an organization's cloud presence is efficient and secure. That discipline and rigor is often dismissed or underappreciated because it forces you to slow down and decreases agility.
Some organizations have some form of Enterprise Architecture group that governs technology and ensures that there is discipline, though the maturity and scope vary. I would say most organizations are completely devoid of that type of supervision and oversight.
> I would say most organizations are completely devoid of that type of supervision and oversight.
It's unfortunately far too counter to the "move fast and break stuff" ethos the startup space tends to be enamored of, because such groups tend to want you to do things safely and to avoid a "Front page of the New York Times" type of security event.
Sure wish it meant more than it does. Sorry, that "Front page of the NYT" phrase is one I've been using since back when everyone would have expected such an event to be the death of a company!
No, he is right: hardware manufacturers treat software as a line item, just another part of the BOM. It's typically contracted out (although some are trying to change that). That's why it's typically mediocre at companies outside of SV.
You need a software-first, agile mentality from the leadership of the company on downwards, and these legacy companies just don't have it.
VW realized that software was important years ago and founded a dedicated software-only company called Cariad to specialize in it. They went ham recruiting traditional software folks for high salaries (in European terms). I know a few people who moved Bay Area -> Europe to work for them, and they have a couple of West Coast offices where you'd expect, for the people who don't want to move.
It's been an absolute disaster, with billions of dollars spent to produce delayed, buggy software.
The problem with hardware companies is they’re bad at software because the disciplines are so different that what works for one doesn’t work for the other.
The problem with software companies is they’re bad at hardware for the same reason.
User experience companies can be good at both. Maybe not as good at hardware as a hardware company, maybe not as good at software as software companies.
Apple’s the obvious example, but Google, Garmin, heck even Starbucks are also good examples. Start with the user experience, then build hardware or software or whatever else is needed. Specializing in a tool has value, but limits you to that tool.
It’s neither. But they’re successful because of the user experience — consistency, the preloaded cards, the mobile ordering with notifications when your order is ready.
They build whatever hardware (in store) or software (mobile / back end) is necessary to give the user experience they want.
But you’re absolutely right — we can lump their mediocre coffee into hardware, or call it “goods” as a third category that you also don’t have to be the best at if you’re a UX company.
The complexity is a symptom of it being a side project, not evidence that it isn't. As a reminder, today's cars are still vulnerable to remote takeover via malformed songs on the radio because of shitty CAN-bus practices combined with buffer overflows in those side projects.
Safety-critical firmware is scrutinized fairly well (not because it's not a side project, but because of regulatory constraints combined with the small scope allowing the car manufacturers to treat it as a fungible good), but other software is not, not even the broken feedback loops that interact with that firmware.
Automotive software is worse than you can possibly imagine. It is literally some of the most broken code I have seen in my entire career and that is the industry norm. Shockingly poor. In fairness, the constraints placed on automotive software production ensure this outcome. There is no room for good practice.
If I could walk everywhere the rest of my life, I would.
'DeepSeek is the side project of a bunch of quants'
I very much doubt that it was only that, and not massively backed by the Chinese state in general.
As with OpenAI, much of this has to do with hype-based speculation.
In the case of OpenAI, they played with and fueled the speculation that they might already have AGI locked up in their labs.
The result: massive investment (now in danger).
And China and the US are playing a game of global hegemony. I just read articles whose essence is: see, China is so great that a small side project there can take down the leading players from the West! Come join them.
It is mere propaganda to me.
Now, DeepSeek being in the open is a good thing, but I believe the Chinese state is backing it massively to help with that success and to help shake the Western world's dominance. I would also assume the Chinese intelligence services helped directly, with intel straight out of the labs of OpenAI and co.
This is about real power.
Many states are about to decide which side they should take, if they have to choose between West and East. Stuff like this heavily influences those decisions.
I don't buy this, simply because if the Chinese government were to back an effort, it wouldn't be Deepseek.
Alibaba has Qwen. Baidu, Huawei, Tencent, etc. all have their own AI models. The Chinese government would most likely push one of these forward with its backing, not an unknown small company.
Unless of course, they want to sell the "small underdog" story.
I don't claim it is all staged. The researchers seem genuine. But they can be good researchers and still have said yes at some point to big government help, if smart Chinese government employees recognized their potential.
DeepSeek isn’t a side project or just a bunch of quants - these are part of the marketing that people keep repeating blindly for some reason. To build DeepSeek probably requires at least a $1B+ budget. Between their alleged 50,000 H100 GPUs, expensive (and talented) staff, and the sheer cost of iterating across numerous training runs - it all adds up to far, far more than their highly dubious claim of $5.5M. Anyone spending that amount of money isn’t just doing a side project.
The client-facing aspect isn’t the problem here. The linked article is talking about the backend having vulnerabilities, not the client-facing application. It’s about a database that is accessible from the internet, with no authentication, and with unencrypted data sitting in it. High Flyer, the parent company of DeepSeek, already has a lot of backend experience, since that is a core part of the technologies they’ve built to operate the fund. If you’re a quantitative hedge fund, you aren’t just going to be lazy about your backend systems and data security. They have a lot of experience and the capability to manage those backend systems just fine.
I’m not saying other companies are perfect either. There’s a long list of American companies that violate user privacy, or have bad security that then gets exploited by (often Chinese or Russian) hackers. But encrypting data in a database seems really basic, and requiring authentication on a database also seems really basic. It would be one thing if exposure of sensitive info required some complicated approach. But this degree of failure raises lots of questions whether such companies can ever be trusted.
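Even verifying the most basic part of this (is the thing reachable at all from the outside?) takes only a few lines; a rough sketch, with a hypothetical host and a handful of common database ports:

    import socket

    HOST = "db.example.com"            # hypothetical hostname
    PORTS = [27017, 5432, 3306, 9200]  # MongoDB, Postgres, MySQL, Elasticsearch

    for port in PORTS:
        try:
            # If this connects from an untrusted network, the database is exposed
            # to the internet and had better be enforcing authentication.
            with socket.create_connection((HOST, port), timeout=3):
                print(f"{HOST}:{port} is reachable from here")
        except OSError:
            print(f"{HOST}:{port} is not reachable (good, if it is meant to be internal)")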
> Anyone spending that amount of money isn’t just doing a side project.
You're reciting a bunch of absolute numbers without any sort of context at all. $5M isn't the same for every company. For example, in 2020 it seems High Flyer spent a casual $27M on a supercomputer. They later replaced that with a new $138M machine. $5.5M sounds like something that could be a side project for a company like that, whose blood and sweat is literally money.
> But this degree of failure raises lots of questions whether such companies can ever be trusted.
This, I do agree with. I wouldn't trust sending my data over to them. Using their LLMs on my own hardware, though? Don't mind if I do; as long as it's better, I don't really mind what country it is imported from.
> that could be like a side-project for a company like that, whose blood and sweat is literally money.
From the mouth of Liang Wenfeng, co-founder of both High Flyer and DeepSeek, 18 months ago:
"Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this."
> To build DeepSeek probably requires at least a $1B+ budget. Between their alleged 50,000 H100 GPUs, expensive (and talented) staff, and the sheer cost of iterating across numerous training runs - it all adds up to far, far more than their highly dubious claim of $5.5M.
This is not fair. Is OpenAI, for example, including the CEO's paycheck in its model training costs?
There's a sliding scale. On one end is "Include the CEO's paycheck"; on the other is "include nothing except the price tag on the final, successful training run".
Neither end is terribly useful. Unfortunately, the $5.5M number is for the latter.
Parent is (I assume) talking about the entire budget to get to DeepSeek V3, not the cost of the final training run.
This includes salaries for ~130 ML people plus the rest of the staff; the company is 2 years old.
They trained DeepSeek V1, V2, R1, and R1-Zero before finally training V3, as well as a bunch of other less-known models.
The final run of V3 is ~$6M (at least officially... [1]), but that does not factor in the cost of all the other failed runs, ablations, etc. that always happen when developing a new model.
You also can't get clusters of this size with a 3-week commitment just to do your training and then stop paying; there is always a multi-month (if not 1-year) commitment because of demand/supply. Or, if it's a private cluster they own, it's already a $200M-300M+ investment just for the advertised 2,000 GPUs for that run.
I don't know if it really is $1B, but it certainly isn't below $100M.
[1] I personally believe they used more GPUs than stated, but simply can't be forthcoming about this for obvious reasons. I of course have no proof of that; my belief is just based on the scaling laws we have seen so far plus where the incentives are for stating the # of GPUs. But even if the 2k-GPU figure is accurate, it's still $100M+.
H100s can cost about $30k. There was an interview with a CEO in the space speculating that they have about 50,000 H100s. That's $1.5bn. Presumably they got volume discounts, though given the export bans they might have had to pay a premium on that discount to buy them secondhand. If it were H800s, that would be ~half the price, which is still high hundreds of millions for the chips alone.
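Just to make the back-of-the-envelope explicit in code (every number here is the speculation above, not a known figure):

    # Rough hardware-cost arithmetic using the speculated figures above.
    h100_unit_price = 30_000   # USD per H100, approximate
    gpu_count = 50_000         # speculated fleet size

    h100_total = h100_unit_price * gpu_count   # $1.5B for the chips alone
    h800_total = h100_total // 2               # H800s at roughly half the price

    print(f"H100 estimate: ${h100_total:,}")   # $1,500,000,000
    print(f"H800 estimate: ${h800_total:,}")   # $750,000,000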
Is that true? No idea. But there isn't zero evidence.
I'm sure you were just misled by all the people, including Anthropic's Dario, parroting this claim, but even Dario has since said he was wrong to say that, and SemiAnalysis has already clarified it was a misunderstanding of their claim, which was 50,000 H-series, not 50,000 H100s.
Some academic projects have a lot of funding, and what they are researching is top-tier stuff.
But the software? Absolute disaster.
When people say DeepSeek is a side project, this is what I assume they mean. It's a different matter when a bunch of software engineers make something with terrible security, because software is their main job. With a bunch of academics (and no offense to academics), software is not their main job; you could still be working on teaching them how to use version control.
It doesn't even need to be a side project, or by a bunch of quants. A bunch of AI researchers working on this as their primary job would still have no real idea about what it takes to secure a large-scale, world-usable internet service.
> This kinda does support the 'DeepSeek is the side project of a bunch of quants' angle
Can we stop with this nonsense?
The list of authors of the paper is public; you can just go look it up. There are ~130 people on the ML team, and they have regular ML backgrounds, just like you would find at any other large ML lab.
Their infra costs multiple millions of dollars per month to run, and the salaries of such a big team are somewhere in the $20-50M per year range (not very au fait with market rates in China, hence the spread).
This is not a side project.
Edit: Apparently my comment is confusing some people. I am not arguing that ML people are good at security, just that DS is not the side project of a bunch of quant bros.
A bunch of ML researchers who were initially hired to do quant work published their first ever user facing project.
So maybe not a side project, but if you have ever worked with ML researchers before, lack of engineering/security chops shouldn't be that surprising to you.
> A bunch of ML researchers who were initially hired to do quant work
Very interesting! I'm sure you have a source for this claim?
This myth of DS being a side project literally started from one tweet.
DeepSeek the company is funded by a company whose main business is being a hedge fund, but DeepSeek itself has, from day 1, been all about building LLMs to reach AGI, completely independently.
This is like saying SpaceX is the side project of a few carmaking bros, just because Elon funded and manages both. They are unrelated.
Again, you can easily google the names of the authors and look at their backgrounds; you will find people with PhDs in LLM/multimodal models, internships at Microsoft Research, etc. No trace of a background in quant work or time-series prediction or any of that.
From the mouth of the CEO himself 2 years ago: "Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this." [0]
It's really interesting to see how, after 10 years debating the mythical 10x engineer, we have now overnight created the mythical 100x Chinese quant-bro researcher, who can build models 50x better than those of the best U.S. people, after 6pm, while working on his side project.
TL;DR: High Flyer started very much as an exclusively ML/AI-focused quant investment firm, with a lot of compute for finance AI and mining. Then the CCP cracked down on mining... then finance, so Liang probably decided to pivot to LLM/AGI, which likely started as a side project, but probably isn't one anymore now that DeepSeek has taken off and Liang just met with the PRC premier a few days ago. DeepSeek being an independent company doesn't mean DeepSeek isn't Liang's side project, using compute bought with hedge fund money that is primarily used for hedge fund work, cushioned/allowed to get by with low margins by hedge fund profits.
That's a fair distinction. IMO it should still be categorized as a side project in the sense that it's Liang's pet project, the same way Jeff Bezos spends $$$ on his forever clock through a separate org but ultimately with Amazon resources. DeepSeek / Liang are fixated on AGI, not on profit-making, and it's not loss-making either, since hardware / capex depreciation is likely eaten by the High Flyer / quant side. There's no reason to believe DeepSeek spent hundreds of millions to build out another compute chain separate from High Flyer's. The myth of seasoned finance quants using 20% time to crush US researchers is false, but the reality/narrative of a bunch of fresh-out-of-school Gen Z kids from tier-1 PRC universities destroying US researchers is kind of just as embarrassing.
The carmaking bro predates SpaceX. He had a BMW in college and got a supercar in 1997. While he wasn’t a carmaker yet, he got started with cars earlier.
First ever? Their math, coding, and other models have been making a splash since 2023.
The mythologizing around deepseek is just absurd.
"Deepseek is the tale of one lowly hedgefund manager overcoming the wicked American AI devils". Every day I hear variations of this, and the vast majority of it is based entirely in "vibes" emanating from some unknown place.
What I find amusing is that this closely mirrors the breakout moment OpenAI had with ChatGPT. They had been releasing models for quite some time before slapping the chatbot interface on it, and then it blew up within a few days.
It's fascinating that a couple of years and a few competitors in, the DeepSeek moment parallels it so closely.
Models and security are very different uses of our synapses. Publishing any number of models is no proof of anything beyond models. Talented mathematicians and programmers though they may be.
OP means that the public API and app are the side project, which they likely are; the skills required to do ML have little overlap with the skills required to run large, complex workloads securely and at scale for a public-facing app with presumably millions of users.
The latter role also typically requires experience, not just knowledge, to do well, which is why experienced SREs have very good salaries.
No, I don’t think so. I think if you took many engineers, sat them at a computer, and asked them to stand up a whole dev/staging/prod system, they wouldn’t be able to do it.
I certainly would not, or it would take me a significant amount of time to do properly, and I have been a full-stack dev for 10 years. Now take that one step further to someone whose only interaction with development is numpy, pandas, Julia, etc.
You are, in typical HN style, minimising the problem into insignificance.
This is /not/ a “stick it behind an aws load balancer and on one of their abstracted services that does 99% of the work for you” - that would be less difficult.
E: love how this is getting ratioed, no doubt by egotistical, self-confessed 10x engineers. Some self-reflection is needed on your behalf. Just because /you/ think you would be capable does not mean that the plethora of others would be able to.
What likely happened here is that an ingress rule was set up wrongly on iptables or equivalent... something many of your fellow engineers would have no clue about. An open dev database is rather normal if you want something out of the door quickly; why would you worry about the security of an internally-accessible-only tool if you trust your 10 or so staff? Have a think about the startups you have worked in (everyone here is a startup pro, just like you are - remember!) and what dire situation your MVP was in behind its smoke-and-mirrors PowerPoint slide deck.
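Worth noting that the "internal only" assumption usually comes down to a single line; a tiny sketch of the difference (the service and port here are hypothetical):

    import http.server

    # An "internal-only" tool bound to all interfaces is one missing firewall
    # rule away from being public. Binding to loopback keeps it off the network.
    PUBLIC  = ("0.0.0.0", 8080)    # reachable from anywhere the firewall allows
    PRIVATE = ("127.0.0.1", 8080)  # reachable only from the box itself

    server = http.server.HTTPServer(PRIVATE, http.server.SimpleHTTPRequestHandler)
    server.serve_forever()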
Yes, this was disastrous for PR. No, it is not a problem solved in its entirety by learned engineering experts like yourself.
I would consider it table stakes for an intermediate level engineer at a big company (which would have well defined processes for doing this safely) or a senior at any other company (on the assumption some of that infra has to be set up from scratch). If 10 years of experience hadn’t taught me this yet, I would personally be concerned how I’m spending my energy. I am roughly at the 10y mark, and I would estimate I have been competent enough to build a public facing application without embarrassing public access issues on my own for at least 4 years. Even before that, I would have known what to be scared of / seek help on for at least 7 years. I guess I could be more unusual than I think, but the idea that at 10 years anyone would be ok not knowing how to approach such a routine task is baffling to me.
HN is a bubble. The expectation that your colleagues are /experts like you/ is unrealistic. Standing up something like this, which is entirely on bare metal, is a task many would find challenging if they are entirely honest with themselves and put their egos to the side. Your typical SWE thinks that nothing is impossible.
There was a recent comment along the lines of “I used to watch figure skating, seeing them race around and spin, and think no big deal. It was only when I went on the ice that I realised how difficult and impressive what they were doing was” - this is exactly the trap SWEs are most guilty of falling into. /This/ is what you learn at the staff level.
You are talking to the ice skaters.
They expect you to do up your laces. Setting a password on a database is something I would expect of any company capable of asking for a credit card.
Everything you say is true, but I don't think any of it actually applies to being able to safely deploy user-facing systems. I would certainly not trust myself to do all possible aspects of setting up a user-facing system completely from scratch (i.e. nothing but a libc on Linux or whatever); I would not trust myself to write correct crypto, for example. But I have a good sense of what I can trust myself to build relatively safely, and of course I'm not claiming that "knowledge of where to trust myself" is by any means flawless. Even in college I made applications for people that were exposed to the public internet, but I was very aware of what I felt I could trust myself to do and what I needed to rely on some other system for. In my case I delegated auth to "sign in with Google" and relied on several other services for data storage. There were features that I didn't ship because I didn't trust myself to build them safely, and I was working alone. Now, I would not necessarily expect every CS student to be able to do this safely, but a healthy understanding of one's own current limitations, and a willingness to engineer around that as a constraint, is pretty achievable and can get you very far.
I don’t understand this comment? Is it unusual to request something like this? OP’s comment was saying that all 1000 or so (and hundreds of thousands of others) of his colleagues would be able to do this if asked?
I don’t know if you are in agreement with me or not
I am agreeing with your premise that asking a random software technician to deploy an app fairly securely would be problematic, and then generalizing it to include many tasks related to software engineering.
Right, and DeepSeek doesn't employ any because they're a bunch of quants who are used to building internal systems. I don't see how this responds to OP's point.
That's not a matter of battle hardened experience. Publicly exposing database management endpoints that allow arbitrary execution is a *massive* no-no that even a junior developer with zero experience should be able to sense is a bad idea.
Seems like the kind of mistake you would make if you are not used to deploying external client facing applications.