$4.6M is not a lot, and these were old bugs that it found. Also, actually exploiting these bugs in the real world is often a lot harder than just finding the bug. Top bug hunters in the Ethereum space are absolutely using AI tooling to find bugs, but it's still a bit more complex than just blindly pointing an LLM at a test suite of known exploitable bugs.
According to the blogpost, these are fully autonomous exploits, not merely discovered bugs. The LLM's success was measured by how much money it was able to extract:
>A second motivation for evaluating exploitation capabilities in dollars stolen rather than attack success rate (ASR) is that ASR ignores how effectively an agent can monetize a vulnerability once it finds one. Two agents can both "solve" the same problem, yet extract vastly different amounts of value. For example, on the benchmark problem "FPC", GPT-5 exploited $1.12M in simulated stolen funds, while Opus 4.5 exploited $3.5M. Opus 4.5 was substantially better at maximizing the revenue per exploit by systematically exploring and attacking many smart contracts affected by the same vulnerability.
They also found new bugs in real smart contracts:
>Going beyond retrospective analysis, we evaluated both Sonnet 4.5 and GPT-5 in simulation against 2,849 recently deployed contracts without any known vulnerabilities. Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694.
Of course they are, and they've been doing it since long before ChatGPT or any of that was a thing. Before, it was mostly classifiers and concolic execution engines, but it's only gotten way more advanced.
State is globally distributed, and smart contract code executes state transitions on that state. When someone submits a transaction with certain function parameters, anyone can verify that those parameters will lead to that exact state transition.
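To make the verifiability point concrete, here's a minimal sketch in Python (toy names like `transfer` and a plain balances dict, not any real chain's API): because the transition function is pure and deterministic, anyone holding the prior state can re-execute a transaction and check the claimed result.

    from dataclasses import dataclass

    # Toy model: a "contract" is a pure function from (state, parameters)
    # to a new state. Same inputs always produce the same output.
    @dataclass(frozen=True)
    class Tx:
        sender: str
        recipient: str
        amount: int

    def transfer(state: dict, tx: Tx) -> dict:
        """Deterministic state transition: same state + same tx => same result."""
        if state.get(tx.sender, 0) < tx.amount:
            raise ValueError("insufficient balance")
        new_state = dict(state)
        new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.amount
        new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.amount
        return new_state

    # Anyone with the prior state can verify a claimed post-state by re-executing.
    prior = {"alice": 100, "bob": 0}
    tx = Tx(sender="alice", recipient="bob", amount=40)
    claimed = {"alice": 60, "bob": 40}
    assert transfer(prior, tx) == claimed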
I talked to the son at one of the early (~2008) YC dinners. Actually found him more approachable than PG or most YC founders; RTM is a nerd in the "cares a whole lot about esoteric mathematics" way, which I found a refreshing change from the "take over the world" vibe that I got from a lot of the rest of YC.
Interesting random factoid: RTM's research in the early 2000s was on Chord [1], one of the earliest distributed hash tables. Chord inspired Kademlia [2], which later went on to power Limewire, Ethereum, and IPFS. So his research at MIT actually has had a bigger impact in terms of collected market cap than most YC startups have.
RTM Jr is a very nice person, obviously very smart, but also has a good sense of humor and is friendly and approachable. We overlapped as C.S. grad students at Harvard for several years.
I did not. That actually makes everything make much more sense. I was even wondering how he got out of jail time for something like this and just thought he had amazing lawyers.
I think the bigger thing was that the Internet just wasn't that big a deal at the time. I got serious access in '93, and into '94-95 there were still netsplits on it (UUNet/NSFNet is the one I remember most). It was a non-remunerative offense, with really unclear intent, that took out a research network. He had good counsel, as you can tell from the reporting about the trial, but the outcome made sense. I doubt his dad had much to do with it.
Yeah, in 1988 the Internet looked like a research network that connected universities. No money was directly at stake and the systems harmed didn't appear critical. Related to what Thomas says above, part of the response to the incident was to partition the Internet for a few days [2] - I don't know if such a thing would be possible now.
But looking into the specifics again after all these years [1], I read:
"The N.S.A. wanted to clamp a lid on as much of the affair as it could. Within days, the agency’s National Computer Security Center, where the elder Morris worked, asked Purdue University to remove from its computers information about the internal workings of the virus."
and that CERT at CMU was one response to the incident [2].
So there is a whiff of the incident being steered away from public prosecution and towards setting up security institutions.
Robert Morris did get a felony conviction, three years' probation, and a $10K fine. As for HN users, aside from pg, Cliff Stoll has a minor role in the story.
> I think the bigger thing was that the Internet just wasn't that big a deal at the time.
Maybe I’m just getting old, but it seems like nothing was such a big deal at the time.
Everything seems to have gotten more uptight in the last few decades. I used to have a metal cutlery set that an international airline gave to every passenger on the plane.
From what I can remember, while there was some public awareness of "computer crime" by 1988 (War Games helped with that), it wasn't exactly a "big deal" to most people yet. My subjective recollection is that things took a marked turn around 1990, with the advent of "Operation Sundevil"[1], the raid on Steve Jackson Games, etc.
And by the mid to late 90's (I'd say about 1997) it was finally becoming "received wisdom" to most hackers that "this is real now: getting caught doing this stuff could mean actual jail time, fines, not getting into college, losing jobs, etc." Now, I grew up in a rural part of NC, so we probably lagged other parts of the country in terms of information dispersal; other people may view the timeline differently, so YMMV.
Lots of chaos, but just three arrests. Did any of them proceed to full prosecutions? I'm reasonably sure Bruce Esquibel wasn't charged (at least, there's nothing in PACER to say so). I have no idea who "Tony The Trashman" was.
Barely. In my area around that time, teenagers were causing havoc by breaking into local colleges just so they could get onto IRC and access FTP sites. "Network security" was a pretty new concept.
Ehh? It had only recently been made explicitly criminal by federal statute. If you're thinking of "the Hacker Crackdown" that occurred a few years after the Morris Worm, or of Kevin Mitnick's exploits, it's worth keeping in mind that they were doing pretty crazy shit even relative to today; they were owning up phone switches across the country. And despite that, the penalties were not crazy high.
What you didn't have back then was financial fraud on the scale that happens today, where even nominal damages run into 8-9 figures.
Exactly. It's so wild to me when people hate on generated text because it sounds like something they don't like, when they could easily tell it to adopt any other tone that has ever appeared in text.
> If the AI PR were any good, it wouldn’t need review.
So, your minimum bar for a useful AI is that it must always be perfect and a far better programmer than any human that has ever lived?
Coding agents are basically interns. They make stupid mistakes, but even if they're doing things 95% correctly, they're still adding a ton of value to the dev process.
Human reviewers can use AI tools to quickly sniff out common mistakes and recommend corrections. This is fine. Good even.
> So, your minimum bar for a useful AI is that it must always be perfect and a far better programmer than any human that has ever lived?
You are transparently engaging in bad faith by purposefully straw manning the argument. No one is arguing for “far better programmer than any human that has ever lived”. That is an exaggeration used to force the other person to reframe their argument within its already obvious context and make it look like they are admitting they were wrong. It’s a dirty argument, and against the HN guidelines (for good reason).
> Coding agents are basically interns.
No, they are not. Interns have the capacity to learn and grow and not make the same mistakes over and over.
I strongly disagree that it was bad faith or strawmanning. The ancestor comment had:
> This makes no sense, and it’s absurd anyone thinks it does. If the AI PR were any good, it wouldn’t need review. And if it does need review, why would the AI be trustworthy if it did a poor job the first time?
This is an entirely unfair expectation. Even the best human SWEs create PRs with significant issues - it's absurd for the parent to say that if a PR is "any good, it wouldn’t need review"; it's just an unreasonable bar, and I think that @latexr was entirely justified in pushing back against that expectation.
As for the "95% correctly", this appears to be a strawman argument on your end, as they said "even if ...", rather than claiming that this is the situation at the moment. But having said that, I would actually like to ask both of you - what does it even mean for a PR to be 95% correct - does it mean that 95% of the LoC are bug-free, or do you have something else in mind?
I pay OpenAI $200 a month and use Codex all the time, but the ChatGPT app for Android is so crappy that I just use ChatGPT from the mobile web browser instead; the app is over a month behind on super common features that launched on iPhone on day one.
Same thing with Sora 2 being Apple only. What craziness is that? Why are developers leaning so hard into supporting closed source ecosystems and leaving open source ecosystems behind?
This never has anything to do with open source vs. closed source, or anything like that. It always has to do with prioritizing the cohort that's most likely to pay money.
It's been shown over and over again in A/B testing that Apple device users will pay higher prices for the same goods and services than non-Apple users will. They're more likely to pay, period, versus free-ride.
As an Android user, it frustrates me sometimes. But I understand. I myself am far more frugal with my online spending than most of my Apple user friends.
What's with the assumption that everything needs to be a "moat"? Seems much more important/interesting to wire up society with cohesive tooling according to Metcalfe's law, rather than building stuff designed to separate and segment knowledge.
Information security is, fundamentally, a misalignment of expected capabilities with new technologies.
There is literally no way a new technology can be "secure" until it has existed in the public zeitgeist for long enough that the general public has an intuitive feel for its capabilities and limitations.
Yes, when you release a new product, you can ensure that its functionality aligns with expectations from other products in the industry, or analogous products that people are already using. You can make design choices where a user has to slowly expose themselves to more functionality as they understand the technology more deeply, but each step of the way is going to expose them to additional threats that they might not fully understand.
Security is that journey. You can't just release a product using a brand new technology that's "secure" right out of the gate.
I'm sorry but that's a pathetic excuse for what's going on here. These aren't some unpredictable novel threats that nobody could've reasonably seen coming.
Everyone who has their head screwed on right could tell you that this is an awful idea, for precisely these reasons, and we've known it for years. Maybe not their users if they haven't been exposed to LLMs to that degree, but certainly anyone who worked on this product should've known better, and if they didn't, then my opinion of this entire industry just fell through the floor.
This is tantamount to using SQL escaping instead of prepared statements in 2025. Except there's no equivalent to prepared statements in LLMs, so we know that mixing sensitive data with untrusted data shouldn't be done until we have the technical means to do it safely.
Doing it anyway when we've known about these risks for years is just negligence, and trying to use it as an excuse in 2025 points at total incompetence and indifference towards user safety.
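For anyone who hasn't seen the SQL analogy spelled out, here's a minimal sketch using Python's built-in sqlite3 (table and values made up for illustration): escaping still splices untrusted text into the query string and hopes nothing slips through, while a prepared/parameterized statement fixes the query structure up front so the input can only ever be treated as data. The point above is that LLMs have no equivalent of the second option yet - instructions and untrusted data all end up in the same token stream.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice'), ('mallory')")

    untrusted = "alice' OR '1'='1"  # attacker-controlled input

    # Escaping: the data is still mixed into the code (the query string).
    # One missed edge case and the input changes the query's meaning.
    escaped = untrusted.replace("'", "''")
    conn.execute(f"SELECT name FROM users WHERE name = '{escaped}'")

    # Prepared/parameterized statement: the structure is fixed, the value is
    # bound separately and can only ever be data, never query syntax.
    conn.execute("SELECT name FROM users WHERE name = ?", (untrusted,))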