thefourthchime · 2025-12-06T01:43:01 1764985381

My test of a new model is always:

"Generate a Pac-Man game in a single HTML page." -- I had never seen a model produce a complete working game until a couple of weeks ago.

Sonnet Opus 4.5 in Cursor was able to make a fully working game (I'll admit that letting Cursor act as an agent on this is a little bit of cheating). Gemini 3 Pro also succeeded, but it's not quite as good because the ghosts seem to be stuck in their jail. Otherwise, it does appear complete.

thefourthchime · 2025-12-03T01:53:41 1764726821

Isn't that what GPT 4.5 was?

wrsh07 · 2025-12-03T03:00:06 1764730806

That was a large model that iiuc was too expensive to serve profitably

Many people thought it was an improvement though

thefourthchime · 2025-11-30T06:50:30 1764485430

The price of inference has been dropping like a rock. I wouldn't expect that 2c to be true in a couple of years.

Cthulhu_ · 2025-11-30T15:26:55 1764516415

Likewise, did a Google search cost that much 20-odd years ago?

thefourthchime · 2025-11-29T17:49:17 1764438557

As a serial DIYer, I respect the engineering depth here, especially the custom vector index, but I disagree on the self-hosted ML approach. The innovation in embeddings is just too fast to keep up with locally without constant refactoring. You can actually see the trade-off in the "girl drinking water" example where one result is a clear hallucination.

warangal · 2025-11-29T18:03:01 1764439381

Currently the (semantic) ML model is the weakest (minorly fine-tuned) ViT B/32 variant and acts more as a placeholder, i.e. it is very easy to swap for a desired model. (DINO models have been pretty great, being trained on a much cleaner and larger dataset; CLIP was one of the first image-text models!)

On the point about "girl drinking water": "girl" is the person/tagged name, and "drinking water" just re-ranks all of "girl"'s photos (rather than finding all photos of a generic girl drinking water).
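The tag-then-re-rank behaviour described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the `photos` records, the two-dimensional embeddings, and the `tag_then_rerank` helper are all made up for the example, and plain cosine similarity is assumed as the ranking score.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag_then_rerank(photos, tag, query_vec):
    """Keep only photos carrying `tag`, then order them by similarity to the query."""
    tagged = [p for p in photos if tag in p["tags"]]
    return sorted(tagged, key=lambda p: cosine(p["embedding"], query_vec), reverse=True)


# Hypothetical data: "girl" is a tagged person name, not a generic concept.
photos = [
    {"id": "a.jpg", "tags": {"girl"}, "embedding": [0.9, 0.1]},
    {"id": "b.jpg", "tags": {"girl"}, "embedding": [0.1, 0.9]},
    {"id": "c.jpg", "tags": {"dog"}, "embedding": [1.0, 0.0]},
]
query_vec = [1.0, 0.0]  # stands in for the embedding of "drinking water"

ranked = tag_then_rerank(photos, "girl", query_vec)
print([p["id"] for p in ranked])
```

Because the tag filter runs first, every photo tagged "girl" comes back no matter how weakly it matches the text, which would explain a low-ranked result that looks unrelated to the query.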

I have been more focused on making the indexing pipeline more performant by reducing copies and speeding up bottleneck portions by writing them in Nim. Fusing semantic features with metadata is the more interesting and challenging part, compared with choosing an embedding model!

thefourthchime · 2025-11-29T17:35:06 1764437706

The article glides over the fact that FMVSS 226 is a performance standard, not a materials mandate. Manufacturers can stick with tempered glass if they beef up the side curtain airbags enough to prevent ejection, which is exactly what happens on a lot of base models and rear windows to keep BOM costs down. The list of brands using laminated glass is accurate, but it applies mostly to their premium trims or front rows only.

There is also the issue of fleet turnover. With the average age of US vehicles pushing 13 years, the install base is still overwhelmingly tempered glass. Writing off the tool entirely because new luxury cars have moved on ignores the reality of what people are actually driving. You are statistically much more likely to be trapped in a 2012 Civic than a 2025 S-Class.

alistairSH · 2025-11-29T18:30:57 1764441057

It did cover that. And half the tools couldn’t break the tempered glass either.

sndean · 2025-11-29T18:17:31 1764440251

The smartest thing to do would be to check your car’s windows for any indication (the AAA report, page 19, cited in the article has examples) of whether they’re laminated or tempered. AFAICT, whether my new-ish Subaru Ascent’s windows are laminated depends on location (front or rear) and installation differs between the Ascent trims. Best to check for your specific car and where you’re likeliest to be sitting.

walletdrainer · 2025-11-30T12:39:07 1764506347

> You are statistically much more likely to be trapped in a 2012 Civic than a 2025 S-Class.

This is probably also very much true on a per mile basis.

potato3732842 · 2025-11-30T14:37:34 1764513454

If you can afford a 2025 S-Class, you can afford to fly for medium-distance travel, and you probably aren't slogging out a long commute because you live in one of those rich inner suburbs. You leave the house at reasonable hours and get home at reasonable hours, etc.

There's all sorts of stuff that's just a proxy for generalized correlation with wealth and wealthy lifestyles.

bayindirh · 2025-11-29T18:16:04 1764440164

> The article glides over the fact that FMVSS 226 is a performance standard, not a materials mandate.

Nope. The article states the following just after the table:

> It's true that not all automakers have switched over to laminated glass for the side windows; the FMVSS 226 law stipulates that you can get around it if you install elaborate side airbags that also prevent ejection.

alwa · 2025-11-30T00:36:23 1764462983

As the grandparent points out, although the article says that, the actual regulation does not. The regulation says you have to prevent side ejections, it doesn’t say how. You can read it yourself:

https://www.law.cornell.edu/cfr/text/49/571.226

> Ejection mitigation countermeasure means a device or devices, except seat belts, integrated into the vehicle that reduce the likelihood of occupant ejection through a side window opening, and that requires no action by the occupant for activation.

Lamination and side airbags seem to be the way it’s usually done today, but nothing prevents a better way.

nrklvklfl · 2025-11-30T08:00:37 1764489637

alwa · 2025-11-30T20:30:00 1764534600

One way (which the regulation mentions) is by not having a window next to a given seat in the first place

Another might be the bars or steel mesh that they weld over the windows in prisoner transport vehicles

thefourthchime · 2025-11-18T16:28:14 1763483294

I like to ask "Make a pacman game in a single html page". No model has ever gotten a decent game in one shot. My attempt with Gemini 3 was no better than 2.5.

bitexploder · 2025-11-18T19:10:26 1763493026

Something else to consider. I often have much better success with something like: Create a prompt that creates a specification for a pacman game in a single html page. Consider edge cases and key implementation details that result in bugs. <take prompt>, execute prompt. It will often yield a much better result than one generic prompt. Now that models are trained on how to generate prompts for themselves this is quite productive. You can also ask it to implement everything in stages and implement tests, and even evaluate its tests! I know that isn't quite the same as "Implement pacman on an HTML page" but still, with very minimal human effort you can get the intended result.
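For anyone who wants to script that two-step flow, a minimal sketch might look like this. `call_model` is a placeholder for whatever client function you use to send a prompt and get text back; it is an assumption for the example, not a real library API, and the prompt wording is only an illustration.

```python
from typing import Callable


def spec_then_build(task: str, call_model: Callable[[str], str]) -> str:
    """Ask the model for a specification first, then ask it to implement that spec."""
    spec_prompt = (
        f"Write a detailed specification for: {task}\n"
        "Consider edge cases and key implementation details that tend to cause bugs."
    )
    spec = call_model(spec_prompt)  # step 1: the model writes its own spec

    build_prompt = f"Implement the following specification exactly:\n\n{spec}"
    return call_model(build_prompt)  # step 2: the model executes that spec


if __name__ == "__main__":
    # Stand-in model so the sketch runs without any network access.
    def echo_model(prompt: str) -> str:
        return f"[model reply to {len(prompt)} chars of prompt]"

    print(spec_then_build("a Pac-Man game in a single HTML page", echo_model))
```

Passing the model call in as a parameter keeps the sketch independent of any particular vendor, and it leaves a natural spot between the two calls to read and edit the spec by hand.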

amelius · 2025-11-18T20:28:31 1763497711

I thought this kind of chaining was already part of these systems.

bitexploder · 2025-11-19T17:46:56 1763574416

It can be, but the more specific context you can give, the better, especially in your initial prompt. If the process is opaque to you, who knows what it is doing? Spending 5 minutes dialing in the initial spec/prompt is still important. Different LLMs will do better or worse at this, and in my experience being a human in the loop on this initial stuff gives much higher quality, which indicates to me that the LLM tries but often doesn't yet have enough info to implement your intentions.

Workaccount2 · 2025-11-18T18:27:34 1763490454

It made a working game for me (with a slightly expanded prompt), but the ghosts got trapped in the box after coming back from getting killed. A second prompt fixed it. The art and animation, however, were really impressive.

ofa0e · 2025-11-18T16:36:01 1763483761

Your benchmarks should not involve IP.

sowbug · 2025-11-18T17:13:41 1763486021

The only intellectual property here would be trademark. No copyright, no patent, no trade secret. Unless someone wants to market the test results as a genuine Pac-Man-branded product, or otherwise dilute that brand, there's nothing should-y about it.

bongodongobob · 2025-11-18T18:55:41 1763492141

It's not an ethics thing. It's a guardrails thing.

sowbug · 2025-11-18T19:37:32 1763494652

That's a valid point, though an average LLM would certainly understand the difference between trademark and other forms of IP. I was responding to the earlier comment, whose author later clarified that it represented an ethical stance ("stealing the hard work of some honest, human souls").

ComplexSystems · 2025-11-18T16:42:20 1763484140

Why? This seems like a reasonable task to benchmark on.

adastra22 · 2025-11-18T16:54:06 1763484846

Because you hit guard rails.

ofa0e · 2025-11-18T16:54:24 1763484864

Sure, reasonable to benchmark on if your goal is to find out which companies are the best at stealing the hard work of some honest, human souls.

scragz · 2025-11-18T17:03:33 1763485413

correction: pacman is not a human and has no soul.

WhyOhWhyQ · 2025-11-19T01:13:36 1763514816

Why do you have to willfully misinterpret the person you're replying to? There's truth in their comment.

thefourthchime · 2025-11-14T15:29:29 1763134169

In a sense, they already do, since they're heavily invested in CoreWeave. For those unfamiliar, CoreWeave was a crypto company that pivoted to building out data centers.

zerosizedweasle · 2025-11-14T15:39:03 1763134743

It's interesting to see the market try to do anything to rally. The problem is you guys are rallying on the thought that you've scared the Fed into cutting rates, but actually by rallying you short-circuit it. You ensure they won't cut. And that's how the market's lily-pad-hopping thinking is actually just stupidity. You rallied, so now there are no rate cuts, so the crash will be even more brutal.

wmf · 2025-11-14T17:30:47 1763141447

GPU "neoclouds" are a different topic than whose logo is on the server.

thefourthchime · 2025-11-13T21:00:04 1763067604

I wonder if the senior dev actually said LLM, or at least meant LLM. If he said that, most of this checks out. The only thing is that they don't have to be stochastic, but in practice they almost always are.
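To illustrate the "don't have to be stochastic" part, here is a toy sketch of next-token selection. The `logits` dictionary and the `next_token` helper are invented for the example; real decoders work over large vocabularies and tensors, but the idea is the same: at temperature 0 you take the arg-max, otherwise you sample from a softmax.

```python
import math
import random


def next_token(logits, temperature, rng):
    """Pick the next token: greedy arg-max at temperature 0, softmax sampling otherwise."""
    if temperature == 0:
        # Deterministic: same logits in, same token out.
        return max(logits, key=logits.get)

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


logits = {"cat": 2.0, "dog": 1.5, "fish": 0.1}  # hypothetical scores
rng = random.Random(0)

print([next_token(logits, 0, rng) for _ in range(3)])    # greedy decoding
print([next_token(logits, 1.0, rng) for _ in range(3)])  # sampled decoding
```

Greedy decoding always returns the same token for the same logits, while sampling can vary between runs unless the seed is fixed. Hosted models can still show small run-to-run differences for other reasons, so treat this as the idea rather than a guarantee.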

thefourthchime · 2025-11-13T20:50:36 1763067036

I've been at companies as small as 10 and as large as 30,000, and there is no lack of politics in smaller companies from what I've seen.

thefourthchime · 2025-11-13T19:13:53 1763061233

Can you answer question 7?

blast · 2025-11-13T19:32:24 1763062344

I doubt that they know. It's too early to figure something like that out.

kokanee · 2025-11-13T22:19:44 1763072384

Seems to me that the obvious business model here is that they will need to have their AI inject their own ads into the DOM. Overall though, this feels like a feature, not a business.

blast · 2025-11-13T23:41:15 1763077275

To me the more obvious option is additional features that people pay for, i.e. freemium. But what do I know.

warkdarrior · 2025-11-14T00:08:07 1763078887

As a user, I'll never pay for software. Adblock for SaaS and pirated downloads for everything else is all I need.

HeinzStuckeIt · 2025-11-14T01:40:26 1763084426

Clearly there’s a tension on this venture-capital-run website between some people using their computer-nerd skills to save money and improve their experience, and other people hustling a business that requires the world to pay them.

brazukadev · 2025-11-14T11:23:37 1763119417

> Clearly there’s a tension on this venture-capital-run website

Yeah. If they have a problem with that, they can kill HN. You can't have hackers/smart people in your forum and decide what they will do. Moderation can try to guide it, but there is a limit when meeting smart + polite people.

thefourthchime · 2025-11-13T20:42:36 1763066556

Or, they do know and don't want to say. This project does seem to have funding so I assume there is a plan.