AI moves so fast that Vibe Coding still has a negative stigma attached to it, but even after 25 years of development I can't match the productivity of getting AI to implement the features I want. It's basically like sending multiple devs off to do work for you: you tell them what you want, give iterative feedback until they've implemented every feature the way you want it, and have them fix every issue you find along the way, complete with tests and all the automation and deployment scripts.

This is clearly the future of software development, but the models are so good at the moment that the future is possible now. I'm still getting used to it and rethinking my entire dev workflow for maximum productivity, and while I wouldn't unleash AI agents on a decade-old code base, all my new web apps will likely end up AI-first unless there's a very good reason it wouldn't provide a net benefit.



It just depends on what you are doing. A greenfield React app in TypeScript with a CRUD API behind it? LLMs are a mind-blowing assistant, and 1000 t/s is crazy.

You are doing embedded development or anything else not as mainstream as web dev? LLMs are still useful but no longer mind-blowing, and often produce hallucinations. You need to read every line of their output. 1000 t/s is still crazy, but no longer always in a good way.

You are doing stuff the LLMs haven't seen yet? You are on your own. There is quite a bit of irony in the fact that the devs of llama.cpp barely use AI; just have a look at the development of support for Qwen3-Next-80B [1].

[1] https://github.com/ggml-org/llama.cpp/pull/16095


> You are doing embedded development or anything else not as mainstream as web dev?

Counterpoint, but also kind of reinforcing your point: it depends on the kind of embedded development. I did a small utility PCB with an ESP32, and their libs are good, there's an active community, and they have test frameworks. LLMs did a great job there.

On the other hand, I wanted to drive a timer, a PWM module, and a DMA engine to generate some precise pulses. The way I chained the hardware was... not typical, but it was what I needed, and the hardware could do it. At that, Claude failed miserably; it only wasted my time, and I ended up having to do it manually.


> You are doing embedded development or anything else not as mainstream as web dev? LLMs are still useful but no longer mind-blowing, and often produce hallucinations.

I experienced this with Claude 4 Sonnet and, to some extent, gpt-5-mini-high.

When able to run tests against its output, Claude produces pretty good Rust backend and TypeScript frontend code. However, Claude became borderline unproductive once I started experimenting with uefi-rs. Other LLMs, like gpt-5-mini-high, did not fare much better, but they were at least capable of admitting lack of knowledge. In particular, GPT-5 would provide output akin to "here is some pseudocode that you may be able to adapt to your choice of UEFI bindings".

Testing in a UEFI environment is quite difficult; the LLM can't just run `cargo test` and verify its output. Things get worse in embedded, because crates like embedded_hal made massive API changes between 0.2 and 1.0 (the latest version), and every LLM I've tried seems to only know the 0.2 releases. Also, for embedded, forget even thinking about testing harnesses (which at least exist in some form with UEFI; it's just difficult to automate the execution and output for an LLM).

In this situation, you can't really trust the output of the LLM. To minimize the risk of hallucination, I tried keeping data sheets and library code in context, but at that point it took more time to prompt the LLM than to handwrite the code.
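To give a sense of that 0.2-to-1.0 churn, here's the same pin-toggling helper against both trait versions (a sketch from memory, assuming the two crate versions are pulled in under renamed dependencies `embedded_hal_02` and `embedded_hal_1`; exact paths may be slightly off):

  // embedded-hal 0.2: the trait lives at digital::v2 and declares its own Error type.
  fn pulse_old<P: embedded_hal_02::digital::v2::OutputPin>(pin: &mut P) -> Result<(), P::Error> {
      pin.set_high()?;
      pin.set_low()
  }

  // embedded-hal 1.0: new module path; Error moved to an ErrorType supertrait.
  fn pulse_new<P: embedded_hal_1::digital::OutputPin>(pin: &mut P) -> Result<(), P::Error> {
      pin.set_high()?;
      pin.set_low()
  }

That one is trivial, but the SPI and I2C traits were reorganized much more heavily, so 0.2-era training data tends to yield paths that simply don't exist in 1.0.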

I've been writing a lot of embedded Rust over the past two weeks, and my usage of LLMs in general decreased because of that. Currently planning to resume development on some of my "easier" projects, since I have about 300 Claude prompts remaining in my Zed subscription, and I don't want them to go to waste.


This is where Rust's "if it compiles, it's probably correct" philosophy may come in handy.

"Shifting bugs left" is even more important for LLMs than it is for humans. There are certain tests LLMs can't run, so if we can detect bugs at compile time and run the LLM in a loop until things compile, that's a significant benefit.


My recent experience is that LLMs are dogshit at Rust, though: unable to correct bugs without inserting new ones, going back and forth fixing and breaking the same thing, etc.


A while ago I gathered every HN comment going back a year that contained both "Rust" and "LLM", and about half were positive and half negative.


Sounds like the general "LLMs are net useful or not" sentiment here too. Personally, Rust + LLMs work great for me, and the workflow is rapid as long as you can get the LLM to run one command that says "good or bad" without too much manual work; then it can iterate until everything works. Standard prompting advice like "don't make tests pass by changing assertions" tends to improve the experience too, but that's not Rust-specific either.


Aren’t we all though?


> Also, for embedded, forget even thinking about testing harnesses (which at least exist in some form with UEFI; it's just difficult to automate the execution and output for an LLM).

I don't think it has to stay this way; we can do better here. If LLMs keep this up, good testing infrastructure might become even more important.
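For the UEFI case upthread, one plausible shape is a harness that boots the test binary under QEMU with OVMF firmware and greps the serial output, so the whole suite becomes a single process the LLM can invoke (a sketch; the OVMF path, the esp/ directory layout, and the "ALL TESTS PASSED" marker are all assumptions):

  use std::process::Command;

  /// Boot a UEFI test binary under QEMU and report pass/fail from serial output.
  fn run_uefi_tests() -> bool {
      let out = Command::new("qemu-system-x86_64")
          // OVMF provides a UEFI firmware for QEMU; the path is an assumption.
          .args(["-drive", "if=pflash,format=raw,readonly=on,file=OVMF.fd"])
          // Expose a local esp/ dir (containing EFI/BOOT/BOOTX64.EFI) as a FAT drive.
          .args(["-drive", "format=raw,file=fat:rw:esp"])
          // The test binary exits QEMU by writing to the isa-debug-exit port.
          .args(["-device", "isa-debug-exit,iobase=0xf4,iosize=0x04"])
          .args(["-serial", "stdio", "-display", "none"])
          .output()
          .expect("qemu-system-x86_64 not found");
      String::from_utf8_lossy(&out.stdout).contains("ALL TESTS PASSED")
  }

  fn main() {
      std::process::exit(if run_uefi_tests() { 0 } else { 1 });
  }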


One of my expectations for the future is the development of testing tools whose output is "optimized" in some way for LLM consumption. This is already occurring with Bun's test runner, for instance.[0] They are implementing a flag in the test runner so that the output is structured and optimized for token count.

Overall, I agree with your point. LLMs feel a lot more reliable when a codebase has thorough, easy-to-run tests. For a similar reason, I have been drifting towards strong, statically-typed languages. Both Rust and TypeScript have rich type systems that can express many kinds of runtime behavior with just types. When a compiler can make strong guarantees about a program's behavior, I assume that helps nudge the quality of LLM output a bit higher. Tests then help prevent silly regressions from occurring. I have no evidence for this besides my anecdotal experience using LLMs across several programming languages.
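As a toy example of what I mean by types expressing runtime behavior, a typestate pattern makes invalid call orders unrepresentable, so a hallucinated call fails at compile time instead of at runtime (the names here are mine, not from any real codebase):

  use std::marker::PhantomData;

  struct Disconnected;
  struct Connected;

  // The connection's state lives in the type, not in a runtime flag.
  struct Conn<State> {
      _state: PhantomData<State>,
  }

  impl Conn<Disconnected> {
      fn new() -> Self {
          Conn { _state: PhantomData }
      }
      fn connect(self) -> Conn<Connected> {
          Conn { _state: PhantomData }
      }
  }

  impl Conn<Connected> {
      // query() simply doesn't exist on a disconnected handle.
      fn query(&self, _sql: &str) {}
  }

  fn main() {
      let conn = Conn::new().connect();
      conn.query("select 1");
      // Conn::new().query("select 1"); // rejected at compile time
  }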

In general, I've had the best experience with LLMs when there's plenty of static analysis (and tests) on the codebase. When a codebase can't be easily tested, I get much smaller productivity gains from LLMs. So yeah, I'm all for improving testing infrastructure.

[0] https://x.com/jarredsumner/status/1944948478184186366


There aren't many things that LLMs haven't really seen yet, however. I have successfully used LLMs to develop a large portion of a WebAssembly 3.0 interpreter [1], which surely isn't in their training set, because WebAssembly 3.0 was only released months ago. Sure, it took tons of guidance, but it was useful enough for me.

Even llama.cpp is not a truly novel thing to LLMs: there are several performant machine-learning model executors in their training sets anyway. I'm sure llama.cpp could benefit from LLMs if the devs wanted to; they just chose not to.

[1] https://github.com/lifthrasiir/wah/


I've said it before, but no one takes it seriously: LLMs are only useful if you're building something that's already in the training set, i.e., already a commodity. In which case, why are you building it???


The obvious point that you're missing is that there are literally infinite ways to assemble software systems from the pieces that an LLM is able to manipulate due to its training. With minor guidance, LLMs can put together an unlimited number of novel combinations. The idea that the entire end product has to be in the training set is trivially false.


It's not that the product you're building is a commodity; it's that the tools you're using to build it are. Why not build a landing page using HTML, CSS, and Tailwind? Why not use Swift to make an app? Why not write an AWS Lambda in JavaScript?


"LLMs are only useful..."

Is likely why no one takes you seriously, as it's a good indication you don't have much experience with them.


It's true! When I was working with an LLM on a novel idea, it said, "Sorry, I can't help you with that!"


Do you avoid writing anything that the programming community has ever built? How are you alive???


Bell Labs should have fired all their toilet cleaners. Nothing innovative about a toilet.


Historically big AI skeptic here: what you say is simply not true anymore. LLMs aren't just regurgitating their training data per se. I've used LLMs on languages they had not seen, and they performed well. I've used LLMs on code that is about as far from a React todo app as it's possible to get.


Because I'm getting paid to.


We need a new term for LLMs actually solving hard problems. When I help Claude Code solve a nasty bug, it doesn't feel like "vibing", as in "I tell the model what I want the website to look like". It feels like sniping, as in "I spot for Claude Code, telling it how to adjust for wind, range, and elevation so it can hit my faraway target".


From what I recall of the original Karpathy definition, it's only "vibe coding" if you aren't reading the code it produces.


Yes, I vote for keeping that definition and not throwing it all into one box. LLM-assisted coding is not vibe coding.


My point exactly: it is not vibe coding, so it should not be called vibe coding. What should we call it, then?


LLM-assisted development. That's something that works for me in practice, where vibe coding never did; you really need to review carefully and steer constantly if things are to work out beyond just a few features.


You’re right. It’s explicitly about not caring about the code:

> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

https://x.com/karpathy/status/1886192184808149383


Cool, I did not know that. That makes perfect sense.


So we're the spotter in that metaphor. I like it!


"spotter coding" or perhaps "checker coding"?

"verified vivisection development" when you're working with older code :D


- backseat engineer

- keyboard princess

- Robin to the Batman

- meatstack engineer

- artificial manager


Keyboard Princess is good, Artificial Manager is even better.


> AI moves so fast that Vibe Coding still has a negative stigma attached to it

As it should. Vibe coding is about writing code on vibes and not looking at the code; that's literally the definition of the term:

https://x.com/karpathy/status/1886192184808149383

And when I say literally I'm including dictionaries:

https://blog.collinsdictionary.com/language-lovers/collins-w...


The industry of "software" is so large... While I agree with web development going this route, I'm not sure about "everything else".

You could argue that that's the bulk of all software jobs in tech, and you'd likely be correct... But depending on what your actual challenge is, LLM assistance can be more of a hindrance than a help. However, creating a web platform without external constraints makes LLM assistance shine, that's true.


Well, there are certainly kinds of code LLMs would struggle with, but people generally underestimate what LLMs are capable of.

E.g., Victor Taelin is implementing an ultra-advanced programming language/runtime, writing almost all of the code with LLMs now. The runtime (HVM) is based on the Interaction Calculus model, which was only an obscure academic curiosity until Taelin started working on it. So the hypothesis that LLMs are only capable of copying bits of code from Stack Overflow should be dismissed.


I took a look at Taelin's work [1].

[1] https://github.com/HigherOrderCO/HVM

From my understanding, the main problem there is compilation into (optimal) CUDA code and the CUDA runtime, not the language or internal representation per se. CUDA is hard to debug, so some help can be warranted.

BTW, this HVM thing smells strange. The PAPER does not provide any description of experiments where linear parallel speedups were achieved. What were these 16K cores? What were these tasks?


Taelin is experimenting with possible applications of interaction calculus. That CUDA thing was one of the experiments, and it didn't quite work out.

Currently he's working on a different thing: a code synthesis tool. AFAIK he got something better than anything else in this category, but whether it's useful is another question.


  > something better than anything else in this category
That is a strong statement.

Id [1] was run on the CM-5, a supercomputer of its day, and demonstrated superlinear parallel speedups on some tasks. The superlinear speedup was due to better cache utilization on the individual nodes.

On some tasks, the amount of parallel execution discovered by Id90 would overflow the content-addressable memory, and Id90's runtime implemented throttling to reduce the available parallelism so things could get done at all.

Does the PAPER on HVM refer to Id (Id90, to be precise)? No, it does not.

This is serious negligence on Taelin's part.

[1] https://en.wikipedia.org/wiki/Id_(programming_language)


I mean their code synthesis thing (i.e. "generate function code from input-output pairs") is better than anything from academia by many orders of magnitude. But it's not published yet.

I dunno why you choose to be so critical. Taelin isn't really selling his stuff, and his previous stuff was just an open source experiment. It's not an academic paper which claims something. And HVM1 is irrelevant now.

Their new stuff isn't about general performance but about code synthesis.


I've also experimented with using Rust to create a new programming language that I vibe coded (i.e., I never wrote the code myself). My opinion is that it's quite capable with disciplined management.

https://github.com/GoogleCloudPlatform/aether

Note: the syntax is ugly as a trade-off to make it explicit and unambiguous for LLMs to use.


Do you think people will read your flowery prose and suspect you're just part of the dead Internet? We are still waiting for all these AI-enhanced apps to flood the market.


Exactly. Codex with gpt-5-high is quite like sending out smart devs. It still makes mistakes, and when it does they're extremely stupid ones, but I now treat the code it generates as throwaway and just reroll when it does something dumb.



