More

Flux159 · 2025-08-22T20:56:30 1755896190

How does this compare to s6? I recently used it to setup an init system in docker containers & was wondering if nitro would be a good alternative (there's a lot of files I had to setup via s6-overlay that wasn't as intuitive as I would've hoped).

nine_k · 2025-08-22T21:07:44 1755896864

S6 is way more complex and rich. Nitro or runit would be simpler alternatives; maybe even https://github.com/krallin/tini.

Flux159 · 2025-08-23T00:51:13 1755910273

Thanks! Reading some of your other comments, it seems like runit or nitro may not have been a good choice for my usecase? (I'm using dependencies between services so there is a specific order enforced & also logging for 3 different services as well).

You seem to know quite a bit about init systems - for containers in particular do you have some heuristics on which init system would work best for specific usecases?

ItsHarper · 2025-08-23T01:10:01 1755911401

dinit is another one with service dependency support

Flux159 · 2025-08-12T20:25:12 1755030312

Getting a CORS error from the API - is the demo at https://search.wilsonl.in/ working for anyone else?

bstsb · 2025-08-12T20:27:38 1755030458

cors error is due to the actual request failing (502 Bad Gateway). hug of death?

Flux159 · 2025-08-12T20:29:28 1755030568

Yeah just saw the 502 - probably hug of death.

Flux159 · 2025-08-06T18:55:30 1754506530

https://archive.ph/I2Yl9

Flux159 · 2025-08-02T00:47:19 1754095639

Tried this out with Cline using my own API key (Cerebras is also available as a provider for Qwen3 Coder via via openrouter here: https://openrouter.ai/qwen/qwen3-coder) and realized that without caching, this becomes very expensive very quickly. Specifically, after each new tool call, you're sending the entire previous message history as input tokens - which are priced at $2/1M via the API just like output tokens.

The quality is also not quite what Claude Code gave me, but the speed is definitely way faster. If Cerebras supported caching & reduced token pricing for using the cache I think I would run this more, but right now it's too expensive per agent run.

sysmax · 2025-08-02T02:44:31 1754102671

Adding entire files into the context window and letting the AI sift through it is a very wasteful approach.

It was adopted because trying to generate diffs with AI opens a whole new can of worms, but there's a very efficient approach in between: slice the files on the symbol level.

So if the AI only needs the declaration of foo() and the definition of bar(), the entire file can be collapsed like this:

  class MyClass {
    void foo();
    
    void bar() {
        //code
    }
  }

Any AI-suggested changes are then easy to merge back (renamings are the only notable exception), so it works really fast.

I am currently working on an editor that combines this approach with the ability to step back-and-forth between the edits, and it works really well. I absolutely love the Cerebras platform (they have a free tier directly and pay-as-you-go offering via OpenRouter). It can get very annoying refactorings done in one or two seconds based on single-sentence prompts, and it usually costs about half a cent per refactoring in tokens. Also great for things like applying known algorithms to spread out data structures, where including all files would kill the context window, but pulling individual types works just fine with a fraction of tokens.

If you don't mind the shameless plug, there's a more explanation how it works here: https://sysprogs.com/CodeVROOM/documentation/concepts/symbol...

postalcoder · 2025-08-02T03:11:25 1754104285

this works if your code is exceptionally well composed. anything less can lead to looney tunes levels of goofiness in behavior, especially if there’s as little as one or two lines of crucial context elsewhere in the file.

This approach saves tokens theoretically, but i find it can lead to wastefulness as it tries to figure out why things aren’t working when loading the full file would have solved the problem in a single step.

sysmax · 2025-08-02T03:34:41 1754105681

It greatly depends on the type of work you are trying to delegate to the AI. If you ask it to add one entire feature at a time, file level could work better. But the time and costs go up very fast, and it's harder to review.

What works for me (adding features to huge interconnected projects), is think what classes, algorithms and interfaces I want to add, and then give very brief prompts like "split class into abstract base + child like this" and "add another child supporting x,y and z".

So, I still make all the key decisions myself, but I get to skip typing the most annoying and repetitive parts. Also, the code don't look much different from what I could have written by hand, just gets done about 5x faster.

DrBenCarson · 2025-08-02T03:41:46 1754106106

Yep and it collapses in the enterprise. The code you’re referencing might well be from some niche vendor’s bloated library with multiple incoherent abstractions, etc. Context is necessarily big

sysmax · 2025-08-02T04:31:03 1754109063

Ironically, that's how I got the whole idea of symbol-level edits. I was working on project like that, and realized that a lot of work is actually fairly small edits. But to do one right, you need to you need to look through a bunch of classes, abstraction layers, and similar implementations, and then keep in your head how to get an instance of X from a pointer to Y, etc. Very annoying repetitive work.

I tried copy-pasting all the relevant parts into ChatGPT and gave it instructions like "add support for X to Y, similar to Z", and it got it pretty well each time. The bottleneck was really pasting things into the context window, and merging the changes back. So, I made a GUI that automated it - showed links on top of functions/classes to quickly attach them into the context window, either as just declarations, or as editable chunks.

That worked faster, but navigating to definitions and manually clicking on top of them still looked like an unnecessary step. But if you asked the model "hey, don't follow these instructions yet, just tell me which symbols you need to complete them", it would give reasonable machine-readable results. And then it's easy to look them up on the symbol level, and do the actual edit with them.

It doesn't do magic, but takes most of the effort out of getting the first draft of the edit, than you can then verify, tweak, and step through in a debugger.

hooo · 2025-08-02T05:14:47 1754111687

Totally agree with your view on the symbolic context injection. Is this how things are done with code/dev AI right now? Like if you consider the state of the art.

seunosewa · 2025-08-02T09:46:53 1754128013

They search for the token of interest, e.g. grep -n then they read that line and the next 50 lines or so. They continue until they get to the end.

seunosewa · 2025-08-02T09:58:19 1754128699

The Cerebras.ai plan offers a flat fee of $50 or $200.

The API price is not a reason to reject the subscription price.

dedene · 2025-08-02T15:08:42 1754147322

The flat fee is for a fixed max amount of tokens per day. Not requests, tokens.

Havoc · 2025-08-02T01:20:47 1754097647

This seems to be rate limited by message not token so the lack of cache may matter less

andhuman · 2025-08-02T06:48:10 1754117290

No it’s by token. The FAQ says this:

> Actual number of messages per day depends on token usage per request. Estimates based on average requests of ~8k tokens each for a median user.

https://cerebras-inference.help.usepylon.com/articles/346886...

jtbayly · 2025-08-02T17:01:26 1754154086

How did you find that? Are you sure it applies to Cerebras Code Pro or Max?

NitpickLawyer · 2025-08-02T05:38:24 1754113104

Yes, but the new "thing" now is "agentic" where the driver is "tool use". So at every point where the LLM decides to make a tool use, there is a new request that gets sent. So a simple task where the model needs to edit one function down the tree, there might be 10 calls - 1st with the task, 2-5 for "read_file", then the model starts writing code, 6-7 trying to run the code, 8 fixing something, and so on...

itsafarqueue · 2025-08-02T06:31:39 1754116299

Yup. If you’ve ever watched a 60+ minute agent loop spawning sub agents, your “one message” prompt leaves you several hundred messages in the hole.

Flux159 · 2025-08-02T01:51:08 1754099468

The lack of caching causes the price to increase for each message or tool call in a chat because you need to send the entire history back after every tool call. Because there isn’t any discount for cached tokens you’re looking at very expensive chat threads.

waldrews · 2025-08-02T22:50:58 1754175058

Does caching make as much sense as a cost saving measure on Cerebras hardware as it does on mainstream GPU's? Caching should be preferred if SSD->VRAM is dramatically cheaper than recalculation. If Cerebras is optimized for massively parallel compute with fixed weights, and not a lot of memory bandwidth into or out of the big wafer, it might actually make sense to price per token without a caching discount. Could someone from the company (or otherwise familiar with it) comment on the tradeoff?

BenGosub · 2025-08-02T07:49:08 1754120948

If they say it costs $50 per month, why do you need to make additional payments?

davidweatherall · 2025-08-02T08:09:50 1754122190

$50 per month is their SaaS solution that let's you make 1000 requests per day. The openrouter cost is the raw API cost if you try to use qwen3-coder via the pay as you go model when using Cline

beastman82 · 2025-08-02T14:10:06 1754143806

the API price is not very relevant to this flat fee service announcement.

In fact it seems obvious that you should use the flat fee model instead

Flux159 · 2025-07-22T16:02:51 1753200171

So this seems to be a build definition and some form of attestation system? Does this require builds are done via CI systems instead of on adhoc developer machines?

I find that for many npm packages, I don't know how builds were actually published to the registry and for some projects that I rebuilt myself in docker, I got vastly different sizes of distribution artifacts.

Also, it seems like this is targeting pypi, npm, and crates at first - what about packages in linux distro repositories (debian, etc.)?

msuozzo · 2025-07-22T18:04:37 1753207477

Author here!

> Does this require builds are done via CI

Nope! One use for OSS Rebuild would be providing maintainers that have idiosyncratic release processes with an option for providing strong build integrity assurances to their downstream users. This wouldn't force them into any particular workflow, just require their process be reproducible in a container.

> for some projects that I rebuilt myself in docker, I got vastly different sizes of distribution artifacts.

Absolutely. OSS Rebuild can serve to identify cases where there may be discrepancies (e.g. accidentally included test or development files) and publicize that information so end-users can confidently understand, reproduce, and customize their dependencies.

> what about packages in linux distro repositories (debian, etc.)

OSS Rebuild actually does have experimental support for Debian rebuilds, not to mention work towards JVM and Ruby support, although no attestations have been published yet. There is also no practical impediment to supporting additional ecosystems. The existing support is more reflective of the size of the current team rather than the scope of the project.

candiddevmike · 2025-07-22T16:08:32 1753200512

The industry has been coalescing around third-party attestation for open source packages since COVID. The repercussions of this will be interesting to watch, but I don't see any benefits (monetary or otherwise) for the poor maintainers dealing with them.

There's probably a lot of people that see GenAI as the solution to Not Invented Here: just have it rewrite your third party dependencies! What could go wrong. There will also be some irony of this situation with third party dependencies being more audited/reviewed than the internal code they get integrated into.

Y_Y · 2025-07-22T17:06:58 1753204018

I don't mind if the "third parties" are other trusted developers of the same project, for example. But please let's not centralise it. We're just going to get Robespierre again.

jpalomaki · 2025-07-22T16:19:02 1753201142

Two fold: AI makes it easier to find "issues" in existing software and automate the CVE process. This means more "critical" vulnerabilities that require attention from developers using these packages.

At the same time rolling your own implementation with GenAI will be quick and easy. Outsiders are checking these, so no CVEs for these. Just sit back and relax.

kpcyrd · 2025-07-23T09:38:55 1753263535

For packages in Linux distributions you probably want one of:

- https://reproducible.archlinux.org/ (since 2020)

- https://reproduce.debian.net/ (since 2024)

There's arch-repro-status and debian-repro-status respectively to show the status of the packages you have installed, but since it's not yet possible to make an opensource computer out of reproducible-only software, there isn't really any tooling to enforces this through policy.

Flux159 · 2025-07-21T01:53:10 1753062790

From the article it mentions that they use a single chat thread but randomly choose between 2 different models (w/ best results from Gemini 2.5 / Sonnet 4.0 right now).

Are there any library helpers for managing this with tool call support or is it just closed source / dependent on someone else to make open source inside a different library?

OtherShrezzing · 2025-07-21T08:54:39 1753088079

You can achieve this with LMStudio's UI to test it today. You can switch between different local models in the same chat context. You can also edit previous chat results to remove context-poisoning information.

LMStudio has an API, so it should be possible to hook into that with relatively little code.

tptacek · 2025-07-21T02:20:20 1753064420

It should be pretty simple to do, right? It shouldn't be that hard to abstract out tool calls.

rockwotj · 2025-07-21T02:39:20 1753065560

I did this in about 400 or 500 lines of typescript with direct API calls into vertex AI (using a library for auth still). Supports zod for structured outputs (gemini 2.5 supports json schema proper, not just the openapi schemas the previous models did), and optionally providing tools or not. Includes a nice agent loop that integrates well with it and your tools get auto deserialized and strongly typed args (type inference in ts these days is so good). Probably could had been less if I had used googles genai lib and anthropic’s sdk - I didn’t use them because it really wasn’t much code and I wanted to inject auditing at the lowest level and know the library wasn’t changing anything.

If you really want a library, python has litellm, and typescript has vercel’s AI library. I am sure there are many others, and in other languages too.

refulgentis · 2025-07-21T02:22:31 1753064551

Its a godforsaken nightmare.

There's a lotta potemkin villages, particularly in Google land. Gemini needed highly specific handholding. It's mostly cleared up now.

In all seriousness, more or less miraculously, the final Gemini stable release went from like 20%-30% success at JSON edits to 80%-90%, so you could stop doing the parsing Aider edits out of prose.

fizx · 2025-07-21T02:36:45 1753065405

Annoying, yes. Tractable, absolutely!

thorum · 2025-07-21T05:01:47 1753074107

I recommend litellm if you’re writing Python code, since it handles provider differences for you through a common interface:

https://docs.litellm.ai/

Flux159 · 2025-07-18T19:43:47 1752867827

A bloomberg archive.ph article about the same topic - https://archive.ph/9qB0t

Flux159 · 2025-07-15T15:53:44 1752594824

I'm not sure I would say this is solved - having used CNPG with GKE, I ended up moving off of hosting Postgres inside of Kubernetes & moving to Cloud SQL and Spanner for storing state in my apps.

Few issues I found was that having stateful sets caused GKE to be unable to automatically update in my cluster setup. I had a read-write and read replica for a few postgres databases & the operator wasn't consistent with what pod id was read-writeable (you were able to retrieve this info from metadata though, so could script around it). Then having to setup backups to GCS manually & needing to ensure that they would work correctly to restore - easier to just pay GCP to deal with all of this with Cloud SQL.

Flux159 · 2025-07-10T00:03:14 1752105794

This is an interesting take since web developers could add mcp tools into their apps rather than having browser agents having to figure out how to perform actions manually.

Is the extension itself open source? Or only the extension-tools?

In theory I should be able to write a chrome extension for any website to expose my own custom tools on that site right (with some reverse engineering of their APIs I assume)?

miguelspizza · 2025-07-10T00:09:38 1752106178

The extension should be open source. I had it as a private submodule until today. Let me figure out my it's not showing up and get back to you.

The extension itself is a MCP server which can be connected to by other extension over cross extension messaging. Since the extension is part of the protocol, I'd like for the community to pull from the same important parts of the extension (MCPHub, content script) so they are consistent across extension implementations.

miguelspizza · 2025-07-10T00:24:16 1752107056

Ok it's open source now

Flux159 · 2025-07-10T00:39:58 1752107998

Thanks! Took a very quick look. It seems like the extension exposes tools for all domains that support mcp-b looking at DomainToolManager - does this mean if I have two tabs for a single domain you'll have duplicate tools per tab?

Haven't had enough time to look through all the code there - interesting problem I guess since a single domain could have multiple accounts connected (ex: gmail w/ account 0 vs account 1 in different tabs) or just a single account (ex: HN).

miguelspizza · 2025-07-10T00:50:55 1752108655

No there is built in tool de-duping. I'm not sure how to handle domains with different url states though.

Like you said there are some edge cases where two tabs of the same website expose different tool sets or have tools of the same name but would result in different outcomes when called.

Curios if you have any thoughts on how to handle this

t1amat · 2025-07-10T01:48:04 1752112084

The user should be able to enable/disable tools or an entire tab’s toolset. Some keep open hundreds of tabs and that’s simply too many potential tools to expose. Deduping doesn’t make sense for the reasons you say, and that one logical task could lead to a series of operations missequenced across a range of tabs.

Flux159 · 2025-06-16T23:03:15 1750114995

This is pretty interesting - how has your experience been with R3F? I've built a small game level with it before and I'm wondering if I want to go all the way and build something larger.

Also, are you planning on making this open source at some point? Fabric is nice in that it manages 2D canvas objects for you & you can build things like an editor on top, in this case as a library what would you consider the primitives on top of Three objects are? Could it be used to make an editor for a 3d level?

DecoySalamander · 2025-06-17T11:40:40 1750160440

It's great for organizing your scenes, and at least for me, it's far more readable than raw Three.js code. Unfortunately, games, especially real-time ones, don't map well to idiomatic React code. You might find yourself putting bits of game logic into various awkward places. Still, I don't think there's a better alternative for the web, except maybe Threlte if you're fine with a smaller ecosystem.

ges · 2025-06-16T23:23:44 1750116224

I like the fact it allows you to write clean declarative code. Imperative becomes quickly messy imo.

I’d need to find a good api to make it open source. It’s a mini ecs system handling the object states atm.

I think the main advantage over vanilla threejs would be built-in user-friendly controls and opinionated common object types (think remote glb models, plane images, etc)