Quarrel's comments | Hacker News

I wonder if this is why NextDNS block the archive.today domains?

While the NextDNS company is registered in Delaware, the founders are French nationals, so they may feel more exposed to such threats.

fwiw, you can use rewrites for these domains in the nextdns settings, or manage it in your local dns client, and get around this pretty easily.


My worry is that they're training it on Q&A from the general public now, and that this tone, and more specifically, how obsequious it can be, is exactly what the general public want.

Most of the time, I suspect, people are using it like wikipedia, but with a shortcut to cut through to the real question they want answered; and unfortunately they don't know if it is right or wrong, they just want to be told how bright they were for asking it, and here is the answer.

OpenAI then get caught in a revenue-maximising hell-hole of garbage.

God, I hope I am wrong.


LLMs only really make sense for tasks where verifying the solution (which you have to do!) is significantly easier than solving the problem: translation where you know the target and source languages, agentic coding with automated tests, some forms of drafting or copy editing, etc.

General search is not one of those! Sure, the machine can give you its sources but it won't tell you about sources it ignored. And verifying the sources requires reading them, so you don't save any time.


I agree a lot with the first part. The only time I actually feel productive with them is when I can have a short feedback cycle with 100% proof of whether it's correct or not; as soon as "manual human verification" is needed, things spiral out of control quickly.

> Sure, the machine can give you its sources but it won't tell you about sources it ignored.

You can prompt for that though, include something like "Include all the sources you came across, and explain why you thought each was irrelevant" and, unsurprisingly, it'll include those. I've also added a "verify_claim" tool which it is instructed to use for any claims before sharing a final response; it checks each claim inside a brand new context, one call per claim. So far it works great for me with GPT-OSS-120b as a local agent, with access to search tools.


> You can prompt for that though, include something like "Include all the sources you came across, and explain why you thought each was irrelevant" and, unsurprisingly, it'll include those. I've also added a "verify_claim" tool which it is instructed to use for any claims before sharing a final response; it checks each claim inside a brand new context, one call per claim. So far it works great for me with GPT-OSS-120b as a local agent, with access to search tools.

Feel like this should be built in?

Explain your setup in more detail please?


> Feel like this should be built in?

Not everyone uses LLMs the same way, which is made extra clear by the announcement this submission is about. I don't want conversational LLMs, but it seems that perspective isn't shared by absolutely everyone, and that makes sense; it's a subjective thing how you like to be talked/written to.

> Explain your setup in more detail please?

I don't know what else to tell you that I haven't said already :P Not trying to be obtuse, just don't know what sort of details you're looking for. I guess in more specific terms: I'm using llama.cpp(/llama-server) as the "runner", and then I have a Rust program that acts as the CLI for my "queries" and makes HTTP requests to llama-server. The requests to llama-server include "tools", where one of those is a "web_search" tool hooked up to a local YaCy instance, and another is "verify_claim", which basically restarts a new separate conversation inside the same process, with access to a subset of the tools. Is that helpful at all?


"one call per claim" I wonder how long it takes for it to be common knowledge how important this is. Starting to think never. Great idea by the way, I should try this.


I've been trying to figure out ways of highlighting why it's important and how it actually works, maybe some heatmap of the attention over previous tokens, so people can see visually how messed up things become once even just two concepts are mixed in the same context.


One of the dangers of automated tests is that if you use an LLM to generate tests, it can easily start testing implemented rather than desired behavior. Tell it to loop until tests pass, and it will do exactly that if unsupervised.

And you can’t even treat implementation as a black box, even using different LLMs, when all the frontier models are trained to have similar biases towards confidence and obsequiousness in making assumptions about the spec!

Verifying the solution in agentic coding is not nearly as easy as it sounds.


Not only can it easily do this, I've found that Claude models do this as a matter of course. My strategy now has been to either write the test or write the implementation and use Claude for the other one. That keeps it a lot more honest.


I've often found it helpful in search. Specifically, when the topic is well-documented and you can provide a clear description but you're lacking the right words or terminology, it can help in finding the right question to ask, if not answering it. Recall when we used to laugh at people typing literal questions into the Google search bar? Those are the exact types of queries that the LLM is equipped to answer. As for the "improvements" in GPT 5.1, it seems to me like another case of pushing Clippy on people who want Anton. https://www.latent.space/p/clippy-v-anton


That's a major use case, especially if the definition is broad enough to include "take my expertise, knowledge, and perhaps a written document, and transmute it into other forms"--slides, illustrations, flash cards, quizzes, podcasts, scripts for an inbound call center.

But there seem to be uses where a verified solution is irrelevant. Creativity generally--an image, a poem, a description of an NPC in a roleplaying game, the visuals for a music video--never has to be "true", just evocative. I suppose persuasive rhetoric doesn't have to be true either, just plausible or engaging.

As for general search, I don't know that "classic search" can be meaningfully said to tell you about the sources it ignored. I will agree that using OpenAI or Perplexity for search is kind of meh, but Google's AI Mode does a reasonable job of informing you about the links it provides, and you can easily tab over to a classic search if you want. It's almost like having a depth of expertise doing search helps in building a search product that incorporates an LLM...

But, yeah, if one is really uninterested in looking at sources, just chatting with a typical LLM seems a rather dubious way to get an accurate or reasonably comprehensive answer.


Don’t search engines have the same problem? You don’t get back a list of sites that the engine didn’t prefer for some reason.


With search engine results you can easily see and judge the quality of the sources. With LLMs, even if they link to sources, you can’t be sure they are accurately representing the content. And once your own mind has been primed with the incorrect summary, it’s harder to pull reality out of the sources, even if they’re good (or even relevant — I find LLMs often pick bad/invalid sources to build the summary result).


Exactly. I've gotten much more interested in LLMs now that I've accepted I can just look at the final result (code) without having to read any of the justification wall of text, which is generally convincing bullshit.

It's like working with a very cheap, extremely fast, dishonest and lazy employee. You can still get them to help you but you have to check them all the time.


I’m of two minds about this.

The ass licking is dangerous to our already too tight information bubbles, that part is clear. But that aside, I think I prefer a conversational/buddylike interaction to an encyclopedic tone.

Intuitively I think it is easier to make the connection that this random buddy might be wrong, rather than thinking the encyclopedia is wrong. Casualness might serve to reduce the tendency to think of the output as actual truth.


Sam Altman probably can’t handle any GPT models that don’t ass lick to an extreme degree so they likely get nerfed before they reach the public.


It's very frustrating that it can't be relied upon. I was asking Gemini this morning whether Uncharted 1, 2 and 3 had a remastered version for the PS5. It said no. Then 5 minutes later, on the PSN store, there were the three remastered versions for sale.


People have been using, "It's what the [insert Blazing Saddles clip here] want!" for years to describe platform changes that dumb down features and make it harder to use tools productively. As always, it's a lie; the real reason is, "The new way makes us more money," usually by way of a dark pattern.

Stop giving them the benefit of the doubt. Be overly suspicious and let them walk you back to trust (that's their job).


> My worry is that they're training it on Q&A from the general public now, and that this tone, and more specifically, how obsequious it can be, is exactly what the general public want.

That tracks; it's what's expected of human customer service, too. Call a large company for support and you'll get the same sort of tone.


We know they are using it like search - there’s a jigsaw paper around this.


Again, if they had anything worthwhile in the pipeline, Sora wouldn't have been a thing...


While I wouldn't strain the analogy, a wolfdog is more capable but people love lapdogs.


It certainly seems to me that using this would eliminate 75% or so of the objections to it.

For this use case, at least, it feels like a CS version of racism. MSFT is bad, so no MSFT.

It largely clears up an idiosyncrasy from the evolution of C.

(but, as someone that briefly worked on plan9 in 1995/96, I like your idea :)


Can you confirm whether or not anonymous member structures originated with the Plan 9 C compiler? I know I first learned of them from the Plan 9 compiler documentation, but that was long after they were already in GCC. I can't find when they were added to Microsoft's C compiler, but I'm guessing GCC's "-fms-extensions" flag is so named simply because it originated as a compatibility option for the MinGW project, and doesn't by itself imply they were a Microsoft invention. GCC gained -fms-extensions and anonymous member structures in 1999, and MinGW is first mentioned in GCC in 1997. (Which maybe suggests Microsoft C gained anonymous structure members between 1997 and 1999?)

Relatedly, do you know if anonymous member unions originate with C++, Plan 9 C, or elsewhere?
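
(For anyone following along who hasn't met the feature, here's a minimal C sketch of what "anonymous member structs/unions" means. The untagged form shown is standard C11; the tagged form the question asks about is the GCC -fms-extensions / MSVC / Plan 9 extension, so it appears only in a comment.)

    #include <stdio.h>

    struct point { int x, y; };

    struct rect {
        /* C11-style anonymous members: the inner struct/union has no tag and
           no member name, so w, h and dims behave like members of rect. */
        union {
            struct { int w, h; };
            int dims[2];
        };
        /* The MSVC / Plan 9 / -fms-extensions form additionally allows a
           previously declared tag with no member name, e.g.
               struct point;      // would promote x and y into rect
           which is not valid standard C, hence only shown in this comment. */
        struct point origin;      /* portable, named equivalent */
    };

    int main(void) {
        struct rect r = {0};
        r.w = 3;                  /* reaches through the anonymous union/struct */
        r.h = 4;
        r.origin.x = 1;
        printf("%dx%d at (%d,%d)\n", r.w, r.h, r.origin.x, r.origin.y);
        return 0;
    }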


Archives of published MS SDKs show they were using the feature in NT 3.1's public headers in 1993, so it's at least that old.

https://archive.org/details/win32-sdk-final-release-nt-31


Do you have references to objections? I couldn't find any on the lkml threads.


Just a guess, but are you in the UK and is /r/osint flagged NSFW, harmful or mature?

If so, they want to age-verify you.


I like the "Hopefully, just a limited edition." line too :)


I've never used ferron, but if you look at the graphs, he gives comparisons.

So, I guess, performance + ease of use. Obviously, Caddy is much more mature though.


re: 3) & medical related prompts

At gemini.google.com you can provide context & instructions (Settings->Personal Context). I provide a few bits of guidance to help manage its style, but I haven't been getting much pushback on medical advice since adding this one:

" Please don't give me warnings about the information you're providing not being legal advice, or medical advice, or telling me to always consult a professional, when I ask about issues. Don't be sycophantic. "

YMMV.


Indeed.

They get to the bottom of the post and drop:

> Fargate handles scaling for us without the serverless constraints

They dropped workers for containers.


Ok, I'll bite.

Why?

Isn't it still "just" a powerful enough computer?


Hydrogen is small enough that the uncertainty principle is not completely irrelevant.
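
One rough back-of-the-envelope way to see that (the usual thermal de Broglie comparison; numbers are approximate):

    lambda = h / sqrt(2 * pi * m * k * T)

    H2 at 300 K:  ~0.7 angstrom   (the H-H bond length is ~0.74 angstrom)
    N2 at 300 K:  ~0.2 angstrom

So at room temperature a hydrogen molecule's wavelength is comparable to its own size, which is roughly where classical intuition starts to wobble.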


FWIW, rental rates are very similar between the UK & the USA.

For central London, where I've lived for most of the last decade, yeah, there aren't a lot of parking spots per household, but that's also going to be true in NYC or any other built up older city. As for newer developments, I suspect they get more parking than older ones. My Victorian block has none.

As always, London is not the UK. Outside London it seems good in well off areas for EVs, and, of course, bad in the neglected rest of the UK.

