Hacker News | eatonphil's comments

Is this sqlite built from source or a distro sqlite? It's possible the defaults differ with build settings.


The one avinassh shows is macOS's SQLite under /usr/bin/sqlite3. In general it also has some other weird settings, like not having the concat() function, last I checked.


The Apple-built macOS SQLite is something.

Another oddity: mysteriously reserving 12 bytes per page, making databases created with it forever incompatible with the checksum VFS.

And another: having three different layers of fsync to avoid actually doing any F_FULLFSYNC ever, even when you ask it for a full fsync (read up on F_BARRIERFSYNC).
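
A rough sketch of what asking for a real full flush looks like on macOS (assuming the libc crate as a dependency; the function name is mine, not SQLite's):

    use std::fs::File;
    use std::os::fd::AsRawFd;

    // On macOS, plain fsync(2) does not force the drive cache to flush;
    // a durable flush needs fcntl with F_FULLFSYNC. F_BARRIERFSYNC (the
    // cheaper, barrier-only variant mentioned above) is requested the same
    // way with a different command.
    fn full_fsync(file: &File) -> std::io::Result<()> {
        let rc = unsafe { libc::fcntl(file.as_raw_fd(), libc::F_FULLFSYNC) };
        if rc == -1 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    }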


> it also has some other weird settings

You also can't load extensions with `.load` (presumably for security, but a pain in the arse).

    user ~ $ echo | /opt/homebrew/opt/sqlite3/bin/sqlite3 '.load'
    [2025-08-25T09:27:54Z INFO  sqlite_zstd::create_extension] [sqlite-zstd] initialized
    user ~ $ echo | /usr/bin/sqlite3 '.load'
    Error: unknown command or invalid arguments:  "load". Enter ".help" for help


Anyone can implement Raft. There are plenty of implementations of it not written by Google engineers, including a custom one in the product I work on. And developers in the Software Internals Discord are constantly in there asking questions on the road to implementing Raft or Viewstamped Replication.


I believe the parent is referring to pre-raft consensus algorithms like Paxos. I recall the explanation of Paxos being a lengthy PDF while the explanation of Raft is a single webpage, mostly visual.


Could be; it was a little ambiguously worded. That said, single-decree Paxos is much simpler than Raft, though I agree The Part-Time Parliament's analogy is a pain to read. It's better if you skip the beginning chunk of the paper and just read the appendix; A1, The Basic Protocol, is simpler to understand.


There’s also the side-by-side Paxos/Raft comparison in Howard & Mortier’s “Consensus on consensus”[1] paper, which is not enough to understand either by itself, but a great help if you have a longer explanation you're going through.

[1] https://dl.acm.org/doi/10.1145/3380787.3393681


Other way around.

Step 1 of Raft is for the distributed nodes to come to consensus on a fact - i.e. who the leader is.

ALL of Paxos is the distributed nodes coming to consensus on a fact.

Raft just sounds easier because its descriptions use nice-sounding prose and gloss over the details.


Check out Alex Miller's Data Replication Design Spectrum for what you might use instead of Raft for replication specifically, or what tweaks you might make to Raft for better throughput or space efficiency.

https://transactional.blog/blog/2024-data-replication-design...


Who isn't? Cockroach rewrote Postgres in Go. CedarDB rewrote Postgres in C++.

And then to lesser degrees you've got Yugabyte, AlloyDB, and Aurora DSQL (and certainly more I'm forgetting) that only replace parts of Postgres.


Neither Cockroach nor CedarDB rewrote anything; they built from scratch and just used the same client protocol. There are a bunch of other unrelated databases using the Postgres protocol, btw.


I'm not talking about speaking the protocol. I'm talking about trying as hard as they can to be indistinguishable from Postgres (to a non-operations user). And that list is very small.


To the contrary, the Delta Lake paper is extremely easy to read and to implement the basics of (I did), and Iceberg has nothing so concise and clear.


If I implement what’s described in the Delta Lake paper, will I be able to query and update arbitrary Delta Lake tables as populated by Databricks in 2025?

(Would be genuinely excited if the answer is yes.)


Not sure (probably not). But it's definitely much easier to immediately understand IMO.


OK, but at least from my perspective, the point of OTF’s is to allow ongoing interoperability between query and update engines.

A “standard” getting semi-monthly updates via random Databricks-affiliated GitHub accounts doesn’t really fit that bill.

Look at something like this:

https://github.com/delta-io/delta/blob/master/PROTOCOL.md#wr...

Ouch.


OrioleDB is not about sharding, it's about the storage layer.


I did not claim OrioleDB is about sharding. It was just an observation that Supabase is contributing to the Postgres ecosystem through multiple projects.


They likely said that because the context is "Vitess for Postgres" projects, and OrioleDB is not "Vitess for Postgres".


I contributed back a bit more info but you'll only see it in the 18/devel docs.


One of the areas where I wonder about this a lot is integrating Rust code into Postgres, which has its own allocator system. Right now, when we need complex data structures (non-Postgres data structures) that must live outside the lexical scope, we mostly put them somewhere global and return a handle for the C code to use to reference the object. But with the upcoming support for passing an allocator to any data structure (in the Rust standard library anyway) I think this gets a lot easier?
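
Roughly, the handle pattern I'm describing looks something like this (all names are hypothetical, not our actual code):

    use std::collections::HashMap;
    use std::sync::atomic::{AtomicU64, Ordering};
    use std::sync::{Mutex, OnceLock};

    // Hypothetical Rust-side state that must outlive any single C call.
    struct SessionState {
        rows_seen: u64,
    }

    // Global registry; the C side only ever sees the opaque u64 handle.
    static REGISTRY: OnceLock<Mutex<HashMap<u64, SessionState>>> = OnceLock::new();
    static NEXT_HANDLE: AtomicU64 = AtomicU64::new(1);

    #[no_mangle]
    pub extern "C" fn session_create() -> u64 {
        let handle = NEXT_HANDLE.fetch_add(1, Ordering::Relaxed);
        REGISTRY
            .get_or_init(|| Mutex::new(HashMap::new()))
            .lock()
            .unwrap()
            .insert(handle, SessionState { rows_seen: 0 });
        handle
    }

    #[no_mangle]
    pub extern "C" fn session_destroy(handle: u64) {
        if let Some(registry) = REGISTRY.get() {
            registry.lock().unwrap().remove(&handle);
        }
    }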


For me the most interesting thing in Allocator is that it's allowed to say: OK, you wanted 185 bytes, but I only have a 256-byte allocation here, so here is 256 bytes.

This means that, e.g., a growable container type doesn't have to guess that your allocator probably loves powers of 2 and so grow to 256 bytes instead of 185; it can ask for 185 bytes, get 256, and then pass that on to the user. Significant performance is left on the table when everybody is guessing and can't pass on what they know due to ABI limitations.

Rust containers such as Vec are already prepared to do this. For example, Vec::reserve_exact does not promise you're getting exactly the capacity you asked for; it won't do the exponential growth trick, because (unlike Vec::reserve) you've promised you don't want that, but it can take advantage of a larger capacity provided by the allocator.
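
A sketch of what that looks like with the unstable allocator_api; Allocator, Layout and NonNull are the real nightly APIs, but the rounding allocator itself is just illustrative:

    #![feature(allocator_api)]
    use std::alloc::{AllocError, Allocator, Global, Layout};
    use std::ptr::NonNull;

    // Illustrative allocator that only hands out power-of-two blocks, but
    // reports the real block size back instead of making callers guess.
    struct PowerOfTwoAlloc;

    unsafe impl Allocator for PowerOfTwoAlloc {
        fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
            let size = layout.size().next_power_of_two();
            let rounded = Layout::from_size_align(size, layout.align())
                .map_err(|_| AllocError)?;
            // The returned NonNull<[u8]> carries the actual length:
            // ask for 185 bytes, get told you have 256.
            Global.allocate(rounded)
        }

        unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
            let size = layout.size().next_power_of_two();
            let rounded = unsafe { Layout::from_size_align_unchecked(size, layout.align()) };
            unsafe { Global.deallocate(ptr, rounded) }
        }
    }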


There's so much more information that code could give allocators but doesn't due to being stuck with ancient C APIs. Is the allocation likely to be short lived? Is speed or efficiency more important? Is it going to be accessed by multiple threads? Is it likely to grow in future?


That seems suspect to me. If I call reserve_exact I do actually mean reserve_exact, and I want .capacity() to return the argument I passed to reserve_exact(). This is commonly done when using Vec as a fixed capacity buffer and you don't want to add another field to whatever owns it that's semantically equivalent to .capacity().

I don't really care if the memory region extends past capacity * size_of::<T>(), but I do want to be able to check whether .len() == .capacity() and not be surprised.
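
For a concrete (made-up) example of the pattern I mean:

    // Illustrative: Vec as a fixed capacity buffer, where capacity() stands
    // in for a separately stored limit.
    fn fill_fixed_buffer() -> Vec<u8> {
        let mut buf: Vec<u8> = Vec::new();
        buf.reserve_exact(1024);
        // Relies on capacity() being exactly 1024; with an allocator that
        // rounds up, this would fill to whatever capacity was actually given.
        while buf.len() < buf.capacity() {
            buf.push(0);
        }
        buf
    }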


> This is commonly done when using Vec as a fixed capacity buffer and you don't want to add another field to whatever owns it that's semantically equivalent to .capacity().

The documentation for Vec already explains exactly what it's offering you, but let's explore: what exactly is the problem? You've said this is "commonly done", so doubtless you can point at examples for reference.

Suppose a Goose is 40 bytes in size, and we aim to store say 4 of them. For some reason we decide to Vec::new() and then Vec::reserve_exact(..., 4) rather than, more naturally (but with the same effect), asking for Vec::with_capacity(4). But alas, the allocator underpinning our system has 128- or 256-byte blocks to give; 4 x 40 = 160 is too big for 128, so a 256-byte block is allocated and (a hypothetical future) Vec sets capacity to 6 anyway.

Now, what disaster awaits in the common code you're talking about? Capacity is 6 and... there's capacity for 6 entries instead of 4


The condescension isn't appropriate here. I'm talking about using `Vec` as convenient temporary storage, without additional bookkeeping on top, if capacity() is meaningful. Like you said, Rust doesn't guarantee that, because `reserve_exact` is not exact. In C++, the pattern is to resize() and shrink_to_fit(), which is implementation-defined, but when it's defined to do what it says, you can rely on it.

> Now, what disaster awaits in the common code you're talking about? Capacity is 6 and... there's capacity for 6 entries instead of 4

The capacity was expected to be 4 and not 6, which may be a logic error in code that requires it to be 4. If this weren't a problem, the docs wouldn't call it out as a potential problem.


The condescension you've detected is because I doubt your main premise - that what you've described is "common" and so the defined behaviour will have a significant negative outcome. It's no surprise to me that you can offer no evidence for that premise whatsoever and instead just retreat to insisting you were correct anyway.

The resize + shrink_to_fit incantation sounds to me a lot like one of those "Sprinkle the volatile keyword until it works" ritualistic C++ practices not based in any facts.


> the pattern is to resize() and shrink_to_fit(),

As someone who primarily writes C++, I would not expect that to work. I mean, it's great if it does, I guess (I don't really see the point?), but it would honestly surprise me.

I would _always_ expect to use >= for capacity comparisons, and I don't understand what the downside would be. The entire point of these data structures is that they manage the memory for you. If you need precise control over memory layout, these are the wrong tools for the job.


>But with the upcoming support for passing an allocator to any data structure (in the Rust standard library anyway) I think this gets a lot easier?

Yes and no. Even within libstd, some things require A = Global, e.g. `std::io::Read::read_to_end(&mut Vec<u8>)` will only accept Vec<u8, Global>. It cannot be changed to work with Vec<u8, A> because that change would make it not dyn-compatible (née "object-safe").

And as you said, it will cut you off from much of the third-party crate ecosystem that also assumes A = Global.

But if the subset of libstd you need supports A != Global, then yes, it's helpful.
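
A sketch of the dyn-compatibility issue; the trait and method here are hypothetical stand-ins for Read::read_to_end, and Vec<u8, A> needs the unstable allocator_api:

    #![feature(allocator_api)]
    use std::alloc::Allocator;
    use std::io;

    trait ReadAllocAware {
        // Hypothetical generalization of read_to_end over the allocator.
        fn read_to_end_in<A: Allocator>(&mut self, buf: &mut Vec<u8, A>) -> io::Result<usize>;
    }

    // This is what breaks: a trait with a generic method cannot be made into
    // a trait object, so the line below would fail to compile.
    // fn use_it(r: &mut dyn ReadAllocAware) {}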


If the `A` generic parameter were changed to be ?Sized, it would still be possible to make `read_to_end` support custom allocators by changing the signature to `read_to_end(&mut dyn Vec<u8, Allocator>)`.

Not sure if that is a breaking change though; it probably is, because of some small detail. I'm not a rustc dev.


First of all, `dyn Vec` is impossible. Vec is a concrete type, not a trait. I assume you meant `Vec<u8, dyn Allocator>`.

Second, no, a `&mut Vec<u8, A>` is not convertible to `&mut Vec<u8, dyn Allocator>`. This kind of unsized coercion cannot work, because it would require a whole different Vec to be constructed, one which has an `allocator: dyn Allocator` field (which is unsized, and thus makes the Vec unsized) instead of an `allocator: A` field. The unsized coercion you're thinking of is for converting references to sized types into trait object references; here we're talking about a field behind a reference.


Sorry, I meant `&Vec<T, dyn Allocator>`.

And no, it is possible. Here is an example that does it with BufReader, which has T: ?Sized and uses it as a field: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Though it comes with a caveat that you can't take self by value, which is perfectly fine for this use case and is what a normal allocator-aware language does anyway.
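
Roughly the shape of it, if I remember the playground right (minimal sketch, not the linked code verbatim):

    use std::io::{BufReader, Read};

    // Works because BufReader's inner reader is its last field and may be
    // unsized, so &BufReader<&[u8]> coerces to &BufReader<dyn Read>.
    fn takes_dyn(_reader: &BufReader<dyn Read>) {}

    fn main() {
        // A byte-string slice is 'static, so the coercion is straightforward.
        let concrete = BufReader::new(&b"hello"[..]);
        takes_dyn(&concrete);
    }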


I stand corrected. I didn't know rustc supported such a coercion automatically. Now I see it is documented in CoerceUnsized + Unsize.

That said, other than the problem of this being a breaking API change for Read::read_to_end, another problem is that Vec's layout is { RawVec, len } and the allocator is inside RawVec, so the allocator is not the last field of Vec, which it must be for a struct to contain an unsized field. It would require reordering Vec's fields to { len, RawVec }, which may not be something libstd wants to do (since it would make data-pointer access have an offset from the Vec pointer), or inlining RawVec into Vec as { ptr, cap, len, allocator }.
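
A minimal illustration of the last-field rule (field names are illustrative, not libstd's actual layout):

    // Only the last field of a struct may be unsized, so an unsized
    // `allocator: A` forces a layout like this:
    struct InlineVec<T, A: ?Sized> {
        ptr: *mut T,
        cap: usize,
        len: usize,
        alloc: A,
    }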


I’m not sure what those two things have to do with each other, though I did just wake up. The only thing the new allocator stuff would give you is the ability to allocate a standard library data structure with the Postgres allocator. Scoping and handles and such wouldn’t change, and using your own data structures wouldn’t change.

It’s also very possible I’m missing something!


> The only thing the new allocator stuff would give you is the ability to allocate a standard library data structure with the Postgres allocator.

Yeah no this is basically all I'm saying. I'm excited for this.


Ah yeah, well it's gonna be a good feature for sure when it ships!


I included the ISBN on the page. :) 9780124159501

Yes this is only for this book's discussion. The broader mailing list is on /bookclub.html. And that mailing list is used just to stay in the loop about future readings (and votes on future readings).


The very first one I did was in person in NYC. Of the 20 who signed up, 5-7 actively showed up. I decided to make it purely asynchronous and online to make it easier for anyone anywhere to participate. I still host other meetups in NYC, just not a tech book club.

