Though I think it's only strictly true if the intervals you sample over are the same. E.g. they both sample some messages every second, and they all start their second-long intervals on the same nanosecond (or close enough).
I find it easier to reason about reservoir sampling in an alternative formulation: the article talks about flipping a random (biased) coin for each arrival. Instead we can re-interpret reservoir sampling as assigning a random priority to each item, and then keeping the items with the top k priority.
It's fairly easy to see in this reformulation whether specific combinations of algorithms would compose: you only need to think about whether they would still select the top k items by priority.
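A minimal sketch of that reformulation (the function name and the use of a min-heap are my choices, not from the article): give every arriving item an i.i.d. uniform priority and keep the k items with the highest priorities. A min-heap of (priority, item) pairs makes evicting the current lowest priority O(log k).

```python
import heapq
import random

def reservoir_sample(stream, k):
    # Min-heap of (priority, item); heap[0] is the lowest-priority survivor.
    heap = []
    for item in stream:
        entry = (random.random(), item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # New item outranks the weakest survivor: swap it in.
            heapq.heapreplace(heap, entry)
    return [item for _, item in heap]
```

Since each item's priority is independent of its arrival order, every subset of size k is equally likely to end up with the top k priorities, which is exactly the uniformity guarantee of classic reservoir sampling.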
The second formulation sounds easier to use to adapt to specific use cases too: just bump the priority of a message based on your business rules to make it more likely that interesting events get to your log database.
You could do (category, random priority) and then do lexicographic comparison. That way higher categories always outrank lower categories.
But depending on what you need, you might also just do (random priority + weight * category) or so. Or you just keep separate reservoirs for high importance items and for everything else.
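The (category, random priority) variant is a one-line change to the priority sketch, since Python tuples already compare lexicographically. A hedged sketch (names and stream shape are assumptions for illustration):

```python
import heapq
import random

def category_reservoir(stream, k):
    # stream yields (category, item) pairs. Priorities are
    # (category, tiebreak) tuples compared lexicographically, so a
    # higher category always outranks a lower one, and ties within a
    # category are broken uniformly at random.
    heap = []  # min-heap of ((category, tiebreak), item)
    for category, item in stream:
        entry = ((category, random.random()), item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)
    return [item for _, item in heap]
```

With this scheme, lower-category items only survive when there are fewer than k higher-category items in the window, which is the "always prefer important events" behavior; the additive (random priority + weight * category) variant trades that hard guarantee for a softer bias.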
I would expect that any fair way of sampling from a truly fair sample would necessarily yield a truly fair sample. I can't imagine how it could possibly not.
In the first instance, every second we get a 'truly fair' random sample from all the messages in that second.
Going from there to e.g. a 'truly fair' random sample from all the messages in a minute is not trivial. And it's not even possible from the samples alone, without auxiliary information.
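One candidate for that auxiliary information is the random priorities themselves, per the top-k reformulation discussed in this thread: if each second's sampler keeps its items' priorities, then any item in the minute's overall top k must also be in its own second's top k, so a global top-k over the union of the per-second samples recovers an exact uniform sample of the whole minute. A sketch, assuming each reservoir is a list of (priority, item) pairs:

```python
import heapq

def merge_reservoirs(reservoirs, k):
    # Each reservoir holds the (priority, item) pairs with the top-k
    # priorities seen in its own interval. Taking the global top k over
    # the union yields a uniform sample of all items across intervals,
    # because the overall top-k items cannot have been evicted locally.
    return heapq.nlargest(k, (pair for r in reservoirs for pair in r))
```

Without the priorities (or at least the per-second message counts), the per-second samples alone can't be combined fairly: a second with a million messages and a second with ten messages both contribute k items, and nothing in the samples distinguishes them.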