Multi-Threading and Mutation (rfleury.com)
88 points by chmaynard on March 13, 2024 | 43 comments


Rust’s borrow checker does a nice job of formalizing the ideas in this post with & and &mut (shared reference and exclusive/mutable reference, respectively). The unsafe keyword (and the interior mutability pattern) provides exactly the right escape hatch for things like TableInsert and the join operation in the post.

People like to talk about its memory safety, but it’s the fact that the compiler checks for data-race freedom that really sells Rust for me.

Languages like Java claim to be memory safe and type safe, but what does that actually mean if a concurrency bug leaves multiple CPUs stomping over your objects’ internal state in a race?


To me, I don't even need a multi-threaded system to want Rust's ownership system.

Too many defects I've encountered in the past have centered on the detective work of figuring out whether or not spooky action at a distance was actually intended.

Ah, in this instance we've created a list with objects in it and someone somewhere else is modifying it. This is causing the defect.

But in this instance we've created a list with objects in it and then someone is creating new deep copies of those objects so that someone somewhere else can no longer modify it. And THAT is causing the defect.

Rust makes this process significantly more explicit (although I suppose someone could go a little crazy with RefMut and/or &mut and you'd end up back where you started with other languages).

The fact that this also makes threading much easier is very much welcome to me.


It's a nice idea, but I find the conceptualization/implementation of it rather inelegant. It feels like it overcomplicates things, going a long way around to achieve the same effect as an affine type system. It makes reading, writing, and reasoning about the code that much more annoying. I don't want to be forced into thinking about imperative code in terms of objects and ownership; that's way too opinionated for my tastes.

I find it comparable to how modern C++ template metaprogramming goes to great lengths to achieve the same capability as a dependent type system, but again it's somehow even more annoying and grating to read and write than the most proof-laden example of something like Idris.

I don't want to use a language that feels like an amateurish hackjob. Just give me something approximating OCaml syntax with low-level memory control and an affine type system. Please. That's what I want Rust to be. Not a Frankenstein's monster that seems in direct competition with C++ to be the most needlessly complicated and ugly language possible.


You probably want Zig, but it'll be a while until the 1.0 release, maybe 2027-2028


> It's a nice idea, but I find the conceptualization/implementation of it to be rather inelegant.

Everyone is welcome to their own opinion and sense of style. Our world is better off for the one that you hold.

> Just give me something approximate to ocaml syntax with low level memory control and an affine type system.

Eh, beggars can't be choosers. Your desire may be good, but no one is compelled to make it a reality. There's a glut of frontend and backend libraries for building your own programming language, as well as plenty of guides to show you the way.


Oh sure, I don't mean to imply that I'm helpless to get such a language. I only provide the context to say what I wish Rust was, in order to give an abstract picture of the scope of problems I have with the language.


Java gives enough concurrency guarantees in the face of data races to maintain the integrity of the runtime and memory safety.


But not integrity of the program itself. You can still have data loss and garbage results.

I really like the borrow checker and Rust's Send/Sync thread-safety markers, because they nudge towards cleaner thread-safe architecture, and give compile-time guarantees that work even across complex code, all kinds of collections, and 3rd party dependencies.


> But not integrity of the program itself. You can still have data loss and garbage results.

Curious what you mean here? Java has many utilities for concurrent programming, e.g. atomic variables, synchronized blocks, etc.


Java only promises to protect the integrity of the JVM itself, not your data. If you accidentally write to a non-concurrent collection from multiple threads, it's not program-corrupting Undefined Behavior like in C/C++/golang, but it can still misbehave by corrupting the collection itself, causing it to mangle or lose data, or in the best case throw an exception. Either way you don't get the intended program behavior, and you may be unaware of using the wrong collection type until it misbehaves at runtime.

This is different from Rust, where thread-safety is part of the type system, so multi-threaded code that could write to a shared HashMap instead of a ConcurrentHashMap just won't compile with a type error.


This is quite trivial to avoid in Java. There are even static analysis tools which automatically detect most such defects.


The distinction they might be going for is the difference between guarantees and helpful utilities; good utilities can provide guarantees, but there's a lot to be said for safe defaults and a culture of safety.


> curious what do you mean here? Java has many utils for concurrent programming

Those are necessary but not sufficient to protect your program from data races or other multi-threaded issues.

It is very easy to get them incorrect.


Java doesn’t quite claim to be type safe, considering type erasure is built into its spec, so that by runtime, containers (for example) might be surprised by the appearance of an object type not checked for or known at compile time.


> Java doesn’t quite claim to be type safe, considering type erasure is built into its spec, so that by runtime, containers (for example) might be surprised by the appearance of an object type not checked for or known at compile time.

Erasure has nothing to do with type safety; ultimately almost all types are "erased at runtime", since on a von Neumann architecture effectively everything is a byte/word, but this does not mean mechanical computers cannot run typesafe languages. Java reflection is arguably less safe due to type erasure, but reflection is almost inherently type-unsafe in the first place.

(Java does have type unsafety via e.g. covariant arrays, but that's a design choice, not a fundamental limitation)


> type unsafety via e.g. covariant arrays

It's ironic when you think about it.

The compiler-enforced type safety is often argued to be not type safe because the types are erased at runtime; but the actual type unsafety has a runtime check.


> The compiler-enforced type safety is often argued to be not type safe because the types are erased at runtime; but the actual type unsafety has a runtime check.

Were you trying to make the point that the biggest type safety hole in Java is in the one part where you have collections with unerased types? Because yeah that's kind of ironic.


This isn't due to type erasure though, it's due to allowing fallible subtyping; in particular, it shouldn't be possible to assign an array of a subtype to a variable of an array of a supertype. For example:

     class Animal {}
     class Dog extends Animal {}
     class Cat extends Animal {}
     ...
     Animal[] animals = new Dog[1];
     animals[0] = new Cat(); // compiles, but throws ArrayStoreException at runtime
The compiler could easily stop this issue by making the assignment of the `Dog` array to the `Animal` array an error because after the assignment, it's already impossible to add anything that's not a `Dog` to the array anyhow. Type erasure doesn't change this because regardless of any knowledge of the type at runtime, there's nothing reasonable to do but throw an exception because a `Cat` can't be put into a `Dog` array.

I think the focus on type erasure is also a bit of a red herring because Java _does_ actually allow introspection of types at runtime via reflection, whereas Rust doesn't provide this. What information is available at runtime isn't really a factor because type safety is a property that's enforced at compile time.


I learned multi-threaded programming on OS/2 back in the late 80s. Ever since, I have designed any code I have written to handle multiple threads, even if I thought only a single thread would execute it.

That has saved me many rewrites over the years when other threads were introduced.

When I began writing my new data management system, multi-threading was designed into every component. It is thrilling when your code runs nearly twice as fast simply because you run it on a CPU with twice as many cores.


Any tips or books you'd suggest?


Tips: Don't mutate. Prefer pure functions over functions that mutate state and parameters. Ideally you interact with your system by giving it input and having it return results, not change some state somewhere.

For example, it's not uncommon to see code that looks something like this:

    void process(vector<int>& stuff, vector<Foo>& values_to_compute) {
       for (auto value_to_compute : values_to_compute) {
         stuff.push_back(compute(value_to_compute));
       }
    }
In the rush to try to avoid copying values or allocating, this program is now hard to multithread safely. You might want `stuff` to be shared and filled up across threads, but that's not possible to do safely without a bunch of locks.

The better solution looks something like this

    vector<int> process(const vector<Foo>& values_to_compute) {
      vector<int> values;
      for (auto value_to_compute : values_to_compute) {
         values.push_back(compute(value_to_compute));
      }
      return values;
    }
Now to get threading, it's easy to split `values_to_compute` up to manageable chunk sizes and send them pinging down the system for as many cores as you have available. Then you can aggregate the results at the end.
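
To make that concrete, here's a rough sketch of the fan-out/fan-in (reusing `Foo` and `compute` from above; `process_parallel`, the fixed thread count, and plain `std::thread` are just illustrative choices, not the only way to do it):

    vector<int> process_parallel(const vector<Foo>& values_to_compute, size_t num_threads) {
      vector<vector<int>> partial(num_threads);  // one private output buffer per thread
      vector<thread> workers;
      size_t chunk = (values_to_compute.size() + num_threads - 1) / num_threads;
      for (size_t t = 0; t < num_threads; ++t) {
        size_t begin = min(t * chunk, values_to_compute.size());
        size_t end   = min(begin + chunk, values_to_compute.size());
        workers.emplace_back([&, t, begin, end] {
          for (size_t i = begin; i < end; ++i) {
            partial[t].push_back(compute(values_to_compute[i]));  // no shared writes
          }
        });
      }
      for (auto& w : workers) w.join();
      vector<int> result;  // aggregate the per-thread results single-threaded at the end
      for (auto& p : partial) {
        result.insert(result.end(), p.begin(), p.end());
      }
      return result;
    }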

The way to think about these things is primarily in terms of limiting shared mutable state among functions as much as possible. A rule of thumb: if you have functions that return `void`, you are potentially creating a concurrency headache down the road.


The functional style is nice and all, but once you’ve reaped the low hanging fruits some problems are very hard or cumbersome to tackle in a stateless fashion.

I think this is bad advice; it leads to multiple dead ends. Especially in the fields I know, video games and embedded software, state and mutation are everywhere.

Embrace the state, parallelise your code instead of making it concurrent. Easier, almost always faster and more robust.


This could very much be that different applications have different strategies that are applicable.

For games and embedded systems, memory and CPU cycles are at a premium. My strategy works best in languages with a GC, where memory is abundant and cheap to allocate and there are no strict CPU-time requirements.

For the field I'm familiar with, microservices and backends, this works really well and mutation can generally be controlled (usually the only permanent state is the database).

I agree shared state is probably faster, I'm not sure I agree with it being robust. Parallel code with shared state can be deceptively tricky to get right, especially when trying to hold multiple locks at the same time (which you have to do if you want to be fast). Nobody likes debugging why an app suddenly freezes for no apparent reason (deadlock).


No, I am talking about lockless design.

Split/join style. It is almost like single-threaded code, except that all the heavy work is done as fast as possible in parallel; the single-threaded part is almost free in practice, less than 1 ms in total per frame, often 10 times less.


Split/join is also what I was talking about but storing the values in immutable temporary heap locations rather than in a common lockless queue.

If you have the memory and allocation speed, that can be much faster than a lockless queue (especially if that queue is under high contention). However, for a game and embedded system I'd imagine the impact on heap fragmentation would be a pretty significant reason not to do this.

For a GCed language, those heap allocations are fairly cheap, quite a bit cheaper than what you pay in a manually managed memory language. (In GCed languages, the majority of memory allocations are 1 or 2 conditional checks with a pointer bump. It's very nearly the same speed as stack allocation.)

When I was talking about mutable state (and my example is perhaps bad), I'm more talking about system or process wide mutable state.


Do mutable things inside your function. Have an immutable public interface. Also, split state so that it's not shared mutably among threads, and share immutable views when you need other threads to see it.

There are more ways to not expose mutation.
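
A hypothetical shape for that in C++ (Entry is a placeholder type; readers share an immutable snapshot, and all mutation stays behind the interface):

    class Index {
    public:
      // Readers get an immutable snapshot and can hold it as long as they like.
      shared_ptr<const vector<Entry>> snapshot() const {
        lock_guard<mutex> lock(m_);
        return current_;
      }

      // Mutation happens privately, then the new snapshot is published under the lock.
      void rebuild(vector<Entry> entries) {
        auto next = make_shared<const vector<Entry>>(move(entries));
        lock_guard<mutex> lock(m_);
        current_ = move(next);
      }

    private:
      mutable mutex m_;
      shared_ptr<const vector<Entry>> current_ = make_shared<const vector<Entry>>();
    };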


I've done some threaded code - and certainly not an expert, but my tips would be:

1. don't use global variables

2. when sharing data between threads - use message queues, not objects

3. if you do have shared objects, make your public methods thread-safe using mutexes; your internal methods then don't need mutexes (rough sketch below)
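
For tip 3, roughly this (a made-up Counter class, just to show the shape):

    class Counter {
    public:
      // Public methods take the lock...
      void add(int key, int amount) {
        lock_guard<mutex> lock(m_);
        add_unlocked(key, amount);
      }
      int get(int key) const {
        lock_guard<mutex> lock(m_);
        auto it = counts_.find(key);
        return it == counts_.end() ? 0 : it->second;
      }

    private:
      // ...internal helpers assume the lock is already held and never lock again,
      // which also avoids accidental recursive locking.
      void add_unlocked(int key, int amount) { counts_[key] += amount; }

      mutable mutex m_;
      map<int, int> counts_;
    };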


My system is designed to handle large amounts of data. For example, it might have a relational table with a billion rows. The values for each column are stored in such a way that separate threads can operate on different segments of the data.

For example: if a query wants to find all addresses in an 'address' column that end in 'Avenue' and there are a billion unique addresses; then 10 separate threads might each look through 100 million values.

The code has each thread gather its own set of results and only adds them to the final result set (shared) when the search completes. The shared object only needs to be locked once by each thread as it appends its results to the list. This avoids contention that might occur if each thread tried to add a single value to the shared list as it found it.
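
In sketch form (simplified, not the actual code; string::ends_with here is just C++20 shorthand, and the real column storage is more involved):

    void scan_segment(const vector<string>& column, size_t begin, size_t end,
                      mutex& results_mutex, vector<size_t>& shared_results) {
      vector<size_t> local;  // private to this thread, no locking while scanning
      for (size_t i = begin; i < end; ++i) {
        if (column[i].ends_with("Avenue")) local.push_back(i);
      }
      // Lock the shared result set exactly once per thread, at the very end.
      lock_guard<mutex> lock(results_mutex);
      shared_results.insert(shared_results.end(), local.begin(), local.end());
    }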


Always favor parallelism over concurrency; if your design needs to lock, there is likely a problem.

Don’t try to do the work in a magical fashion that is hidden in low-level code and tries to be universal.


Coherent post. Could draw more attention to the most probable consequence of adding multithreading to a program - a marked decrease in reliability and a moderate decrease in performance.

There's something pessimistic to be said on the maintenance properties of multithreaded programs. Even if the original design is sound, it's unlikely to remain so under subsequent patches by various people.

I don't know of a way to create a correct program that remains correct under subsequent mutation if it's built on mutex/semaphore/atomic operations.

Maybe it could be done with a really aggressive type system and only allowing message passing, but that'll still pick up livelock/deadlock behaviours when threads wait on messages.


Doesn’t rust ensure your multithreaded code remains safe under mutation when refactoring the code?


Yes, but not against deadlocks, leaks, atomic ref-count locking overhead, etc. Depending on how you write your code (and the runtime, if you’re using async), you usually need to abandon a certain subset of the key selling points of Rust (e.g. RAII, compile-time borrow checks, and other zero-cost things). It’s bulky, but yes, memory safe.


Very good article, I use mostly similar design ideas around multi-threaded code in my framework and projects.

The main take is to avoid concurrency and instead use parallelism as much as possible.

Also, a carefully engineered gameState or appState structure can help. My design is based around buffering the full state: the work done during one step (typically a frame) is given a reference to the previous, frozen, immutable state as input, and can only write to the next state, with split/join of containers (my own generalized flavor of arena) when parallel work is done.
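
A simplified sketch of the shape (GameState, Entity, update_entity, and parallel_for are placeholders here, not the actual framework code):

    struct GameState { vector<Entity> entities; /* ... */ };

    // Each step reads only the frozen previous state and writes only the next one,
    // so workers never observe a half-updated frame.
    void step(const GameState& prev, GameState& next) {
      parallel_for(0, prev.entities.size(), [&](size_t i) {
        next.entities[i] = update_entity(prev, prev.entities[i]);
      });
      // The remaining single-threaded part of the frame (swapping which buffer
      // counts as "previous") is nearly free.
    }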

I think that full concurrency is possible and has been proven to work, but IMO is a very costly and risky tradeoff, and unlikely to achieve much higher performance.


Tackling this myself atm, as a self-taught CS enthusiast reasoning about multiprocessing in python.


Are you aware how difficult this is? Python has a complicated and messy history regarding concurrency, and even as recently as 2012 the most common advice was "avoid threads and use processes." Python was late to come up with any story around concurrency.


> even as recently as 2012 the most common advice was "avoid threads and use processes."

Ok, I know in the grand scheme of things this may seem recent, but that was 12 years ago. Has the story really not changed since then?


Not really, multi-threaded parallelism is still limited by the GIL and it is best to use multiple processes instead. Unless you are using threads for I/O, or anything not using the CPU.

What has changed since then is that Python now supports single-threaded concurrency, through async-await semantics.


Adding threading to a dynamic scripting language that had a multi-decade run as a single-threaded runtime turns out to be really, really difficult, especially if the language has C extensions. They have all had one form or another of a very hard time with it, because it is a legitimately difficult problem.


Enjoying this article. It's an area of computing I am very weak on, and it's really good to see an intelligent formulation of the why and how of multi-threaded computing in a broader context.


Or you just go with "arrays of 64 byte multiple atomic structures" and you get parallelism and cache!
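
I.e. something along these lines (illustrative only; the alignas(64) keeps each slot on its own cache line so threads don't false-share):

    constexpr size_t kNumThreads = 8;

    struct alignas(64) Slot {        // one cache line per slot: no false sharing
      atomic<uint64_t> value{0};
    };

    array<Slot, kNumThreads> slots;  // each thread touches only slots[thread_id]

    void worker(size_t thread_id) {
      slots[thread_id].value.fetch_add(1, memory_order_relaxed);
    }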


Great article. I think it would benefit from a sentence or paragraph on Amdahl's law to help drive home the point about how serial code limits the potential performance improvement, with synchronization being one type of serial code.

https://en.wikipedia.org/wiki/Amdahl%27s_law
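
For a rough sense of the ceiling: with parallel fraction p and N cores,

    speedup = 1 / ((1 - p) + p / N)

so, for example, a program that is 90% parallel tops out at 1 / (0.1 + 0.9/16) ≈ 6.4x on 16 cores, and never at more than 10x however many cores you throw at it.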


I think that Amdahl’s Law is the reason why many engineers follow the full concurrency route instead of exploring parallelism more extensively, and that is a shame: while the theoretical limit of concurrency is higher, in practice the many, complex, and sometimes mind-boggling problems related to synchronisation can prevent such gains, and the problems don’t scale in a nice way, to put it mildly.

On the other hand, parallelism scales nicely, and is mostly simple with a few rules.


In the article, wouldn't the runtime of the "join operations" step limit how much performance can be gained by parallelizing computation (due to Amdahl's Law as you brought up)?



