
> No more praying that your program isn't unceremoniously killed just for asking for more memory - all allocations are assumed fallible and failures must be handled explicitly.

But for operating systems with overcommit, including Linux, you won't ever see the act of allocation fail, which is the whole point. All the language-level ceremony in the world won't save you.





Even on Linux with overcommit you can have allocations fail, in practical scenarios.

You can impose limits per process or per cgroup. In server environments it doesn't make sense to run off swap (the perf hit can be so large that everything times out and it's indistinguishable from being offline), so you can set limits proportional to physical RAM and see processes OOM before the whole system needs to resort to the OOM killer. Processes that don't fork and don't do clever things with virtual memory don't overcommit much, and large-enough allocations can fail for real, at page-mapping time, not when faulting.
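
To make the "allocations can fail for real" point concrete, here is a minimal, hedged Rust sketch (it assumes the libc crate and uses a process-level RLIMIT_AS cap as a stand-in for a cgroup limit; the 256 MiB and 1 GiB figures are arbitrary). With the address space capped, a large request is refused at mapping time instead of being overcommitted and faulted later:

    use std::alloc::{alloc, dealloc, Layout};

    fn main() {
        // Stand-in for an operator-configured cgroup/ulimit: cap this
        // process's address space at 256 MiB (arbitrary number).
        let limit: libc::rlim_t = 256 * 1024 * 1024;
        let rl = libc::rlimit { rlim_cur: limit, rlim_max: limit };
        let rc = unsafe { libc::setrlimit(libc::RLIMIT_AS, &rl) };
        assert_eq!(rc, 0, "setrlimit failed");

        // A 1 GiB request now fails up front: malloc/mmap reports the error
        // immediately rather than the kernel overcommitting and killing us later.
        let layout = Layout::from_size_align(1024 * 1024 * 1024, 64).unwrap();
        let p = unsafe { alloc(layout) };
        if p.is_null() {
            eprintln!("1 GiB allocation refused at mapping time");
        } else {
            println!("allocation unexpectedly succeeded");
            unsafe { dealloc(p, layout) };
        }
    }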

Additionally, soft limits like https://lib.rs/cap make it possible to reliably observe OOM in Rust on every OS. This is very useful for limiting memory usage of a process before it becomes a system-wide problem, and a good extra defense in case some unreasonably large allocation sneaks past application-specific limits.
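
For reference, the crate's documented usage is roughly the sketch below (written from memory, so treat the exact API as an assumption and check the docs): you wrap the global allocator in a Cap with a limit, and anything that would exceed the limit fails like a normal fallible allocation.

    use std::alloc;
    use cap::Cap;

    // Wrap the system allocator; start with an effectively unlimited cap.
    #[global_allocator]
    static ALLOCATOR: Cap<alloc::System> = Cap::new(alloc::System, usize::MAX);

    fn main() {
        // Impose a 512 MiB process-wide soft limit (arbitrary number).
        ALLOCATOR.set_limit(512 * 1024 * 1024).unwrap();

        // An allocation that would blow past the limit now reports failure
        // instead of the whole process being OOM-killed.
        let mut big: Vec<u8> = Vec::new();
        if let Err(e) = big.try_reserve(1024 * 1024 * 1024) {
            eprintln!("refused by the soft limit: {e}");
        }
    }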

These "impossible" things happen regularly in the services I worked on. The hardest part about handling them has been Rust's libstd sabotaging it and giving up before even trying. Handling of OOM works well enough to be useful where Rust's libstd doesn't get in the way.

Rust is the problem here.


I hear this claim about swap all the time, and honestly it doesn't sound convincing. Maybe ten or twenty years ago, but today? CAS latency for DIMMs has been going UP, and so has NVMe bandwidth. Depending on memory access patterns, and on whether the working set fits in the NVMe controller's cache (the recent Samsung 9100 model includes 4 GB of DDR4 for cache and prefetch), your application may work just fine.

Swap can be fine on desktops, where usage patterns vary a lot and there are a bunch of idle apps to swap out. It might be fine on a server with light loads, or with a memory leak whose pages just get swapped out and forgotten.

What I had in mind was servers scaled to run near maximum capacity of the hardware. When the load exceeds what the server can handle in RAM and starts shoving requests' working memory into swap, you typically won't get higher throughput to catch up with the overload. Swap, even if "fast enough", will slow down your overall throughput when you need it to go faster. This will make requests pile up even more, making more of them go into swap. Even if it doesn't cause a death spiral, it's not an economical way to run servers.

What you really need to do is shed the load before it overwhelms the server, so that each box runs at its maximum throughput and extra traffic is load-balanced elsewhere, or rejected, or at least queued in some more deliberate and efficient fashion, rather than frantically moving the server's working memory back and forth from disk.

You can do this scaling without OOM handling if you have other ways of ensuring limited memory usage or leaving enough headroom for spikes, but OOM handling lets you fly closer to the sun, especially when the RAM cost of requests can be very uneven.


It's almost never the case that memory is uniformly accessed, except for highly artificial loads such as doing inference on a large ML model. If you can stash the "cold" parts of your RAM working set into swap, that's a win and lets you serve more requests out of the same hardware compared to working with no swap. Of course there will always be a load that exceeds what the hardware can provide, but that's true regardless of how much swap you use.

Swap isn't just for when you run out of RAM, though.

Don't look at swap as extra memory on slow HDDs. Look at it as a place the kernel can use when it needs somewhere to put something temporarily.

This can happen fairly easily on large-memory systems when memory gets fragmented and something asks for a chunk of memory that can't be allocated because there isn't a large enough contiguous block, so the allocation fails.

I always set up at least a couple of GB of swap now... I won't really miss the storage, and that at least gives the kernel a place to reorganize/compact memory and keep chugging along.


Sure, but you can do the next best thing, which is to control precisely when and where those allocations occur. Even if the possibility of crashing is unavoidable, there is still huge operational benefit in making it predictable.

Simplest example is to allocate and pin all your resources on startup. If it crashes, it does so immediately and with a clear error message, so the solution is as straightforward as "pass a bigger number to the --memory flag" or "spec out a larger machine".


No, this is still a misunderstanding.

Overcommit means that the act of memory allocation will not report failure, even when the system is out of memory.

Instead, failure will come at an arbitrary point later, when the program actually attempts to use the aforementioned memory that the system falsely claimed had been allocated.

Allocating all at once on startup doesn't help, because the program can still fail later when it tries to actually access that memory.


Which is why I said "allocate and pin". POSIX systems have mlock()/mlockall() to prefault allocated memory and prevent it from being paged out.

Aha, my apologies, I overlooked that.

Random curious person here: does mlock() itself cause the pre-fault? Or do you have to scribble over that memory yourself, too?

(I understand that mlock prevents paging-out, but in my mind that's a separate concern from pre-faulting?)


FreeBSD and OpenBSD explicitly mention the prefaulting behavior in the mlock(2) manpage. The Linux manpage implies it too: you have to explicitly pass the MLOCK_ONFAULT flag to the mlock2() variant of the syscall in order to disable the prefaulting behavior.
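
A minimal sketch of the "allocate and pin at startup" approach in Rust, assuming the libc crate (the 512 MiB arena size and the use of a plain Vec are illustrative assumptions; locking this much memory typically needs a raised RLIMIT_MEMLOCK or CAP_IPC_LOCK):

    fn main() {
        // Reserve the working set up front (size is an arbitrary assumption).
        let mut arena: Vec<u8> = Vec::with_capacity(512 * 1024 * 1024);

        // Lock current and future mappings. Without MLOCK_ONFAULT semantics,
        // this prefaults the locked pages, so any failure happens right here
        // rather than at some random memory access later.
        let rc = unsafe { libc::mlockall(libc::MCL_CURRENT | libc::MCL_FUTURE) };
        if rc != 0 {
            eprintln!("mlockall failed: {}", std::io::Error::last_os_error());
            std::process::exit(1);
        }

        // Touching the arena now should not need new pages from the OS.
        arena.resize(arena.capacity(), 0);
    }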

To be fair, you can force the faulting yourself just by filling all the allocated memory with zeros, so it's possible to fail at startup.

Or, even simpler, just turn off over-commit.

But if swap comes into the mix, or just if the OS decides it needs the memory later for something critical, you can still get killed.


I wouldn't be surprised if some OS detected a page of all zeros and dropped that allocation until you actually need it. This seems like a common enough case to make it worthwhile when memory is low. I'm not aware of any that do, but it wouldn't be that hard, so it seems like someone would have tried it.

There's also KSM, kernel same-page merging.

Overcommit only matters if you use the system allocator.

To me, the whole point of Zig's explicit allocator dependency injection design is to make it easy to not use the system allocator, but something more effective.

For example, imagine a web server where each request handler gets 1 MB, and all allocations a request handler does are just simple "bump allocations" in that 1 MB space.

This design has multiple benefits:

- Allocations don't have to synchronize with the global allocator.
- Avoids heap fragmentation.
- No need to deallocate anything, we can just reuse that space for the next request.
- No need to care about ownership -- every object created in the request handler lives only until the handler returns.
- Makes it easy to define an upper bound on memory use and very easy to detect and return an error when it is reached.

In a system like this, you will definitely see allocations fail.

And if overcommit bothers someone, they can allocate all the space they need at startup and call mlock() on it to keep it in memory.
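
A rough Rust sketch of that per-request fixed-buffer design (a hand-rolled stand-in for what Zig's std.heap.FixedBufferAllocator provides; the names, the 1 MiB budget, and the byte-slice API are assumptions for illustration, not anyone's actual server code):

    /// Fixed-size bump arena: one per request handler.
    struct RequestArena {
        buf: Vec<u8>,
        used: usize,
    }

    impl RequestArena {
        fn new(capacity: usize) -> Self {
            RequestArena { buf: vec![0u8; capacity], used: 0 }
        }

        /// Bump-allocate `len` bytes; fail explicitly when the budget is spent.
        fn alloc(&mut self, len: usize) -> Option<&mut [u8]> {
            if len > self.buf.len() - self.used {
                return None; // this is where allocations really do fail
            }
            let start = self.used;
            self.used += len;
            Some(&mut self.buf[start..start + len])
        }

        /// Nothing to free individually: reuse the whole buffer next request.
        fn reset(&mut self) {
            self.used = 0;
        }
    }

    fn handle_request(arena: &mut RequestArena, body_len: usize) -> Result<(), &'static str> {
        let _scratch = arena
            .alloc(body_len)
            .ok_or("request exceeds its memory budget")?;
        // ... build the response using arena-backed scratch space ...
        Ok(())
    }

    fn main() {
        let mut arena = RequestArena::new(1024 * 1024); // 1 MiB per handler
        for body_len in [16 * 1024, 2 * 1024 * 1024] {
            match handle_request(&mut arena, body_len) {
                Ok(()) => println!("handled request ({body_len} bytes of scratch)"),
                Err(e) => println!("rejected: {e}"),
            }
            arena.reset();
        }
    }

Handing the arena to the handler also keeps the upper bound observable: the failure path is just the None branch, not a process-wide OOM.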


The Rust folks are also working on having local allocators/arenas in the language, or perhaps a generalization of them known as "Storages" that might also interact in non-trivial ways with other work-in-progress features such as safe transmute or placement "new". The whole design space is somewhat in flux, that's why it's not part of stable Rust yet.

I imagine people who care about this sort of thing are happy to disable overcommit, and/or run Zig on embedded or specialized systems where it doesn't exist.

There are far more people running/writing Zig on/for systems with overcommit than not. Most of the hype around Zig comes from people not in the embedded world.

If we can produce a substantial volume of software that can cope with allocation failures, then the idea of using something other than overcommit as the default becomes feasible.

It's not a stretch to imagine that a different namespace might want different semantics, e.g. to allow a container to opt out of overcommit.

It is hard to justify the effort required to enable this unless it'll be useful for more than a tiny handful of users who can otherwise afford to run off an in-house fork.


> If we can produce a substantial volume of software that can cope with allocation failures, then the idea of using something other than overcommit as the default becomes feasible.

Except this won't happen, because "cope with allocation failure" is not something that 99.9% of programs could even hope to do.

Let's say that you're writing a program that allocates. You allocate, and check the result. It's a failure. What do you do? Well, if you have unneeded memory lying around, like a cache, you could attempt to flush it. But I don't know about you: I don't manually cache random things in memory, and almost nobody else does either. The only things I have in memory are things that are strictly needed for my program's operation. I have nothing unnecessary to evict, so I can't do anything but give up.

The reason that people don't check for allocation failure isn't because they're lazy, it's because they're pragmatic and understand that there's nothing they could reasonably do other than crash in that scenario.


Have you honestly thought about how you could handle the situation better than a crash?

For example, you could finish writing data into files before exiting gracefully with an error. You could (carefully) output to stderr. You could close remote connections. You could terminate the current transaction and return an error code. Etc.

Most programs are still going to terminate eventually, but they can do so a lot more usefully than with a segfault from some instruction at a randomized address.
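
As a small, hedged Rust illustration of that error path (standard library only; the function shape is made up): Vec::try_reserve_exact surfaces allocation failure as an ordinary Result, so a handler can log, clean up, and answer with an error instead of the process aborting.

    use std::collections::TryReserveError;

    /// Load a client-supplied blob, turning allocation failure into an error
    /// the caller can report, rather than an abort.
    fn load_blob(len: usize) -> Result<Vec<u8>, TryReserveError> {
        let mut buf = Vec::new();
        buf.try_reserve_exact(len)?; // fallible, unlike Vec::with_capacity
        buf.resize(len, 0);
        Ok(buf)
    }

    fn main() {
        match load_blob(usize::MAX / 2) {
            Ok(buf) => println!("loaded {} bytes", buf.len()),
            Err(e) => {
                // Still free to flush files, close connections, return an
                // error code to the client, and only then exit if needed.
                eprintln!("cannot satisfy this request right now: {e}");
            }
        }
    }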


I used to run into allocation limits in Opera all the time. Usually what happened was a failure to allocate a big chunk of memory for rendering or image decompression, and when that happened it could simply give up on rendering the current tab for the moment. It was very resilient to those errors.

Even when I have a cache, it is probably in a different code path/module, and it would be a terrible architecture that let me reach into that code from wherever the allocation failed.

A way to access an "emergency button" function is a significantly smaller sin than arbitrary crashes.

I question that. I would expect, in most cases, that even if you manage to free up some memory, you only have a little while longer to run before something else uses up all the memory and you are back to the original out-of-memory problem, but with nothing left to free. Not to mention that those caches you just cleared presumably existed for a good reason, so your program runs slower in the meantime.

What if for my program, 99.99% of OOM crashes are preventable by simply running a GC cycle?

> If we can produce a substantial volume of software that can cope with allocation failures, then the idea of using something other than overcommit as the default becomes feasible.

What would "cope" mean? Something like returning an error message like "can't load this image right now"? Such errors are arguably better than crashing the program entirely but still worth avoiding.

I think overcommit exists largely because of fork(). In theory a single fork() call doubles the program's memory requirement (and a parent calling it n times in a row needs (n+1) times the memory). In practice, the OS uses copy-on-write to avoid both this requirement and the expense of copying. Most likely the child won't touch much of its memory before exit or exec(). Overcommit allows taking advantage of this observation to avoid introducing routine allocation failures after large programs fork().

So if you want to get rid of overcommit, I'd say far more pressing than introducing allocation-failure handling paths is ensuring nothing large calls fork(). Fortunately fork() isn't really necessary anymore, IMHO. The fork-pool concurrency model is largely dead in favor of threading. For spawning child processes with other executables, there's posix_spawn (implemented by glibc with vfork()). So this is achievable.
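
For what it's worth, in Rust the spawn-not-fork route is just std::process::Command; on common Linux configurations the standard library goes through posix_spawn when no pre-exec hooks are set (an implementation detail recalled from memory, so treat it as an assumption):

    use std::process::Command;

    fn main() -> std::io::Result<()> {
        // Spawn a helper without hand-rolling fork()/exec() in a process
        // whose address space may be huge.
        let status = Command::new("uname").arg("-a").status()?;
        println!("child exited with {status}");
        Ok(())
    }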

I imagine there are other programs around that take advantage of overcommit by making huge writable anonymous memory mappings they use sparsely, but I can't name any in particular off the top of my head. Likely they could be changed to use another approach if there were a strong reason for it.


I never said that all Zig users care about recovering from allocation failure.

> Most of the hype around Zig comes from people not in the embedded world.

Yet another similarity with Rust.


> you won't ever see the act of allocation fail

Ever? If you have limited RAM and limited storage on a small Linux SBC, where does it put your memory?


It handles OOM by killing processes.


