I wonder if something like GHC's per-thread allocation limits could be deployed to the JVM. I've seen enough OOMs kill Java applications serving tons of user requests that any kind of defence would be lovely, no matter how fiddly.

What?

The JVM has a single large heap shared between all threads. That's pretty normal. Python mostly works the same way. It's how MySQL works. But it's not how PostgreSQL works. Or PHP. The trouble with a single heap is that it makes the JVM vulnerable to a class of bugs where one thread can starve all the others of heap and cause the entire process to seize up.

PostgreSQL and PHP don't have that problem for the most part because they effectively give each connection or request its own heap. PostgreSQL spawns a subprocess for every connection and has careful memory limits. Zend PHP does the same. HHVM uses native threads but isolates the heaps. I haven't looked but I imagine Erlang is similar to HHVM.

You can defend against those runaway memory consumption bugs with discipline and code review, I guess. But I don't think it's worth the risk. It's too easy to make a mistake and let bad code in. So we need to make sure that out of memory problems like this have the smallest blast radius possible, like PostgreSQL and PHP do.

Why allocation limits

You can't un-shared-memory the JVM. There is too much code that expects fast access to other threads' data. It's normal. You could totally design a server that keeps a stable of JVM processes and farms each request out to one process at a time, essentially Apache's prefork, but that plays against the JVM's strengths and plays up its weaknesses (slow start, large per-process overhead, fork is a nightmare, etc).

So it'd be nice to have some limits. The first obvious thing is Isolates, but that would require lots of retooling and it must be hard to implement anyway because it hasn't been. The next obvious thing is per-thread reachability limits, but I can't imagine it'd be fast to keep that information up to date. And I suspect it'd be really, really hard to configure the limits sensibly.

So maybe it'd be simpler to throw per-thread allocation limits into the mix. The JVM already does bump-the-pointer allocation, so why not check if the pointer is bumped too far and throw an Error when it is? I know it's more complicated than that. Tons more complicated. But my instinct is that it's the least complicated option. Less complicated than Isolates and reachability limits at least.
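For what it's worth, HotSpot already keeps a per-thread count of allocated bytes and exposes it through com.sun.management.ThreadMXBean. It is only a counter, not a limit, and it is a HotSpot extension rather than part of the standard API, but it gives a feel for what the bookkeeping looks like. Here's a minimal sketch of reading it around a piece of work. AllocationMeter and its methods are names I made up for illustration.

```java
import java.lang.management.ManagementFactory;

final class AllocationMeter {
    // HotSpot-specific extension of the standard ThreadMXBean.
    // The cast throws ClassCastException on JVMs that don't provide it.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Bytes allocated so far by the calling thread (-1 if the JVM has the feature disabled). */
    static long allocatedByCurrentThread() {
        return THREADS.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    /** Runs the task on the calling thread and returns how many bytes it allocated. */
    static long measure(Runnable task) {
        long before = allocatedByCurrentThread();
        task.run();
        return allocatedByCurrentThread() - before;
    }

    public static void main(String[] args) {
        long bytes = measure(() -> {
            byte[][] junk = new byte[64][];
            for (int i = 0; i < junk.length; i++) {
                junk[i] = new byte[16 * 1024]; // about 1 MiB of payload in total
            }
        });
        System.out.println("task allocated roughly " + bytes + " bytes");
    }
}
```

Note that this counts everything allocated, including garbage that dies immediately, which is exactly the number an allocation limit would be checked against.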

Why allocation limits would be hard to use

Here are the reasons I can think of right now:

1. Java programmers usually have no idea how much memory they consume, so they have no way of setting the limits intelligently.
2. Short-lived objects get cleaned up so fast that maybe they shouldn't count at all.
3. It doesn't play right with thread pools. You'd really want per Runnable/Callable/Whatever limits.
4. What do you do when the limit is reached?
5. If there isn't a limit set on a thread it can still crash the system. You can't set a limit on all threads by default.

I expect that to be less than a tenth of the real reasons but I don't think it matters. I think the stability benefits outweigh the complexities of using it. Of implementing it, I dunno.

Proposal

I'd like to take a stab at solving some of these problems. Just for fun. For numbers 3 and 4 I propose that the limit should be a call stack construct like try...catch...finally. Kinda like this:

try memory 5k {

} catch memory (e) {

} finally {

}

When the limit is reached the JVM would unwind, throw some ScopedOomError, and dissolve the limit. finally blocks wouldn't be subject to the limit so they could do what they need to do. You'd want to make the ScopedOomError uncatchable outside the try memory block in which it was thrown.

If you declare try memory blocks inside each other they just set nested limits. So you could limit a particular activity to some smaller amount inside a larger limit. You couldn't raise the limit from the inside though.
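You can fake a rough version of this today as a library, which at least makes the nesting rule concrete. The sketch below is not the proposal itself. It is cooperative, meaning the limited code has to call checkpoint() for the limit to be noticed, whereas the real thing would check on every allocation. It also leans on HotSpot's com.sun.management.ThreadMXBean counter, its own bookkeeping counts against the limit, and it does nothing to make the error uncatchable further out. MemoryScope, run, checkpoint and this ScopedOomError class are all hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayDeque;
import java.util.Deque;

final class MemoryScope {
    /** Hypothetical stand-in for the ScopedOomError described above. */
    static final class ScopedOomError extends Error {
        ScopedOomError(String message) {
            super(message);
        }
    }

    // HotSpot-specific per-thread allocation counter.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Per-thread stack of ceilings, each an absolute value of that thread's counter.
    private static final ThreadLocal<Deque<Long>> CEILINGS =
            ThreadLocal.withInitial(ArrayDeque::new);

    private static long allocated() {
        // Returns -1 if the JVM has the counter disabled, in which case limits never trip.
        return THREADS.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    /** Rough equivalent of "try memory limitBytes { body }". */
    static void run(long limitBytes, Runnable body) {
        Deque<Long> ceilings = CEILINGS.get();
        long ceiling = allocated() + limitBytes;
        Long outer = ceilings.peek();
        if (outer != null) {
            ceiling = Math.min(ceiling, outer); // an inner scope can't raise the limit
        }
        ceilings.push(ceiling);
        try {
            body.run();
        } finally {
            ceilings.pop(); // dissolve the limit before the caller's catch/finally runs
        }
    }

    /** Cooperative check: throws if the innermost active limit has been exceeded. */
    static void checkpoint() {
        Long ceiling = CEILINGS.get().peek();
        if (ceiling != null && allocated() > ceiling) {
            throw new ScopedOomError("allocation limit exceeded");
        }
    }

    public static void main(String[] args) {
        try {
            run(64 * 1024, () -> {
                byte[][] junk = new byte[64][];
                for (int i = 0; i < junk.length; i++) {
                    junk[i] = new byte[64 * 1024];
                    checkpoint();
                }
            });
        } catch (ScopedOomError e) {
            System.out.println("limit tripped: " + e.getMessage());
        }
    }
}
```

Because the scope is a plain method call around a Runnable, it attaches to the unit of work rather than to the pooled thread, which is the point of making it a call stack construct.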

But that isn't really a very good proposal!

Yeah, that proposal doesn't solve lots of the problems. It doesn't protect you from rogue finally blocks or people not setting the limits. It doesn't do any liveness checks. Nor does it tell you how to set the limit safely. But maybe it's a good start? Maybe we don't have to solve all the problems and make something perfect. Just good enough.