Yesterday's outage

Yesterday afternoon the rendering daemon failed in an interesting way: the preview images were still rendered for most map requests, but the actual map files were not.

It turns out that the root cause was a rendering request for a rather large area that consumed all available memory, leading to a short phase of swapping and eventually a hard out-of-memory kill. Somehow the rendering service didn’t recover from this properly, so the requests that followed were not processed properly either.

I thought I had memory issues covered in the renderer’s systemd service by setting LimitRSS, the apparent systemd equivalent of “ulimit -m”.

But it turns out I had been “StackOverflowed” on this. Looking at the systemd.exec man page, I learned that:

“LimitRSS= is not implemented on Linux, and setting it has no effect.”

I have now replaced LimitRSS in the renderer’s service file with MemoryMax, and this actually seems to do the job. I re-ran all failed requests. This time the “bad” request still failed because it consumed too much memory, but it already failed on reaching the configured limit of 50% of total RAM, without driving the system into swapping, and the requests that followed were processed properly again.
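For illustration, the change could look roughly like the drop-in below. This is only a sketch: the unit name renderer.service and the drop-in file name are made up for this example, and the commented-out MemorySwapMax line is an optional extra that is not part of my actual setup. MemoryMax also needs the unified cgroup hierarchy (cgroup v2) to take effect.

```ini
# /etc/systemd/system/renderer.service.d/memory.conf
# (hypothetical path; "renderer.service" is a placeholder unit name)

[Service]
# LimitRSS= has no effect on Linux, so it is dropped in favor of MemoryMax=.
# A percentage is taken relative to the installed physical memory.
MemoryMax=50%

# Optional, not part of the setup described in this post:
# forbid swap use for this service entirely.
#MemorySwapMax=0
```

After adding such a file, “systemctl daemon-reload” followed by a restart of the service should apply the limit, and “systemctl show renderer.service -p MemoryMax” can be used to check the value systemd actually picked up.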

Hopefully this has now fixed “runaway memory” issues once and for all: not by preventing them in the first place, but by handling them properly when they occur, so that only the one offending render request fails and not the ones queued up behind it.