There were two incidents this weekend in which rendering stopped for several hours (see the yellow curve in the monitoring graph), and it could hardly have happened at a worse time, with me traveling and attending FOSDEM in Brussels.
I hope I have identified the root cause so that it does not happen again, and even if it does, I should become aware of it sooner now that I’m back home …
It turns out I was too optimistic: another incident happened today.
I was now able to track this down to requests for a specific map style. Apparently one of its SQL queries ran into what looks like a PostGIS memory leak, though this requires more investigation.
In the meantime I have disabled the affected style and changed the restart settings for the rendering service daemons, so even if this problem occurs again, rendering should resume within a minute at most.
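
For illustration only, here is a minimal sketch of what such a restart policy could look like, assuming the rendering daemons run as systemd services (the post does not say which service manager is used). The unit name `render-worker.service`, the drop-in path, and the 30-second delay are hypothetical placeholders, not the actual settings.

```ini
# Hypothetical drop-in: /etc/systemd/system/render-worker.service.d/restart.conf
# (unit name and path are assumptions made for this example)

[Unit]
# Do not give up after several restarts in quick succession
StartLimitIntervalSec=0

[Service]
# Restart the daemon whenever it exits, whether cleanly or not
Restart=always
# Wait 30 seconds before restarting, keeping the outage under a minute
RestartSec=30
```

Note that `Restart=` only takes effect once the process has actually exited, for example after being killed for running out of memory. A daemon that hangs without exiting would need a different mechanism, such as a watchdog.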
I hope that with this I have really tackled it now …