As part of my “real job”, I’m developing a Clojure-based app that does a significant amount of number crunching. The inner loop continuously ref-sets random portions of a large array of refs (when I say “large”, I mean large enough that I can plausibly fill a 50 GB heap). I had a tough time getting it performant, and it’s an interesting enough story that I thought I’d relate it here.
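To make the access pattern concrete, here’s a minimal sketch of that inner loop. The names (`cells`, `crunch-step!`, `new-value`) and the batch/step structure are hypothetical stand-ins, not the real app’s code; the point is just “pick random refs out of a big vector and ref-set them inside a transaction”:

```clojure
;; Hypothetical sketch of the workload described above.
(def cell-count 1000000)

;; A large vector of refs; in the real app this is big enough to fill tens of GB.
(def cells (vec (repeatedly cell-count #(ref 0.0))))

(defn new-value
  "Stand-in for the actual number crunching."
  [old]
  (+ old (rand)))

(defn crunch-step!
  "Ref-set a random portion of the cells inside a single transaction."
  [n]
  (dosync
    (dotimes [_ n]
      (let [i (rand-int cell-count)
            r (nth cells i)]
        (ref-set r (new-value @r))))))

;; Inner loop: repeatedly update random slices of the array.
(defn crunch! [steps batch-size]
  (dotimes [_ steps]
    (crunch-step! batch-size)))
```

Every iteration churns out freshly boxed values and discards the old ones, which is what makes the garbage collector such a central character in what follows.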
After the standard futzing with algorithms and data structures, the thing that ended up holding me back was excessive time spent in GC. I started with the “throughput” collector (hey, I’m doing number crunching, I don’t have real-time requirements, throughput is awesome!). Somewhat surprisingly, performance got worse and worse as the app ran, ending in a kind of sawtoothed purgatory of GC. What little information I found about Clojure-specific GC tuning uniformly recommended the CMS / low-latency / concurrent collector. Cu