I need help profiling this relative-date util logic to find where the slow part(s) are, and finding reasonable tweaks or work-arounds for those bottlenecks.
I'm happy to pay a bit of a bounty (in USD) to someone who can produce an analysis of the actual problem(s) along with clear, workable solutions.
Here's an estimated threshold for improvement that might be helpful (and would earn a bounty): the 500-iteration loop above takes roughly 200-300ms on my system (about 0.4-0.6ms per iteration), and I need it to run in less than 10ms (under 0.02ms per iteration). That should be completely doable I think; it's not that complex of logic.
IOW, we're going to need at least an order of magnitude improvement (roughly 20-30x by those numbers), not just a 5-10% improvement.
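
For anyone picking this up, here is a minimal sketch of a timing harness that checks a candidate fix against the 10ms budget for a 500-iteration loop. It is written in Python purely for illustration, and the same idea carries over to whatever language the util is in. The names `time_loop`, `required_speedup`, `meets_budget`, and `relative_date_placeholder` are hypothetical and not part of the code in question; the placeholder only stands in for the real relative-date call.

```python
import time
from statistics import median
from typing import Callable


def time_loop(fn: Callable[[], object], iterations: int = 500, repeats: int = 5) -> float:
    """Return the median wall-clock time (ms) of calling fn `iterations` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iterations):
            fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to a single noisy run than the mean.
    return median(samples)


def required_speedup(current_ms: float, budget_ms: float = 10.0) -> float:
    """How many times faster the loop must get to fit within the budget."""
    return current_ms / budget_ms


def meets_budget(elapsed_ms: float, budget_ms: float = 10.0) -> bool:
    """True if the measured loop time is strictly under the budget."""
    return elapsed_ms < budget_ms


def relative_date_placeholder() -> str:
    # Hypothetical stand-in: swap in the real relative-date call here.
    return "3 days ago"


if __name__ == "__main__":
    elapsed = time_loop(relative_date_placeholder)
    print(f"500 iterations: {elapsed:.3f} ms, within budget: {meets_budget(elapsed)}")
    print(f"speedup needed from 200 ms: {required_speedup(200.0):.0f}x")
    print(f"speedup needed from 300 ms: {required_speedup(300.0):.0f}x")
```

Running the same harness before and after each change gives a like-for-like number, which makes it easier to tell a real 20-30x win from a 5-10% one.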