I've read (among other things) both Shiver's dissertation on CFA and Might's dissertation on extending CFA, as well as Might's m-CFA paper. In Shiver's dissertation, he gets the naive exponential CFA down to polynomial by using the single-threaded store. In Might's dissertation, he extend this with various analyses on CFA environments, such as Gamma CFA where you garbage collect dead elements of the environment to reduce processing time. Then, in the m-CFA paper, he shows that you can actually produce a family of polynomial CFA algorithms by using what is tantamount to flat closure conversion instead of linked closure conversion.
The running theme here seems to be that CFA has the state explosion problem and memory management is crucial to the runtime of any CFA analysis. So much so that it can actually