Last active
December 18, 2015 03:48
-
-
Save WillNess/5720292 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Q.: | |
| Lets say I'm given two functions: | |
| f :: [a] -> b | |
| g :: [a] -> c | |
| I want to write a function that is the equivalent of this: | |
| h x = (f x, g x) | |
| But when I do that, for large lists inevitably I run out of memory. | |
| A simple example is the following: | |
| x = [1..100000000::Int] | |
| main = print $ (sum x, product x) | |
| I understand this is the case because the list `x` is being stored in memory without | |
| being garbage collected. It would be better instead of `f` and `g` worked on `x` in, | |
| well, "parallel". | |
| Assuming I can't change `f` and `g`, nor want to make a separate copy of `x` (assume | |
| `x` is expensive to produce) how can I write `h` without running into out of memory issues? | |
| ______________________________________________________________________________________ | |
| that's a compiler issue: stick some mutual GOTO in there and we're done! | |
| form a pair of corountines / yielding-chain, joining the two execution frames | |
| together, each storing its internal vars/interim results/completion status, | |
| so that list is consumed in sequence - **IFF** each consumes list in sequence, | |
| so compiles DOWN to some LOOPS (so, yield to neighbor on each iteration of loop). | |
| *** CONSTANT SPACE operation should be one of COMPILER GOALs *** | |
| ______________________________________________________________________________________ | |
| If you can turn your functions into folds, you can then just use them with a scan: | |
| x = [1..100000000::Int] | |
| main = mapM_ print . tail . scanl foo (a0,b0) . takeWhile (not.null) | |
| . unfoldr (Just . splitAt 1000) -- adjust the chunk length as needed | |
| $ x | |
| foo (a,b) x = let a2 = f' a $ f x ; b2 = g' b $ g x | |
| in a2 `seq` b2 `seq` (a2, b2) | |
| f :: [t] -> a -- e.g. sum | |
| g :: [t] -> b -- (`rem` 10007) . product | |
| f' :: a -> a -> a -- e.g. (+) | |
| g' :: b -> b -> b -- ((`rem` 10007) .) . (*) | |
| we consume the input in chunks for better performance. Compiled with `-O2`, this | |
| should run in a constant space. The interim results are printed as indication of progress. | |
| If you can't turn your function into a fold, this means it *has* to consume the whole | |
| list to produce any output and this trick doesn't apply. E.g., | |
| f xs = sum . map (uncurry (+)) $ | |
| zip (map (xs!!) [1,3..n]) | |
| (map (xs!!) [m, m-2, .. 0]) | |
| where | |
| len = length xs | |
| (n,m) = if even len then (len-1,len-2) else (len-2,len-1) | |
| or, | |
| f xs = sum . map (uncurry (+)) . uncurry zip . second reverse | |
| . foldr (\x ~(b,a) -> (x:a,b)) ([],[]) | |
| $ xs | |
| the above "crazy function" is of course EQUIVALENT to `sum`, BECAUSE | |
| -------> (toList . reify) for `f` is a permutation of the one for `sum`, BECAUSE | |
| -------> there's no element duplication / skippage, and | |
| (+) is | |
| - commutative | |
| - associative | |
| so SOURCE CONSUMPTION can be REARRANGED into the SEQUENTIAL one |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment