Skip to content

Instantly share code, notes, and snippets.

@WillNess
Last active December 18, 2015 03:48
Show Gist options
  • Select an option

  • Save WillNess/5720292 to your computer and use it in GitHub Desktop.

Select an option

Save WillNess/5720292 to your computer and use it in GitHub Desktop.
Q.:
Lets say I'm given two functions:
f :: [a] -> b
g :: [a] -> c
I want to write a function that is the equivalent of this:
h x = (f x, g x)
But when I do that, for large lists inevitably I run out of memory.
A simple example is the following:
x = [1..100000000::Int]
main = print $ (sum x, product x)
I understand this is the case because the list `x` is being stored in memory without
being garbage collected. It would be better instead of `f` and `g` worked on `x` in,
well, "parallel".
Assuming I can't change `f` and `g`, nor want to make a separate copy of `x` (assume
`x` is expensive to produce) how can I write `h` without running into out of memory issues?
______________________________________________________________________________________
that's a compiler issue: stick some mutual GOTO in there and we're done!
form a pair of corountines / yielding-chain, joining the two execution frames
together, each storing its internal vars/interim results/completion status,
so that list is consumed in sequence - **IFF** each consumes list in sequence,
so compiles DOWN to some LOOPS (so, yield to neighbor on each iteration of loop).
*** CONSTANT SPACE operation should be one of COMPILER GOALs ***
______________________________________________________________________________________
If you can turn your functions into folds, you can then just use them with a scan:
x = [1..100000000::Int]
main = mapM_ print . tail . scanl foo (a0,b0) . takeWhile (not.null)
. unfoldr (Just . splitAt 1000) -- adjust the chunk length as needed
$ x
foo (a,b) x = let a2 = f' a $ f x ; b2 = g' b $ g x
in a2 `seq` b2 `seq` (a2, b2)
f :: [t] -> a -- e.g. sum
g :: [t] -> b -- (`rem` 10007) . product
f' :: a -> a -> a -- e.g. (+)
g' :: b -> b -> b -- ((`rem` 10007) .) . (*)
we consume the input in chunks for better performance. Compiled with `-O2`, this
should run in a constant space. The interim results are printed as indication of progress.
If you can't turn your function into a fold, this means it *has* to consume the whole
list to produce any output and this trick doesn't apply. E.g.,
f xs = sum . map (uncurry (+)) $
zip (map (xs!!) [1,3..n])
(map (xs!!) [m, m-2, .. 0])
where
len = length xs
(n,m) = if even len then (len-1,len-2) else (len-2,len-1)
or,
f xs = sum . map (uncurry (+)) . uncurry zip . second reverse
. foldr (\x ~(b,a) -> (x:a,b)) ([],[])
$ xs
the above "crazy function" is of course EQUIVALENT to `sum`, BECAUSE
-------> (toList . reify) for `f` is a permutation of the one for `sum`, BECAUSE
-------> there's no element duplication / skippage, and
(+) is
- commutative
- associative
so SOURCE CONSUMPTION can be REARRANGED into the SEQUENTIAL one
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment