WillNess · December 18, 2015 03:48
diff --git a/crazy function.hs b/crazy function.hs
 Q.:

 Lets say I'm given two functions:

    f :: [a] -> b
    g :: [a] -> c

 I want to write a function that is the equivalent of this:

    h x = (f x, g x)

 But when I do that, for large lists inevitably I run out of memory.

 A simple example is the following:

    x = [1..100000000::Int] 
    main = print $ (sum x, product x)

 I understand this is the case because the list `x` is being stored in memory without 
 being garbage collected. It would be better instead of `f` and `g` worked on `x` in, 
 well, "parallel". 

 Assuming I can't change `f` and `g`, nor want to make a separate copy of `x` (assume 
 `x` is expensive to produce) how can I write `h` without running into out of memory issues?
 ______________________________________________________________________________________

 that's a compiler issue: stick some mutual GOTO in there and we're done!
 form a pair of corountines / yielding-chain, joining the two execution frames
 together, each storing its internal vars/interim results/completion status,
 so that list is consumed in sequence - **IFF** each consumes list in sequence,
 so compiles DOWN to some LOOPS  (so, yield to neighbor on each iteration of loop).

   *** CONSTANT SPACE operation should be one of COMPILER GOALs *** 

 ______________________________________________________________________________________
 If you can turn your functions into folds, you can then just use them with a scan:

    x = [1..100000000::Int] 
    main = mapM_ print . tail . scanl foo (a0,b0) . takeWhile (not.null)  
             . unfoldr (Just . splitAt 1000)  -- adjust the chunk length as needed
             $ x
    
    foo (a,b) x = let a2 = f' a $ f x ; b2 = g' b $ g x
                  in a2 `seq` b2 `seq` (a2, b2)
    
    f :: [t] -> a         -- e.g. sum
    g :: [t] -> b         --      (`rem` 10007) . product
    f' :: a -> a -> a     -- e.g. (+)
    g' :: b -> b -> b     --      ((`rem` 10007) .) . (*)

 we consume the input in chunks for better performance. Compiled with `-O2`, this 
 should run in a constant space. The interim results are printed as indication of progress.

 If you can't turn your function into a fold, this means it *has* to consume the whole 
 list to produce any output and this trick doesn't apply. E.g.,

    f xs = sum . map (uncurry (+)) $ 
             zip (map (xs!!) [1,3..n])
                 (map (xs!!) [m, m-2, .. 0])
      where
        len = length xs
        (n,m) = if even len then (len-1,len-2) else (len-2,len-1)

 or,

    f xs = sum . map (uncurry (+)) . uncurry zip . second reverse 
               . foldr (\x ~(b,a) -> (x:a,b)) ([],[])
               $ xs

 the above "crazy function" is of course EQUIVALENT to `sum`, BECAUSE 
 ------->  (toList . reify) for `f` is a permutation of the one for `sum`,  BECAUSE
 ------->  there's no element duplication / skippage, and

    (+) is
         -  commutative
         -  associative

 so SOURCE CONSUMPTION can be REARRANGED into the SEQUENTIAL one
	Q.:

	Lets say I'm given two functions:

	f :: [a] -> b
	g :: [a] -> c

	I want to write a function that is the equivalent of this:

	h x = (f x, g x)

	But when I do that, for large lists inevitably I run out of memory.

	A simple example is the following:

	x = [1..100000000::Int]
	main = print $ (sum x, product x)

	I understand this is the case because the list `x` is being stored in memory without
	being garbage collected. It would be better instead of `f` and `g` worked on `x` in,
	well, "parallel".

	Assuming I can't change `f` and `g`, nor want to make a separate copy of `x` (assume
	`x` is expensive to produce) how can I write `h` without running into out of memory issues?
	______________________________________________________________________________________

	that's a compiler issue: stick some mutual GOTO in there and we're done!
	form a pair of corountines / yielding-chain, joining the two execution frames
	together, each storing its internal vars/interim results/completion status,
	so that list is consumed in sequence - IFF each consumes list in sequence,
	so compiles DOWN to some LOOPS (so, yield to neighbor on each iteration of loop).

	* CONSTANT SPACE operation should be one of COMPILER GOALs *

	______________________________________________________________________________________
	If you can turn your functions into folds, you can then just use them with a scan:

	x = [1..100000000::Int]
	main = mapM_ print . tail . scanl foo (a0,b0) . takeWhile (not.null)
	. unfoldr (Just . splitAt 1000) -- adjust the chunk length as needed
	$ x

	foo (a,b) x = let a2 = f' a $ f x ; b2 = g' b $ g x
	in a2 `seq` b2 `seq` (a2, b2)

	f :: [t] -> a -- e.g. sum
	g :: [t] -> b -- (`rem` 10007) . product
	f' :: a -> a -> a -- e.g. (+)
	g' :: b -> b -> b -- ((`rem` 10007) .) . (*)

	we consume the input in chunks for better performance. Compiled with `-O2`, this
	should run in a constant space. The interim results are printed as indication of progress.

	If you can't turn your function into a fold, this means it has to consume the whole
	list to produce any output and this trick doesn't apply. E.g.,

	f xs = sum . map (uncurry (+)) $
	zip (map (xs!!) [1,3..n])
	(map (xs!!) [m, m-2, .. 0])
	where
	len = length xs
	(n,m) = if even len then (len-1,len-2) else (len-2,len-1)

	or,

	f xs = sum . map (uncurry (+)) . uncurry zip . second reverse
	. foldr (\x ~(b,a) -> (x:a,b)) ([],[])
	$ xs

	the above "crazy function" is of course EQUIVALENT to `sum`, BECAUSE
	-------> (toList . reify) for `f` is a permutation of the one for `sum`, BECAUSE
	-------> there's no element duplication / skippage, and

	(+) is
	- commutative
	- associative

	so SOURCE CONSUMPTION can be REARRANGED into the SEQUENTIAL one
No results found