Skip to content

Instantly share code, notes, and snippets.

@copumpkin
Last active December 23, 2015 14:09
Show Gist options
  • Save copumpkin/6647111 to your computer and use it in GitHub Desktop.
Save copumpkin/6647111 to your computer and use it in GitHub Desktop.
Mutable buffered Iteratee-like things. Goal is to let me reuse the input "chunks" but to prevent users from hanging onto them and seeing data they shouldn't see. Does it work? You can leak the MVector reference in an existential but then you can't do anything with it, right?
{-# LANGUAGE RankNTypes, BangPatterns #-}
module Mutable where
import Control.Monad
import Control.Monad.ST
import Data.STRef
import Data.Vector.Mutable (MVector)
import qualified Data.Vector.Mutable as MV
-- A read-only view into an MVector or similar
data Chunk s a = Chunk { size :: Int, get :: Int -> ST s a }
data Iteratee i o
= Done o
| Moar (forall s. Chunk s i -> ST s (Iteratee i o))
foo :: Int -> Iteratee Int Double
foo n = Moar (f n 0)
where
f :: Int -> Int -> Chunk s Int -> ST s (Iteratee Int Double)
f 0 v _ = return $ Done (fromIntegral v)
f c v chunk = do
let go n !c | n < 0 = return c
go n !c = do
x <- get chunk n
go (n - 1) (c + x)
x <- go (size chunk - 1) 0
return $ Moar (f (c - 1) (v + x))
bufSize = 16
run :: Iteratee i o -> [i] -> o
run i xs = runST $ do
ref <- newSTRef xs
buf <- MV.new bufSize
let chunk = Chunk bufSize (MV.read buf)
go (Done x) = return x
go (Moar f) = do
cur <- readSTRef ref
modifySTRef ref (drop bufSize)
forM_ (zip [0..bufSize - 1] cur) (\(i, x) -> MV.write buf i x)
next <- f chunk
go next
go i
@pchiusano
Copy link

If I have data EMVector i = forall s . EMVector (MVector s i), and return that as o, is there literally nothing I can do with the existential MVector s i? If I can get any information at all out of that existential MVector, then that would be unsafe...

@copumpkin
Copy link
Author

The only thing you can do to an MVector outside of ST is ask for its length, since that never changes. You could imagine a fancy implementation that gives you chunks of different sizes, so you'd be able to retain pointers to them but not be able to do much else. In particular, no assumptions would be violated. If during your chunk the length was x, you could hang onto the chunk in an existential and use it way later, and the length would still be x. Other chunk processors might see different MVectors, though.

@copumpkin
Copy link
Author

Also to clarify for any readers: the goal here isn't to have effects in ST; it's to maintain the same old pure Iteratee model people are used to (substitute for pipes or machines or whatever) while allowing the runner to safely use mutable chunks.

The MVector in the iteratee itself is merely a marker. I would only expect the producer to ever write to it, and if I were less lazy I'd write a read-only state-annotated vector for something like this.

@pchiusano
Copy link

Very nice! You could enter ST again to manipulate the existential vector, but you wouldn't be able to run the resulting ST action.

I may steal this for scalaz-stream. :) I had been playing with idea of recycling buffers, but I hadn't figured out way to make it safe.

@copumpkin
Copy link
Author

Yeah, exactly. runST is useless if you have an ST action for a particular state type. I'm trying to think now whether something like stream fusion could benefit from the idea. Feel free to do whatever you want with it! Chances are you'll do more than I will :P

@pchiusano
Copy link

I worked out another scheme that might be nicer for this - basic idea is to have the driver maintain a weak reference to the Vector it returns. When it goes to read the next value, it checks whether the weak reference points to anything. If not, it can mutate the underlying buffer. If the weak reference is not null, that (conservatively) implies that something is still holding on to the vector, which is fine, it can create a fresh buffer. It's 'conservative', in the sense that the weak reference might not get nulled out in time, but that is perfectly okay here since it's just an optimization. I also suspect with generational GC that it would end up working quite well in practice.

I kind of prefer this to hitting it with the ST sledgehammer. Using ST totally restricts aliasing / sharing. With this approach, you are free to alias / share the vector you get back, it's just that if you keep it around, the driver will be forced to make a copy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment