@davidallsopp
Last active August 29, 2015 14:09
Listing all lines from all entries in a ZIP archive, using the zip-conduit package in Haskell
module ZipLines where

-- requires zip-conduit package
-- See examples in docs at:
-- https://hackage.haskell.org/package/zip-conduit-0.2.2.1/docs/Codec-Archive-Zip.html
-- and (rather more useful) example at:
-- http://stackoverflow.com/a/20153522/699224

import Control.Monad.IO.Class (liftIO) -- requires "transformers" package. What about mtl?
import Data.Conduit
import qualified Data.Conduit.List as CL
import qualified Data.Conduit.Text as CT
import Codec.Archive.Zip

-- Example from Stack Overflow - only processes first entry
main :: IO ()
main = withArchive "test.zip" $ do
    n:_ <- entryNames -- take first entry name (n :: FilePath)
    sourceEntry n sink
  where
    sink = CT.decode CT.utf8 -- decode conduit as UTF8 text
        =$ CT.lines -- lines conduit
        =$ CL.mapM_ (liftIO . print) -- argh, type soup!

-- Adapted to process all entries:
main2 :: IO ()
main2 = withArchive "test.zip" printLines

printLines :: Archive () -- type Archive = StateT Zip IO
printLines = do
    names <- entryNames -- get a list of FilePath. entryNames :: Archive [FilePath]
    mapM_ doEntry names -- map doEntry over it, and discard the result, returning ()

-- or just:
-- printLines = entryNames >>= mapM_ doEntry

doEntry :: FilePath -> Archive ()
doEntry name = sourceEntry name sink
  where
    sink = CT.decode CT.utf8 -- decode conduit as UTF8 text
        =$ CT.lines -- lines conduit
        =$ CL.mapM_ (liftIO . print)
@davidallsopp

Some explanatory types:

withArchive :: FilePath -> Archive a -> IO a

entryNames :: Archive [FilePath]

sourceEntry :: FilePath -> Sink ByteString (ResourceT Archive) a -> Archive a

sink :: Sink ByteString (ResourceT Archive) ()

type FilePath = String

type Archive = StateT Zip IO
-- and the Zip datatype is defined in Codec.Archive.Zip, and holds 
-- filepath, headers, central directory offset etc
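
To make the shape of these types more concrete, here is a minimal sketch that mimics them using only the transformers package. Everything prefixed with Fake (FakeZip, FakeArchive, fakeEntryNames, withFakeArchive) is a hypothetical stand-in invented for illustration; none of it is zip-conduit's real implementation, and the entry list is hard-coded rather than read from a file.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State (StateT, evalStateT, gets)

-- Hypothetical stand-in for the library's Zip record (NOT the real definition).
data FakeZip = FakeZip
  { zipPath    :: FilePath
  , zipEntries :: [FilePath]
  }

-- Same shape as: type Archive = StateT Zip IO
type FakeArchive = StateT FakeZip IO

-- Analogue of entryNames: read the entry list out of the state.
fakeEntryNames :: FakeArchive [FilePath]
fakeEntryNames = gets zipEntries

-- Analogue of withArchive: build an initial state, run the action over it,
-- and discard the final state. The entry names are hard-coded assumptions.
withFakeArchive :: FilePath -> FakeArchive a -> IO a
withFakeArchive path action = evalStateT action (FakeZip path ["a.txt", "b.txt"])

main :: IO ()
main = withFakeArchive "test.zip" $ do
  names <- fakeEntryNames
  liftIO (mapM_ putStrLn names)
```

The point of the sketch is only that an action of type Archive a is a state-passing computation over IO, and that withArchive is the function that supplies the initial state and runs it.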

@davidallsopp

The State monad tutorial here is good: https://en.wikibooks.org/wiki/Haskell/Understanding_monads/State

Also note that the State data constructor no longer exists, because State s is now a type synonym for StateT s Identity. Use the state function instead: state :: (s -> (a, s)) -> State s a

See https://stackoverflow.com/questions/24103108/where-is-the-data-constructor-for-state
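
As a small illustration of building a State action with the state function rather than a data constructor, here is a sketch using Control.Monad.Trans.State from transformers. The names tick and twoTicks are made up for this example.

```haskell
import Control.Monad.Trans.State (State, runState, state)

-- Hypothetical counter step: return the current value and increment the state.
tick :: State Int Int
tick = state (\n -> (n, n + 1))

-- Run the step twice, collecting both returned values.
twoTicks :: State Int (Int, Int)
twoTicks = do
  a <- tick
  b <- tick
  return (a, b)

main :: IO ()
main = print (runState twoTicks 0) -- prints ((0,1),2)
```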

@davidallsopp

How do we actually get an Archive? From doEntry, which uses sourceEntry:

doEntry :: FilePath -> Archive ()

sourceEntry :: FilePath -> Sink ByteString (ResourceT Archive) a -> Archive a

mapM_ :: (FilePath -> StateT Zip IO ()) -> [FilePath] -> StateT Zip IO ()
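
The specialised mapM_ type above can be tried out without a ZIP file. The sketch below swaps Zip for a plain Int counter, so the monad is StateT Int IO rather than Archive; visit is a hypothetical stand-in for doEntry that prints the name and counts it.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State (StateT, execStateT, modify)

-- Hypothetical stand-in for doEntry: print the name, then bump a counter
-- held in the state (the real Archive monad carries a Zip value instead).
visit :: FilePath -> StateT Int IO ()
visit name = do
  liftIO (putStrLn name)
  modify (+ 1)

-- Here mapM_ :: (FilePath -> StateT Int IO ()) -> [FilePath] -> StateT Int IO ()
visitAll :: [FilePath] -> StateT Int IO ()
visitAll = mapM_ visit

main :: IO ()
main = do
  count <- execStateT (visitAll ["a.txt", "b.txt"]) 0
  print count
```

Each call to visit runs in sequence and the state is threaded from one call to the next, which is the same way printLines runs doEntry once per entry name inside a single Archive action.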
