Created
June 15, 2014 15:43
module Main where

import Prelude hiding (lines)
import Lens.Family
import Pipes
import Pipes.Group
import Pipes.HTTP
import Pipes.Text
import Pipes.Text.Encoding
import Pipes.Text.IO (toHandle,stdout)
import qualified System.IO as IO
import Data.Functor (void)

main :: IO ()
main = do
  req <- parseUrl "http://www.example.com"
  -- "http://www.gutenberg.org/files/10/10-h/10-h.htm"
  withManager tlsManagerSettings $ \m ->
    withHTTP req m $ \resp -> void $ runEffect $
      numberLines (responseBody resp ^. utf8 . lines) >-> toHandle IO.stdout

numberLines :: Monad m => FreeT (Producer Text m) m bad -> Producer Text m bad
numberLines = number_loop (1 :: Int) where
  number_loop n freeProducers = do
    freeProducer <- lift $ runFreeT freeProducers
    case freeProducer of
      Pure badbytes -> do pack' "\n"
                          return badbytes -- these could be inspected with e.g.
      Free p -> do pack' ("\n" ++ show n ++ " ")
                   nextFreeProducers <- p
                   number_loop (n+1) nextFreeProducers
  pack' str = yield str >-> pack
  -- Pipes.Text.pack should probably be String -> Producer Text m ()
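
The shape of the `number_loop` recursion may be easier to see in a list-based model. The sketch below is only an illustration, not the gist's code: it replaces `FreeT (Producer Text m) m r` with a plain list of lines, where each line is a list of chunks, and the names `Line` and `numberModel` are made up for the example.

```haskell
-- A line may arrive in several chunks; a document is a list of such lines.
-- This is a simplified stand-in for FreeT (Producer Text m) m r.
type Line = [String]

-- Emit "\n<n> " once per line, then that line's chunks untouched,
-- then recurse with the next number. The empty case plays the role
-- of the Pure branch and emits the final newline.
numberModel :: [Line] -> [String]
numberModel = go (1 :: Int)
  where
    go _ []            = ["\n"]
    go n (line : rest) = ("\n" ++ show n ++ " ") : line ++ go (n + 1) rest
```

For instance, `concat (numberModel [["or ", "asking"], ["next"]])` gives `"\n1 or asking\n2 next\n"`: the first line came in two chunks but received only one number.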
The one defect in the program was the use of `free2list`/`concats`, and then numbering by piping to `printAllResults`. What you end up numbering then are the raw text chunks, not the lines. Thus when I run your program I see different line numbers than with the one above.

(The difference is more obvious in the number count if you ask for the KJV as in the commented url.)

This is because there is a break in the bytes delivered by the surrounding http machinery; it is between "or " and "asking". So you end up giving a new number beginning with "asking".

Edit: the new version you have put up evades this with the short www.example.com page and my setup; but with the Project Gutenberg text a line of the Apocalypse is split across two numbers, because there is a byte break between "a" and " rainbow".
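
The difference between numbering chunks and numbering lines can be reproduced without any network or pipes machinery. The following is a minimal sketch using plain lists of `String`; the sample chunks and the names `numberChunks` and `numberLinesPure` are invented for illustration and only imitate a body that happens to break in the middle of a line.

```haskell
-- Hypothetical chunks, broken mid-line between "or " and "asking",
-- the way an HTTP body might happen to arrive.
sampleChunks :: [String]
sampleChunks = ["first line\nsecond or ", "asking line\nthird"]

-- Numbering chunks: one number per chunk, wherever the newlines fall.
numberChunks :: [String] -> [String]
numberChunks = zipWith (\n c -> show n ++ " " ++ c) [1 :: Int ..]

-- Numbering lines: chunk boundaries are erased before splitting on newlines.
numberLinesPure :: [String] -> [String]
numberLinesPure = zipWith (\n l -> show n ++ " " ++ l) [1 :: Int ..] . lines . concat
```

Here `numberChunks sampleChunks` starts its second number at "asking", while `numberLinesPure sampleChunks` numbers "second or asking line" as a single line. Unlike the streaming version, this sketch concatenates the whole input first, which is fine for a demonstration but not for a large download.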
An immensely long line would be broken into several `ByteString`s, which `decodeUtf8 b` or `b ^. utf8` would translate into several `Text`s. The program would number lines perfectly if `responseBody resp` happened to deliver one `ByteString` for the whole file.

Since you are pattern matching on the `FreeT`/`FreeF` constructors, I do the line-numbering directly this way. When I scrutinize the `FreeT` and come upon a `Free` constructor, i.e. a new line of text (which may be produced in several chunks), I prefix the number and loop with the next number.

I then feed everything to the `Pipes.Text.IO` operations, for no particular reason, but note that I use `toHandle`, which (mirroring `pipes-bytestring`) allows me to keep any return value, in this case a possible producer of bad bytes. If I had used `stdout`, my `Text` producer would need to return `()`, which as you saw is part of what tripped up the OP. I would have to get rid of any possible bad bytes first, so the position of `void` would be inside the scope of `runEffect`.
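
The same contrast in return types exists in `Pipes.Prelude`, where `toHandle` is polymorphic in its return value and `stdoutLn` returns `()`. The sketch below uses those `String`-based consumers as an analogy, since it needs only the `pipes` package; it is not the `Pipes.Text.IO` code from the gist, and the producer `numbered` with its `Int` result is a made-up stand-in for the numbered text and its leftover bad bytes.

```haskell
import Pipes
import qualified Pipes.Prelude as P
import qualified System.IO as IO
import Data.Functor (void)

-- Hypothetical producer: yields two lines and returns a value, standing in
-- for the "possible producer of bad bytes" returned by numberLines.
numbered :: Monad m => Producer String m Int
numbered = do
  yield "1 first line"
  yield "2 second line"
  return 2

-- P.toHandle is polymorphic in its return type, so the producer's result
-- survives runEffect; a caller may discard it afterwards with void.
keepResult :: IO Int
keepResult = runEffect (numbered >-> P.toHandle IO.stdout)

-- P.stdoutLn returns (), so the producer must already return () before it
-- is connected: void moves inside the scope of runEffect.
dropResult :: IO ()
dropResult = runEffect (void numbered >-> P.stdoutLn)
```

In the gist's `main`, the analogous change would presumably be to wrap the `numberLines (...)` call in `void` before piping it to `stdout`, instead of applying `void` to the result of `runEffect`.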