Created November 25, 2013 08:58
A workmate's (@LukeGumbley) suggestion: read log files while they are still compressed. This is very useful for large log files (big data stuff). The reduction in IO makes parsing lines substantially quicker. The test.zip file used for the code below held 100 log files totalling 500MB raw and 16MB zipped, and all lines were processed in just a few seconds on a laptop.
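#r "System.IO"
#r "System.IO.Compression"
#r "System.IO.Compression.FileSystem"

open System.IO
open System.IO.Compression

// Open the archive read-only; entries are streamed without extracting them to disk
let archive = ZipFile.OpenRead(@"c:/temp/test.zip")
for entry in archive.Entries do
    printfn "%s" entry.FullName
    // Lazily yield the entry's lines; the reader is disposed when enumeration finishes
    let logLines = seq {
        use reader = new StreamReader(entry.Open())
        while not reader.EndOfStream do
            yield reader.ReadLine()
    }
    // Counting the lines forces a full read of the decompressed entry
    printfn "%i" (logLines |> Seq.length)
// Release the file handle on the zip once all entries have been read
archive.Dispose()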
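
The same streaming idea can be used for more than counting lines. The sketch below is not part of the original gist. `readZipLines` and `buildSampleZip` are hypothetical helpers, and the sample log text is invented purely so the example runs without `c:/temp/test.zip`. It assumes an F# Interactive session where `System.IO.Compression` is already referenced (on older .NET Framework sessions the `#r` lines above would still be needed).

```fsharp
open System.IO
open System.IO.Compression

// Hypothetical helper: lazily yield (entry name, line) for every line in a zip stream.
// Nothing is extracted to disk; each entry is decompressed as it is read.
let readZipLines (zipStream: Stream) : seq<string * string> =
    seq {
        use archive = new ZipArchive(zipStream, ZipArchiveMode.Read, true)
        for entry in archive.Entries do
            use reader = new StreamReader(entry.Open())
            while not reader.EndOfStream do
                yield (entry.FullName, reader.ReadLine())
    }

// Hypothetical helper: build a small in-memory archive from (name, text) pairs.
let buildSampleZip (files: (string * string) list) : MemoryStream =
    let ms = new MemoryStream()
    let write () =
        // Disposing the archive writes the zip's central directory into ms
        use archive = new ZipArchive(ms, ZipArchiveMode.Create, true)
        for (name, text) in files do
            use writer = new StreamWriter(archive.CreateEntry(name).Open())
            writer.Write(text)
    write ()
    ms.Position <- 0L
    ms

// Count the lines containing "ERROR" across all entries (invented sample data).
let errorCount =
    use zip = buildSampleZip [ "a.log", "INFO start\nERROR disk full\n"
                               "b.log", "INFO ok\n" ]
    readZipLines zip
    |> Seq.filter (fun (_, line) -> line.Contains("ERROR"))
    |> Seq.length

printfn "%i" errorCount
```

Because `readZipLines` returns a lazy sequence, filters such as the `ERROR` match run line by line as the data is decompressed, so memory use should stay small even for large archives. For a file on disk, a `File.OpenRead` stream could be passed in place of the in-memory one.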