@snt
Created July 5, 2013 07:25
Find gaps larger than 3 seconds in a log whose lines have the format: yyyy-mm-dd HH:MM:SS.sss/JST (blah, blah, blah) something goes wrong. findgap.py processes 612,000 log lines in 30 seconds; findgap.hs takes 7 minutes. What's wrong? As far as I can tell (from profiling with ghc -prof), the regex part is the bottleneck.
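Since the regex looks like the hot spot, one thing that might be worth trying is to skip it entirely: the timestamp in this format has a fixed width, so it can be sliced off by position. The sketch below shows the idea in Python for brevity. `split_line` and `TS_WIDTH` are hypothetical names, and the assumption that every line starts with a 23-character timestamp and that the message follows the last `)` is taken from the format and patterns in this gist, not verified against a real log.

```python
from datetime import datetime

# Assumption: the timestamp "yyyy-mm-dd HH:MM:SS.sss" is always the first 23 characters.
TS_WIDTH = 23
TIME_FMT = "%Y-%m-%d %H:%M:%S.%f"


def split_line(line):
    """Split one log line into (timestamp, message) without using a regex.

    Hypothetical helper. The message is whatever follows the last ')',
    mirroring the greedy '.*\\)(.*)$' part of the patterns in this gist.
    Unlike the regex, a line without ')' is returned whole as the message.
    """
    stamp = datetime.strptime(line[:TS_WIDTH], TIME_FMT)
    message = line.rstrip("\n").rpartition(")")[2]
    return stamp, message


if __name__ == "__main__":
    sample = "2013-07-05 07:25:00.123/JST (blah, blah, blah) something goes wrong\n"
    print(split_line(sample))
```

Whether this actually helps the Haskell version would need to be measured; it only removes the regex from the picture so that the remaining cost (timestamp parsing, I/O) can be profiled on its own.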
-- findgap.hs
import Text.Regex.TDFA
import Data.Time.Format
import Data.Time.Clock
import System.Locale
import Data.Maybe
import qualified Data.ByteString.Lazy.Char8 as B

parseTimestamp :: B.ByteString -> UTCTime
parseTimestamp s = fromJust $ parseTime defaultTimeLocale "%F %T%Q/%Z" (B.unpack s)

data TsMsg = TsMsg {ts::UTCTime, msg::B.ByteString} deriving Show

tsAndMessage :: B.ByteString -> TsMsg
tsAndMessage bs = TsMsg (parseTimestamp t) m
  where
    [[_, t, m]] = bs =~ "^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]{3}/...) .*\\)(.*)$" :: [[B.ByteString]]

tsAndMessages :: [B.ByteString] -> [TsMsg]
tsAndMessages blines = map tsAndMessage blines

pairedLines :: NominalDiffTime -> [B.ByteString] -> [(TsMsg,TsMsg)]
pairedLines secs lines = filter (\ (t1,t2) -> (diffTs (ts t2) (ts t1)) > secs) pairs
  where
    zs = tsAndMessages lines
    pairs = zip zs (tail zs)

diffTs :: UTCTime -> UTCTime -> NominalDiffTime
diffTs ts1 ts2 = diffUTCTime ts1 ts2

show_gap :: (TsMsg, TsMsg) -> B.ByteString
show_gap (tsm1, tsm2) =
  (B.pack "----------\n")
  `B.append` (B.pack $ show $ (diffTs (ts tsm2) (ts tsm1))) `B.append` (B.pack "\n")
  `B.append` (B.pack $ show $ ts tsm1) `B.append` (B.pack " ")
  `B.append` (msg tsm1) `B.append` (B.pack "\n")
  `B.append` (B.pack $ show $ ts tsm2) `B.append` (B.pack " ")
  `B.append` (msg tsm2 )

main =
  B.interact
    $ B.unlines
    . (map show_gap)
    . (pairedLines secs)
    . B.lines
  where secs = 3

#!/usr/bin/python
# findgap.py
from datetime import *
import fileinput
import re

p = re.compile('^(\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d\d\d)/.*\)(.*)$')
timefmt = '%Y-%m-%d %H:%M:%S.%f'

lines = fileinput.input()
line = lines[0]
m = p.match(line)
pre_dt = datetime.strptime(m.group(1), timefmt)
pre_msg = m.group(2)

for line in lines:
    m = p.match(line)
    dt = datetime.strptime(m.group(1), timefmt)
    msg = m.group(2)
    if (dt - pre_dt) > timedelta(0, 3):
        print "-----"
        print "delta ", dt - pre_dt
        print pre_dt, pre_msg
        print dt, msg
    pre_dt = dt
    pre_msg = msg

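The script above is Python 2 (`print` statements, indexing into `fileinput.input()`). For anyone who wants to reproduce the comparison on a current interpreter, here is a minimal Python 3 sketch of the same gap detection, written as a generator so it can be exercised without a log file. `find_gaps` and `LINE_RE` are hypothetical names, and, unlike the original, lines that do not match the pattern are skipped instead of raising an error.

```python
import re
from datetime import datetime, timedelta

# Same pattern as findgap.py, written as a raw string.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d\d\d)/.*\)(.*)$")
TIME_FMT = "%Y-%m-%d %H:%M:%S.%f"


def find_gaps(lines, threshold=timedelta(seconds=3)):
    """Yield (delta, (prev_dt, prev_msg), (dt, msg)) for each gap above threshold.

    Hypothetical helper. Assumption: non-matching lines are ignored,
    which differs from the original script (it would fail on them).
    """
    prev = None
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        cur = (datetime.strptime(m.group(1), TIME_FMT), m.group(2))
        if prev is not None and cur[0] - prev[0] > threshold:
            yield cur[0] - prev[0], prev, cur
        prev = cur


if __name__ == "__main__":
    # Made-up sample lines in the format described at the top of the gist.
    demo = [
        "2013-07-05 07:25:00.000/JST (a) first\n",
        "2013-07-05 07:25:01.000/JST (b) second\n",
        "2013-07-05 07:25:05.500/JST (c) third\n",
    ]
    for delta, (pre_dt, pre_msg), (dt, msg) in find_gaps(demo):
        print("-----")
        print("delta ", delta)
        print(pre_dt, pre_msg)
        print(dt, msg)
```

To run it over a real log, pass `fileinput.input()` or an open file object as `lines`. The 30-second figure quoted above was measured with the original Python 2 script, so timings for this variant may differ.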

snt commented Jul 17, 2013

Adding

import Control.Parallel

and compiling with the -threaded option

ghc -O2 -threaded --make findgap.hs

makes the Haskell version run in 3 minutes (2x faster).
