Skip to content

Instantly share code, notes, and snippets.

@winterland1989
Last active July 1, 2017 00:19
Show Gist options
  • Select an option

  • Save winterland1989/787665e11d92da4ec2bbdd043a9af748 to your computer and use it in GitHub Desktop.

Select an option

Save winterland1989/787665e11d92da4ec2bbdd043a9af748 to your computer and use it in GitHub Desktop.
BitTwiddle with haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE MagicHash #-}
-- | This module provide fast memchr implemented with ghc primitives.
--
-- http://lemire.me/blog/2017/01/20/how-quickly-can-you-remove-spaces-from-a-string/
-- https://graphics.stanford.edu/~seander/bithacks.html
-- https://jameshfisher.github.io/2017/01/24/bitwise-check-for-zero-byte.html
--
--
module Data.Primitive.BitTwiddle where
import GHC.Prim
import GHC.Types
import Debug.Trace
-- we need to know word size
#include "MachDeps.h"
#if SIZEOF_HSWORD == 4
# define CAST_OFFSET_WORD_TO_BYTE(x) (x `uncheckedIShiftL#` 2#)
# define CAST_OFFSET_BYTE_TO_WORD(x) (x `uncheckedIShiftRA#` 2#)
#else
# define CAST_OFFSET_WORD_TO_BYTE(x) (x `uncheckedIShiftL#` 3#)
# define CAST_OFFSET_BYTE_TO_WORD(x) (x `uncheckedIShiftRA#` 3#)
#endif
isOffsetAlign# :: Int# -> Bool
{-# INLINE isOffsetAlign# #-}
isOffsetAlign# s# = isTrue# ((SIZEOF_HSWORD# -# 1#) `andI#` s# ==# 0#)
mkMask# :: Word# -> Word#
{-# NOINLINE mkMask# #-}
mkMask# w8# =
#if SIZEOF_HSWORD == 4
w8# `or#` (w8# `uncheckedShiftL#` 8#)
`or#` (w8# `uncheckedShiftL#` 16#)
`or#` (w8# `uncheckedShiftL#` 24#)
#else
w8# `or#` (w8# `uncheckedShiftL#` 8#)
`or#` (w8# `uncheckedShiftL#` 16#)
`or#` (w8# `uncheckedShiftL#` 24#)
`or#` (w8# `uncheckedShiftL#` 32#)
`or#` (w8# `uncheckedShiftL#` 40#)
`or#` (w8# `uncheckedShiftL#` 48#)
`or#` (w8# `uncheckedShiftL#` 56#)
#endif
-- https://jameshfisher.github.io/2017/01/24/bitwise-check-for-zero-byte.html
--
wordNotContainNullByte# :: Word# -> Bool
{-# INLINE wordNotContainNullByte# #-}
wordNotContainNullByte# w# =
let highbits# =
#if SIZEOF_HSWORD == 4
(w# `minusWord#` 0x01010101##) `and#` (not# w#) `and#` 0x80808080##
#else
(w# `minusWord#` 0x0101010101010101##) `and#` (not# w#) `and#` 0x8080808080808080##
#endif
in isTrue# (highbits# `eqWord#` 0##)
--
-- https://sourceware.org/viewvc/src/newlib/libc/string/memchr.c?revision=1.4&view=markup
memchr# :: ByteArray# -> Word# -> Int# -> Int# -> Int#
{-# INLINE memchr# #-}
memchr# = beforeAlignedLoop#
beforeAlignedLoop# :: ByteArray# -> Word# -> Int# -> Int# -> Int#
beforeAlignedLoop# ba# c# s# end#
| isTrue# (s# >=# end#) = -1#
| isTrue# (c# `eqWord#` indexWord8Array# ba# s#) = s#
| isOffsetAlign# s# = alignedLoop# ba# c# (mkMask# c#)
CAST_OFFSET_BYTE_TO_WORD(s#)
CAST_OFFSET_BYTE_TO_WORD(end#)
end#
| otherwise = beforeAlignedLoop# ba# c# (s# +# 1#) end#
alignedLoop# :: ByteArray# -> Word# -> Word# -> Int# -> Int# -> Int# -> Int#
alignedLoop# ba# c# mask# s# end# end_#
| isTrue# (s# >=# end#) = afterAlignedLoop# ba# c# CAST_OFFSET_WORD_TO_BYTE(s#) end_#
| otherwise = case indexWordArray# ba# s# of
w#
| wordNotContainNullByte# (mask# `xor#` w#) ->
alignedLoop# ba# c# mask# (s# +# 1#) end# end_#
| otherwise ->
afterAlignedLoop# ba# c# CAST_OFFSET_WORD_TO_BYTE(s#) end_#
afterAlignedLoop# :: ByteArray# -> Word# -> Int# -> Int# -> Int#
afterAlignedLoop# ba# c# s# end#
| isTrue# (s# >=# end#) = -1#
| isTrue# (c# `eqWord#` indexWord8Array# ba# s#) = s#
| otherwise = afterAlignedLoop# ba# c# (s# +# 1#) end#
@winterland1989
Copy link
Author

winterland1989 commented Jun 30, 2017

Benchmark result says this's far from practical usage though (WE REALLY NEED SSE IN GHC NCG PLZ):

benchmarking length 1000000/c memchr
time                 29.78 μs   (29.65 μs .. 30.01 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 29.98 μs   (29.82 μs .. 30.51 μs)
std dev              866.1 ns   (313.7 ns .. 1.874 μs)
variance introduced by outliers: 30% (moderately inflated)

benchmarking length 1000000/haskell memchr
time                 119.9 μs   (117.9 μs .. 122.4 μs)
                     0.994 R²   (0.989 R² .. 0.998 R²)
mean                 120.7 μs   (118.5 μs .. 124.2 μs)
std dev              9.325 μs   (5.705 μs .. 13.31 μs)
variance introduced by outliers: 72% (severely inflated)

benchmarking length 1000000/haskell loop without bit twiddle
time                 443.6 μs   (441.3 μs .. 447.0 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 444.3 μs   (443.0 μs .. 446.8 μs)
std dev              6.045 μs   (4.107 μs .. 8.408 μs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment