/*
 * Byte calculator. Sums bytes of a given file and displays to screen.
 *
 * Compile:
 *    gcc ccalc.c -o ccalc -O2
 *
 */

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

#define BUFFER (1 << 20)

unsigned long long do_calc(FILE *f) {
    uint8_t buf[BUFFER];
    unsigned long long sum = 0;
    size_t nbytes, i;

    while ((nbytes = fread(buf, 1, BUFFER, f)) > 0)
        for (i = 0; i < nbytes; i++)
            sum += buf[i];

    return sum;
}

int main(int argc, const char* argv[]) {
    FILE *f;
    unsigned long long res;

    if (argc != 2)
        return 1;

    if ((f = fopen(argv[1], "rb")) == NULL) {
        perror("fopen");
        return 1;
    }

    res = do_calc(f);
    fclose(f);
    printf("%llu\n", res);
    return 0;
}
{- Compile: ghc --make -O2 hcalc.hs -}

{-# LANGUAGE BangPatterns #-}

module Main where

import System.IO
import Data.Word
import System.Environment
import qualified Data.ByteString as B

buffer = 1024

toInt :: Word8 -> Int
toInt = fromIntegral

sum_bytes :: Handle -> IO Int
sum_bytes fh = sum_bytes' fh 0

sum_bytes' :: Handle -> Int -> IO Int
sum_bytes' fh !acc = do
    res <- B.hGet fh buffer
    if B.null res
        then return acc
        else sum_bytes' fh (B.foldl' (\acc e -> (toInt e) + acc) acc res)

main = do
    args <- getArgs
    let filepath = head args
    fh <- openBinaryFile filepath ReadMode
    val <- sum_bytes fh
    hClose fh
    print val
Aaaand Haskell is now faster than C:
[~/sandbox/calc:master]λ time ./hcalc big_file
599364006345
real 0m13.161s
user 0m7.390s
sys 0m1.703s
The fasterness is due to the smaller buffer size though, and that might be dependent on my system. If I make the ccalc.c buffer size also 1024, then the execution times are pretty much identical. This is kind of weird though, since I would expect a bigger buffer to be faster.
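For reference, one way to test the buffer-size theory without editing and recompiling ccalc.c each time is to take the buffer size from the command line. This is a hypothetical ccalc-buf.c sketch, not part of the gist; it allocates the buffer on the heap so any size can be tried:

/* ccalc-buf.c -- hypothetical variant of ccalc.c that takes the buffer
 * size as a second argument, so different sizes can be timed without
 * recompiling.
 *
 * Compile: gcc ccalc-buf.c -o ccalc-buf -O2
 * Run:     ./ccalc-buf big_file 1024
 */
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

static unsigned long long do_calc(FILE *f, uint8_t *buf, size_t bufsize) {
    unsigned long long sum = 0;
    size_t nbytes, i;

    while ((nbytes = fread(buf, 1, bufsize, f)) > 0)
        for (i = 0; i < nbytes; i++)
            sum += buf[i];

    return sum;
}

int main(int argc, const char *argv[]) {
    FILE *f;
    uint8_t *buf;
    size_t bufsize;

    if (argc != 3)
        return 1;

    bufsize = (size_t)strtoull(argv[2], NULL, 10);
    if (bufsize == 0 || (buf = malloc(bufsize)) == NULL)
        return 1;

    if ((f = fopen(argv[1], "rb")) == NULL) {
        perror("fopen");
        free(buf);
        return 1;
    }

    printf("%llu\n", do_calc(f, buf, bufsize));
    fclose(f);
    free(buf);
    return 0;
}

Timing ./ccalc-buf big_file 1024 against ./ccalc-buf big_file 1048576 on the same file would isolate the buffer-size effect from everything else.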
Might have to do with the kernel page size and allocation of the buffer. Using long long in C might not be optimal for the parts where the summation does not overflow a long (but maybe modern processors have arithmetic support for long long). You can always intercept the C code GHC generates and optimize there to have a C program that is guaranteed to be faster than the Haskell version. Parallelization (divide and conquer) is possible when running on RAID and multiple processors. Other compiler optimization choices might be beneficial too. Lastly, this might be faster when implemented in the block subsystem of the kernel ;-).
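On the long long point, a hedged sketch of the chunked-accumulator idea: one fread of at most 1 MiB of bytes sums to less than 2^28, so the inner loop can run in a 32-bit counter and the 64-bit total only needs updating once per chunk. This do_calc_chunked variant is illustrative and not from the gist; whether it actually beats the plain version on a 64-bit machine would need benchmarking, since long long arithmetic is native there:

/* Hypothetical variant of do_calc: accumulate each chunk in a 32-bit
 * counter (at most BUFFER * 255 = 2^20 * 255 < 2^28, so it cannot
 * overflow), and add to the 64-bit running total once per chunk.
 */
#include <stdio.h>
#include <stdint.h>

#define BUFFER (1 << 20)

unsigned long long do_calc_chunked(FILE *f) {
    uint8_t buf[BUFFER];
    unsigned long long sum = 0;
    size_t nbytes, i;

    while ((nbytes = fread(buf, 1, BUFFER, f)) > 0) {
        uint32_t chunk_sum = 0;   /* fits: at most 2^20 * 255 */
        for (i = 0; i < nbytes; i++)
            chunk_sum += buf[i];
        sum += chunk_sum;
    }

    return sum;
}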
You can always intercept the C code GHC generates and optimize there to have a C program that is guaranteed to be faster than the Haskell version.
Haha, well, sort of true, that's why it says "pointless" in the title. Though, the C code GHC generates for the Haskell will depend on the GHC RTS. The GHC RTS is written in C and C--, so you would have to re-do the C-- part in C too!
Benchmarks!