-
-
Save yfyf/6639162 to your computer and use it in GitHub Desktop.
| /* | |
| * Byte calculator. Sums bytes of a given file and displays to screen. | |
| * | |
| * Compile: | |
| * gcc ccalc.c -o ccalc -O2 | |
| * | |
| */ | |
| #include <stdlib.h> | |
| #include <stdio.h> | |
| #include <stdint.h> | |
| #define BUFFER (1 << 20) | |
| unsigned long long do_calc(FILE *f) { | |
| uint8_t buf[BUFFER]; | |
| unsigned long long sum = 0; | |
| size_t nbytes, i; | |
| while ((nbytes = fread(buf, 1, BUFFER, f)) > 0) | |
| for (i = 0; i < nbytes; i++) | |
| sum += buf[i]; | |
| return sum; | |
| } | |
| int main(int argc, const char* argv[]) { | |
| FILE *f; | |
| unsigned long long res; | |
| if (argc != 2) | |
| return 1; | |
| if ((f = fopen(argv[1], "rb")) == NULL) { | |
| perror("fopen"); | |
| return 1; | |
| }; | |
| res = do_calc(f); | |
| fclose(f); | |
| printf("%llu\n", res); | |
| return 0; | |
| } |
| {- Compile: ghc --make -O2 hcalc.hs -} | |
| {-# LANGUAGE BangPatterns #-} | |
| module Main where | |
| import System.IO | |
| import Data.Word | |
| import System.Environment | |
| import qualified Data.ByteString as B | |
| buffer = 1024 | |
| toInt :: Word8 -> Int | |
| toInt = fromIntegral | |
| sum_bytes :: Handle -> IO Int | |
| sum_bytes fh = sum_bytes' fh 0 | |
| sum_bytes' :: Handle -> Int -> IO Int | |
| sum_bytes' fh !acc = do | |
| res <- B.hGet fh buffer | |
| if B.null res | |
| then return acc | |
| else sum_bytes' fh (B.foldl' (\acc e -> (toInt e) + acc) acc res) | |
| main = do | |
| args <- getArgs | |
| let filepath = head args | |
| fh <- openBinaryFile filepath ReadMode | |
| val <- sum_bytes fh | |
| hClose fh | |
| print val |
Might have to do with the kernel page size and allocation of the buffer. Using long long in C might not be optimal for the part where summation does not overflow long (but maybe modern processors have arithmetic support for long long). You can always intercept the C code GHC generates and optimize there to have a C program that is guaranteed to be faster than the Haskell version. Parallelization (divide and conquer) is possible when running Raid and multi-processor. Other compiler optimization choices might be beneficial too. Lastly, this might be faster when implemented in the block subsystem of the kernel ;-).
You can always intercept the C code GHC generates and optimize there to have a C program that is guaranteed to be faster than the Haskell version.
Haha, well, sort of true, that's why it says "pointless" in the title. Though, the compiled Haskell C code will depend on the GHC RTS. The GHC RTS is written in C and C--. So you would have to re-do the C-- part in C too!
The fasterness is due to smaller buffer size though and that might be dependent to my system. If I make the
ccalc.cbuffer size also1024, then the execution times are pretty much identical. This is kind of weird though since I would expect a bigger buffer to be faster.