@rebolek
Last active September 17, 2019 10:16

Compilation speed

Every executable is compiled 5 times, the two extreme values are thrown away, and the compile time is the average of the remaining three runs.

The encap versions are wrapped in a do [] block.
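
In Red, code inside a do [] block is embedded as data and interpreted when the encapped binary starts, rather than compiled to native code; a minimal sketch of the wrapper shape (the codec source itself is stubbed out here):

```red
Red [Title: "encap wrapper sketch"]

do [
    ; anything inside this block is carried along as data and run by the
    ; interpreter at startup instead of being compiled to native code
    print "codec source goes here"
]
```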

The source code size of the CSV codec is 10181 bytes for the full version and 4988 bytes for the lite version (~49%).

The lite version supports only a block of records as the Red-side format; the full version also supports a block of maps and a map of columns, plus some additional features like header handling.

| name | compile time (ms) | % of original | compile time difference (ms) | size (bytes) | % of original | size difference (bytes) |
|---|---|---|---|---|---|---|
| nocsv | 31791.67 | 100% | 0 | 1116924 | 100% | 0 |
| csv (master) | 34828.67 | 109.55% | 3037 | 1158352 | 103.71% | 41428 |
| csv-encap | 35373.0 | 111.26% | 3581.33 | 1136208 | 101.73% | 19284 |
| csv-lite | 33141.33 | 104.25% | 1349.67 | 1136512 | 101.75% | 19588 |
| csv-lite-encap | 32980.67 | 103.74% | 1189 | 1126532 | 100.86% | 9308 |

Implementation speed

Speed is tested on a block of 343 records, each with 343 values (columns); each value is 343 bytes (see the sketch below).
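
One way to build test data of that shape and time a round trip, sketched here as a reconstruction (this is not the author's harness):

```red
Red [Title: "CSV speed harness sketch"]

value:  append/dup copy "" #"x" 343         ; one 343-byte string value
record: append/dup copy [] value 343        ; one record with 343 values
data:   append/only/dup copy [] record 343  ; 343 records (block of blocks)

t: now/precise
csv: to-csv data                            ; block of blocks -> CSV string
print ["to-csv:" difference now/precise t]

t: now/precise
blk: load-csv csv                           ; CSV string -> block of blocks
print ["load-csv:" difference now/precise t]
```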

| name | load-csv (sec) | to-csv (sec) |
|---|---|---|
| csv (master) | 0.675 | 4.459 |
| csv-encap | 0.544 | 4.234 |
| csv-lite | 0.626 | 4.129 |
| csv-lite-encap | 0.573 | 4.312 |

Compiling has no noticeable effect on speed; actually, encapped load-csv seems to be a bit (10-20%) faster, which is interesting. As expected, using the lite version has no impact on speed.

Storage speed

The CSV codec supports four different storage methods (see the sketch after the list):

  • block of blocks
  • flat block
  • block of maps
  • map of columns
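
The refinements used in the tables below map onto these formats; a quick sketch (the exact shapes of the returned values are inferred from the format names):

```red
csv: "a,b^/1,2^/3,4"        ; ^/ is a newline inside a Red string

load-csv csv                ; block of blocks: one block per row
load-csv/flat csv           ; flat block: all values in a single block
load-csv/as-records csv     ; block of maps: values keyed by column name
load-csv/as-columns csv     ; map of columns: a block of values per column
```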

The following tables show how the different methods compare in terms of speed and memory usage (the flat/skip pairing is sketched right after the list). Each test was done with three different data sources:

  • wide table: 1000 columns, 50 rows, each value is 100 bytes
  • tall table: 50 columns, 1000 rows, each value is 100 bytes
  • huge table: 500 columns, 500 rows, each value is 100 bytes
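
Note how the rows in each table pair up: the flat block produced by load-csv/flat goes back to CSV via to-csv/skip. Assuming /skip follows the usual Red convention of taking a record width (an assumption, not confirmed here), the pairing would look like this, with %wide.csv a hypothetical file:

```red
flat: load-csv/flat read %wide.csv   ; one flat block of all values
csv:  to-csv/skip flat 1000          ; assumed: /skip takes values per record
```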

wide table

| Code | Time (sec) | Memory (bytes) | Time difference | Memory difference |
|---|---|---|---|---|
| load-csv | 0.162 | 46'389'348 | 100% | 100% |
| to-csv | 0.645 | 51'443'740 | 100% | 100% |
| load-csv/flat | 0.161 | 46'383'544 | 99.38% | 99.99% |
| to-csv/skip | 0.647 | 53'837'040 | 100.31% | 104.65% |
| load-csv/as-columns | 0.185 | 49'444'008 | 114.20% | 106.58% |
| to-csv (columns) | 0.688 | 56'936'872 | 106.67% | 110.68% |
| load-csv/as-records | 0.193 | 64'854'576 | 119.14% | 139.80% |
| to-csv (records) | 0.748 | 77'702'712 | 115.97% | 151.04% |

tall table

| Code | Time (sec) | Memory (bytes) | Time difference | Memory difference |
|---|---|---|---|---|
| load-csv | 0.165 | 46'722'984 | 100% | 100% |
| to-csv | 0.667 | 49'366'684 | 100% | 100% |
| load-csv/flat | 0.165 | 46'722'832 | 100% | 100% |
| to-csv/skip | 0.672 | 51'652'420 | 100.75% | 104.63% |
| load-csv/as-columns | 0.189 | 49'160'020 | 114.54% | 105.22% |
| to-csv (columns) | 0.704 | 56'862'556 | 105.55% | 114.58% |
| load-csv/as-records | 0.210 | 67'397'440 | 127.27% | 144.25% |
| to-csv (records) | 0.730 | 70'747'636 | 109.45% | 143.31% |

huge table

| Code | Time (sec) | Memory (bytes) | Time difference | Memory difference |
|---|---|---|---|---|
| load-csv | 0.712 | 231'456'924 | 100% | 100% |
| to-csv | 3.094 | 237'046'516 | 100% | 100% |
| load-csv/flat | 0.728 | 231'456'772 | 102.25% | 100.00% |
| to-csv/skip | 3.328 | 248'988'520 | 107.56% | 105.04% |
| load-csv/as-columns | 0.873 | 243'786'988 | 122.61% | 105.33% |
| to-csv (columns) | 3.366 | 262'646'188 | 108.80% | 110.80% |
| load-csv/as-records | 0.861 | 323'046'568 | 120.93% | 139.57% |
| to-csv (records) | 4.487 | 366'363'296 | 145.02% | 154.55% |

As you can see, the basic format (block of blocks) is the most efficient in terms of both speed and memory usage, together with the flat block. Converting a map of columns to CSV is almost as fast, only about 6-9% slower; loading CSV into columns is 14-23% slower, with memory usage just 5-7% higher than for blocks. The slowest and most memory-hungry option is the block of maps (records): loading from CSV takes 19-27% more time, conversion to CSV 10-45% more, and memory usage is 40-55% higher. OTOH this format is the most user-friendly.
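
That friendliness comes from access by column name instead of position; a small assumed example (whether the map keys are strings or words is not specified here, and %people.csv is a hypothetical file):

```red
records: load-csv/as-records read %people.csv
foreach rec records [
    print select rec "name"     ; look a value up by column name
]
```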

@lucindamichele

This is awesome!
