It's a good idea to create a dedicated volume first, so you can track space consumption and see how much compression is helping you:
maprcli volume create -name eoddata -path /user/vgonzalez/eoddata
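Once data lands in the volume, you can compare logical size against actual disk usage; the gap is what compression saved. A sketch, assuming a standard MapR node with `maprcli` available (exact field names may vary by release):

```shell
# Cluster-specific: requires maprcli on a MapR node.
# In the JSON output, compare logicalUsed against totalUsed
# to see how much space compression is saving.
maprcli volume info -name eoddata -json
```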
Assuming you have installed log-synth under /opt, the following generates 10 million rows using 50 threads, with each thread producing its own file:
/opt/log-synth/synth -schema eoddata.json -count $((10 * 10**6)) -format json -output /mapr/se1/user/vgonzalez/eoddata/2015-05-18 -threads 50
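The `$((10 * 10**6))` in that command is just shell arithmetic expansion, which keeps the row count readable. Note that the `**` exponentiation operator needs bash, ksh, or zsh:

```shell
# $(( ... )) is arithmetic expansion; ** (exponentiation) is a bash/ksh/zsh feature
echo $((10 * 10**6))
```

This prints 10000000, the `-count` value passed to log-synth.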
The generated data is in essentially random order, and we want it sorted before ingesting it. First, let's use Drill to see what log-synth produced and confirm the files are visible through the storage plugin.
0: jdbc:drill:> use maprfs.vgonzalez;
0: jdbc:drill:> show files in `eoddata/2015-05-18`;
+-------------+--------------+---------+-----------+------------+------------+--------------+------------------------+--------------------------+
| name        | isDirectory  | isFile  | length    | owner      | group      | permissions  | accessTime             | modificationTime         |
+-------------+--------------+---------+-----------+------------+------------+--------------+------------------------+--------------------------+
| synth-0032  | false        | true    | 29362010  | vgonzalez  | vgonzalez  | rw-rw-r--    | 2015-05-20 16:42:36.0  | 2015-05-20 16:43:20.009  |
| synth-0025  | false        | true    | 29361827  | vgonzalez  | vgonzalez  | rw-rw-r--    | 2015-05-20 16:42:36.0  | 2015-05-20 16:43:19.279  |
| synth-0046  | false        | true    | 29362013  | vgonzalez  | vgonzalez  | rw-rw-r--    | 2015-05-20 16:42:36.0  | 2015-05-20 16:44:21.021  |
| synth-0021  | false        | true    | 29362823  | vgonzalez  | vgonzalez  | rw-rw-r--    | 2015-05-20 16:42:36.0  | 2015-05-20 16:43:20.549  |
| synth-0009  | false        | true    | 29363345  | vgonzalez  | vgonzalez  | rw-rw-r--    | 2015-05-20 16:42:36.0  | 2015-05-20 16:44:21.168  |
| synth-0004  | false        | true    | 29363691  | vgonzalez  | vgonzalez  | rw-rw-r--    | 2015-05-20 16:42:36.0  | 2015-05-20 16:43:19.945  |
...
Looks good. Now let's turn this JSON data into sorted CSV files.
0: jdbc:drill:> alter session set `store.format`='csv';
0: jdbc:drill:> create table s20150518 as (select t.symbol.Symbol as symbol, t.`timestamp`, t.`open` as `open`, t.high as high, t.low as low, t.`close` as `close`, t.volume as volume from `eoddata/2015-05-18` t order by `timestamp`);
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 10000000                   |
+-----------+----------------------------+
1 row selected (79.103 seconds)
In 79 seconds, Drill sorts and writes out 10 million rows, roughly 126,000 rows per second.
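As a quick sanity check on that rate, you can divide the row count by the elapsed time from the query output:

```shell
# back-of-the-envelope throughput: 10M rows over 79.103 seconds
awk 'BEGIN { printf "%d rows/sec\n", 10000000 / 79.103 }'
```

This works out to about 126,000 rows per second for the single sort-and-write fragment.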