@vicenteg
Last active August 29, 2015 14:22
Can Drill query Sequencefile?

Yes, Drill can query SequenceFile data via the Hive metastore. Here's how.

Copy some sample data to a MapRFS/HDFS location

hadoop fs -put /opt/mapr/hive/hive-0.13/examples/files/kv1.seq /user/vgonzalez/tmp

Create an external table in Hive, referencing the sequencefile

hive -e 'CREATE EXTERNAL TABLE src_sequencefile (key STRING, value STRING) STORED AS SEQUENCEFILE location "/user/vgonzalez/tmp/kv1.seq";'
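If you want to reuse this step for other files, the statement can be generated from a table name and a location. This is only a sketch: `build_ddl` and the `RUN_HIVE` switch are hypothetical names, and the two STRING columns are simply the ones used for kv1.seq above.

```shell
#!/bin/sh
# build_ddl is a hypothetical helper (not part of Hive): it prints the same
# CREATE EXTERNAL TABLE statement used above for a given table name and location.
build_ddl() {
  table="$1"
  location="$2"
  printf 'CREATE EXTERNAL TABLE %s (key STRING, value STRING) STORED AS SEQUENCEFILE LOCATION "%s";\n' \
    "$table" "$location"
}

DDL="$(build_ddl src_sequencefile /user/vgonzalez/tmp/kv1.seq)"

# RUN_HIVE is an assumed opt-in switch; without it the statement is only printed.
if [ "${RUN_HIVE:-0}" = "1" ]; then
  hive -e "$DDL"
else
  echo "$DDL"
fi
```

Run it once without `RUN_HIVE=1` to inspect the statement before handing it to Hive.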

Check that you can query it in Hive

hive -e 'select * from src_sequencefile limit 3;'

You should see something like:

$ hive -e 'select * from src_sequencefile limit 3;'
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.

Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1501.jar!/hive-log4j.properties
OK
238     val_238
86      val_86
311     val_311
Time taken: 1.138 seconds, Fetched: 3 row(s)

Now query it in Drill

$ /opt/mapr/drill/drill-1.0.0/bin/sqlline -n vgonzalez -u jdbc:drill:
apache drill 1.0.0
"this isn't your grandfather's sql"
0: jdbc:drill:> use hive;
+-------+-----------------------------------+
|  ok   |              summary              |
+-------+-----------------------------------+
| true  | Default schema changed to [hive]  |
+-------+-----------------------------------+
1 row selected (0.213 seconds)
0: jdbc:drill:> select * from src_sequencefile limit 3;
+------+----------+
| key  |  value   |
+------+----------+
| 238  | val_238  |
| 86   | val_86   |
| 311  | val_311  |
+------+----------+
3 rows selected (0.321 seconds)
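The same query can also be sent outside sqlline. The sketch below assumes the Drill web/REST interface is reachable and unauthenticated; `build_payload` and the `DRILL_URL` variable (for example `http://localhost:8047`) are hypothetical names. It also qualifies the table as `hive.src_sequencefile`, so no `use hive;` is needed first.

```shell
#!/bin/sh
# build_payload is a hypothetical helper: it wraps a SQL string in the JSON body
# accepted by Drill's REST endpoint (/query.json).
# No JSON escaping is done, so the query must not contain double quotes or backslashes.
build_payload() {
  printf '{"queryType": "SQL", "query": "%s"}\n' "$1"
}

# Schema-qualified table name, so the default schema does not have to be changed.
QUERY='select * from hive.src_sequencefile limit 3'

# DRILL_URL is an assumption; if it is unset, the payload is only printed.
if [ -n "${DRILL_URL:-}" ]; then
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(build_payload "$QUERY")" "$DRILL_URL/query.json"
else
  build_payload "$QUERY"
fi
```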