@cjmatta
Last active August 29, 2015 14:22
Explore Reddit with Drill

Download the JSON listing of the current hot posts on r/all (up to 100 per request):

http://reddit.com/r/all.json?sort=hot&limit=100

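Save each download in its own directory, named after the Unix timestamp at which it was collected. Drill exposes the name of that first-level subdirectory as the implicit column `dir0`, which the view converts into the `collected` timestamp.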
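A minimal Python sketch of this collection step follows. It is shaped by what the view expects: the view reads `id` and `rank` as top-level fields, while Reddit's listing nests each post under `data.children[i].data` and carries no rank, so the sketch writes one JSON object per post and adds a hypothetical `rank` equal to the post's 1-based position in the listing. The helper names, the `posts.json` file name, the User-Agent string and the `reddit/data` base directory (standing in for that path inside the `cmatta` workspace) are all illustrative assumptions.

```python
import json
import os
import time
import urllib.request

# URL from the step above; urlopen follows HTTP redirects.
LISTING_URL = "http://reddit.com/r/all.json?sort=hot&limit=100"


def fetch_listing(url=LISTING_URL):
    """Download one Reddit Listing and parse it as JSON."""
    # Reddit asks clients for a descriptive User-Agent; this one is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "drill-reddit-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flatten_listing(listing):
    """Return one dict per post, adding a hypothetical 1-based `rank` field."""
    rows = []
    for rank, child in enumerate(listing["data"]["children"], start=1):
        row = dict(child["data"])  # copy so the parsed listing is not mutated
        row["rank"] = rank
        rows.append(row)
    return rows


def save_snapshot(rows, base_dir, collected=None):
    """Write rows as one JSON object per line under base_dir/<unix time>/."""
    if collected is None:
        collected = int(time.time())
    out_dir = os.path.join(base_dir, str(collected))
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "posts.json")
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path
```

Run it periodically (for example from cron) as `save_snapshot(flatten_listing(fetch_listing()), "reddit/data")`; each run creates a new timestamped directory. Drill reads a file of concatenated JSON objects as one row per object. Some Reddit fields can change JSON type between posts (`edited` is either `false` or a timestamp), so querying with Drill's `store.json.all_text_mode` option enabled avoids type conflicts, and the view's CASTs turn the text back into typed columns.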

Create the view:

CREATE OR REPLACE VIEW reddit_view AS SELECT
    `id`,
    -- dir0 is the snapshot directory name: the Unix time of collection
    CAST(`to_timestamp`(CAST(`dir0` AS BIGINT)) AS TIMESTAMP) AS `collected`,
    -- created_utc is epoch seconds in UTC; DOUBLE holds 10-digit values exactly
    CAST(`to_timestamp`(CAST(CAST(`created_utc` AS DOUBLE) AS BIGINT)) AS TIMESTAMP) AS `posted`,
    -- whole minutes between posting and collection
    (CAST(`dir0` AS BIGINT) - CAST(CAST(`created_utc` AS DOUBLE) AS BIGINT)) / 60 AS `age_mins`,
    `subreddit`,
    `title`,
    `url`,
    `domain`,
    CAST(`rank` AS INTEGER) AS `rank`,
    CAST(`score` AS BIGINT) AS `score`,
    CAST(`ups` AS BIGINT) AS `ups`,
    CAST(`downs` AS BIGINT) AS `downs`,
    CAST(`num_comments` AS INTEGER) AS `num_comments`,
    CAST(`is_self` AS BOOLEAN) AS `is_self`
FROM `maprfs`.`cmatta`.`reddit/data`;
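To spot-check the view's time arithmetic outside Drill, its `collected` and `age_mins` columns can be recomputed in Python for a single saved post. This is a sketch: `derived_columns` is a hypothetical helper, and it assumes `dir0` is the snapshot directory name (Unix seconds as a string) and that `created_utc` holds epoch seconds, either as a number or as numeric text.

```python
from datetime import datetime, timezone


def derived_columns(dir0, row):
    """Recompute the view's `collected` and `age_mins` columns for one post.

    dir0 is the snapshot directory name; row is one saved post dict.
    """
    collected_s = int(dir0)                    # CAST(dir0 AS BIGINT)
    posted_s = int(float(row["created_utc"]))  # created_utc as whole epoch seconds
    return {
        "collected": datetime.fromtimestamp(collected_s, tz=timezone.utc),
        # Collection follows posting, so the difference is non-negative and
        # floor division gives the same result as SQL integer division.
        "age_mins": (collected_s - posted_s) // 60,
    }
```

For example, `derived_columns("1433460000", {"created_utc": 1433456789.0})["age_mins"]` is 53: the post is 3,211 seconds old, truncated to whole minutes.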