Christopher Matta cjmatta
@cjmatta
cjmatta / spark-notebook_mapr.md
Last active December 22, 2015 02:06
Building and using Spark Notebook for MapR

## Building and using Spark Notebook for MapR

The spark-notebook is a browser-based REPL for exploring data and building visualizations. This guide covers the MapR-specific requirements for building and using the Spark Notebook on MapR clusters.

### Building

Check out the source:

$ git clone https://github.com/andypetrella/spark-notebook.git
@cjmatta
cjmatta / install_maven.sh
Created October 19, 2015 15:23
Install Maven 1-liner
sudo mkdir /usr/local/apache-maven && curl http://mirror.metrocast.net/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz | sudo tar zxvf - --strip 1 -C /usr/local/apache-maven && export M2_HOME=/usr/local/apache-maven && sudo ln -s /usr/local/apache-maven/bin/mvn /usr/local/bin/mvn
@cjmatta
cjmatta / drill_odbc.py
Created July 20, 2015 16:27
Drill with Python ODBC
import pyodbc
import re
# make sure the Drill ODBC driver is installed
# this is for Mac
MY_DSN = """
Driver = /opt/mapr/drillodbc/lib/universal/libmaprdrillodbc.dylib
ConnectionType = Zookeeper
ZKQuorum = node10:5181,node11:5181,node12:5181
ZKClusterID = se1-drillbits
@cjmatta
cjmatta / Reddit_with_drill.md
Last active August 29, 2015 14:22
Explore Reddit with Drill
@cjmatta
cjmatta / README.md
Last active May 2, 2016 16:31
Weather Exploration with Drill

Getting weather data

Use this Python script to download weather data: https://github.com/cjmatta/wundergroundloader

Drill config

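Ensure that the csv format in Drill's filesystem storage plugin has "extractHeader": true set, and that you haven't used the --strip-headers option in the wundergroundloader script. With header extraction enabled, Drill reads column names from the first line of each CSV file, so the headers must still be present in the downloaded data.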
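
For reference, the csv entry in the plugin's formats section might look like the fragment below. This is a sketch, assuming Drill's usual text-format options; only "extractHeader" comes from the note above, and the "extensions" and "delimiter" values are typical defaults that may differ in your plugin configuration.

```json
"formats": {
  "csv": {
    "type": "text",
    "extensions": ["csv"],
    "delimiter": ",",
    "extractHeader": true
  }
}
```

The rest of the storage plugin configuration (connection, workspaces, other formats) stays unchanged.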

Create View

CREATE OR REPLACE VIEW maprfs.cmatta.`weather_view` AS
SELECT `dir0` AS `airportcode`,
@cjmatta
cjmatta / Philly_Crime_Drill.md
Last active August 29, 2015 14:22
Drill Demo - Philly Crime JSON
@cjmatta
cjmatta / keybase.md
Last active August 29, 2015 14:16
keybase.md

Keybase proof

I hereby claim:

  • I am cjmatta on github.
  • I am cmatta (https://keybase.io/cmatta) on keybase.
  • I have a public key whose fingerprint is C469 2213 66F0 371D CD16 0AE2 9885 1CEF B447 06DB

To claim this, I am signing this object:

@cjmatta
cjmatta / basic_drill_demo.md
Last active August 29, 2015 14:14
Basic Drill Demo

Drill Demo - MapR

Retail data

Show the flat JSON logs:

head -n 5 /mapr/*/data/flat/logs/2012/1/log.json

Show how it can be queried:

select * from mfs.flat.`logs` limit 10;
@cjmatta
cjmatta / run_drill_query.sh
Created November 19, 2014 02:54
A bash script for running a query against a drill cluster and then collecting the updated logs from the whole cluster.
#!/bin/bash
set -o nounset
set -o errexit
# Abort early if the query file given as the first argument does not exist.
if [[ ! -f "$1" ]]; then
    echo "Query file ${1} not found, exiting."
    exit 1
fi
@cjmatta
cjmatta / configureOozie_mapr.sh
Last active August 29, 2015 14:08
configureOozie_mapr.sh
#!/bin/bash
# Copyright (C) 2013 by Teradata Corporation.
# All Rights Reserved.
#
# This script installs tdch for Oozie transfers with Hadoop
#
# Version : $Id$
# MapR Notes
# Since MapR doesn't need a nameNode, we've removed it from this script