Unstructured Data, Vector Database, Cloud, AI, Edge, Streaming, SQL

Timothy Spann tspannhw

@tspannhw
tspannhw / Sql on Hadoop.md
Created May 10, 2016 14:55 — forked from abajwa-hw/Sql on Hadoop.md
SQL on Hadoop workshop

LAB

This lab is part of a 'SQL on Hadoop' webinar. The recording and slides can be found here.

Purpose

How and when to use Hive vs. Phoenix vs. Spark SQL

tspannhw / hbase-indexing-solr.md
Created May 15, 2016 12:57 — forked from abajwa-hw/hbase-indexing-solr.md
HBase indexing to Solr in HDP 2.3

  • Background:

The HBase Indexer streams events from HBase to Solr for near-real-time search. It is included with HDPSearch as an additional service. The indexer works by acting as an HBase replication sink: as updates are written to HBase, the events are asynchronously replicated to the HBase Indexer processes, which in turn create Solr documents and push them to Solr.
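The replication-sink flow described above can be modeled in a few lines. This is a minimal in-memory sketch, not the HBase Indexer's actual code: the dictionaries standing in for the HBase table and the Solr collection, the `FIELD_MAP` column-to-field mapping, and all function names are hypothetical. A write lands in the table first and only then emits a replication event; a single background worker consumes those events asynchronously, rebuilds a Solr document from the whole row (an assumption of this sketch, which keeps documents complete when columns arrive in separate writes), and upserts it by row key.

```python
import queue
import threading
from typing import Dict

# Hypothetical in-memory stand-ins for an HBase table and a Solr collection.
hbase_table: Dict[str, Dict[str, str]] = {}
solr_collection: Dict[str, Dict[str, str]] = {}
_lock = threading.Lock()

# Hypothetical mapping from HBase "family:qualifier" columns to Solr field names.
FIELD_MAP = {"info:firstname": "firstname_s", "info:lastname": "lastname_s"}

# The "replication stream": row keys of rows that changed, in write order.
replication_queue: "queue.Queue[str]" = queue.Queue()


def put(row_key: str, column: str, value: str) -> None:
    """Write one cell to the table, then emit a replication event for its row."""
    with _lock:
        hbase_table.setdefault(row_key, {})[column] = value
    replication_queue.put(row_key)  # returns immediately; indexing happens later


def row_to_solr_doc(row_key: str, row: Dict[str, str]) -> Dict[str, str]:
    """Map the columns listed in FIELD_MAP to Solr fields; ignore the rest."""
    doc = {"id": row_key}
    for column, value in row.items():
        field = FIELD_MAP.get(column)
        if field is not None:
            doc[field] = value
    return doc


def indexer_worker() -> None:
    """Consume replication events and upsert one Solr document per row key."""
    while True:
        row_key = replication_queue.get()
        try:
            with _lock:
                row = dict(hbase_table.get(row_key, {}))  # snapshot of the row
            doc = row_to_solr_doc(row_key, row)
            with _lock:
                solr_collection[row_key] = doc  # same id replaces the old doc
        finally:
            replication_queue.task_done()


# One background "indexer process" consuming the stream asynchronously.
threading.Thread(target=indexer_worker, daemon=True).start()
```

For example, after `put("row1", "info:firstname", "Ada")` and `put("row1", "info:lastname", "Lovelace")`, calling `replication_queue.join()` waits for the worker to catch up, and `solr_collection["row1"]` then holds both mapped fields. Unmapped columns never reach Solr.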

Phoenix/Spark demo

Option 1: prebuilt VM

A prebuilt CentOS 6.5 VM is available with the following components installed:

  • HDP 2.3.0.0-1754
  • Spark 1.3.1
# Install R (shell, as root); the glob is quoted so the shell does not expand it
yum install 'R-*'
# Then, from an R session, install the following packages
# Note: repr failed to install in this setup (see install_github below)
install.packages("evaluate", dependencies = TRUE)
install.packages("base64enc", dependencies = TRUE)
install.packages("devtools", dependencies = TRUE)
library(devtools)  # provides install_github()
install_github('IRkernel/repr')
install.packages("dplyr", dependencies = TRUE)
install.packages("caret", dependencies = TRUE)
install.packages("repr", dependencies = TRUE)

Future of Big Data: Philadelphia

These are notes for following along with the talk I am giving.

This builds on the gist https://gist.github.com/epugh/5729071c3b8aab81636d422c391aa716, but is meant to be standalone!

  1. This gist uses the latest version of Zeppelin. Replace the IP address 192.168.99.100 with your Docker machine's IP, which you can get by running docker-machine ip.
  2. Fire up Zeppelin, a Spark master, and a Spark worker with: docker run -d --name zeppelin -p 8080:8080 dylanmei/zeppelin
  3. If it doesn't work, go back to a specific "stable" version of Zeppelin. Watch out: the image includes a 1 GB layer.
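Steps 1 and 2 can be scripted. The sketch below is a hypothetical helper, not part of Zeppelin or Docker: it asks `docker-machine ip` for the address (falling back to the 192.168.99.100 example from step 1 if the command is missing or fails), builds the Zeppelin URL for the `-p 8080:8080` port mapping from step 2, and polls it until the web UI answers. The function names, timeouts, and the "any non-5xx response means ready" rule are assumptions.

```python
import http.client
import subprocess
import time
import urllib.error
import urllib.request

# Example address from step 1; used only if `docker-machine ip` is unavailable.
DEFAULT_DOCKER_MACHINE_IP = "192.168.99.100"


def docker_machine_ip(timeout_s: float = 10.0) -> str:
    """Return the output of `docker-machine ip`, or the example IP on failure."""
    try:
        result = subprocess.run(
            ["docker-machine", "ip"],
            capture_output=True, text=True, check=True, timeout=timeout_s,
        )
    except (OSError, subprocess.SubprocessError):
        return DEFAULT_DOCKER_MACHINE_IP
    return result.stdout.strip() or DEFAULT_DOCKER_MACHINE_IP


def zeppelin_url(ip: str, port: int = 8080) -> str:
    """URL of the Zeppelin UI published with `-p 8080:8080`."""
    return f"http://{ip}:{port}/"


def wait_for_http(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it answers with a non-5xx status or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status < 500:
                    return True
        except urllib.error.HTTPError as err:
            if err.code < 500:
                return True  # the server is up, even if it returned a 4xx
        except (OSError, http.client.HTTPException):
            pass  # not listening yet (connection refused, reset, timeout, ...)
        time.sleep(interval_s)
    return False
```

For example, `wait_for_http(zeppelin_url(docker_machine_ip()))` returns `True` once the notebook UI is reachable; the generous default timeout leaves room for the first pull of the large image layer.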
tspannhw / index.html
Last active June 10, 2016 13:09 — forked from darwin/index.html
HDF: Can automate all the things
<!DOCTYPE html>
<meta charset="utf-8">
<link rel="stylesheet" href="http://cmx.io/v/0.1/cmx.css">
<script src="http://cmx.io/v/0.1/cmx.js" charset="utf-8"></script>
<style>.cmx-user-scene4 .cmx-text-border .cmx-path {stroke: orange}</style>
<body>
<div style="max-width:900px; -webkit-transform:rotate(0deg)">
<scene id="scene1">
<label t="translate(0,346)">
import java.nio.charset.StandardCharsets
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback
import groovy.transform.ToString
import com.vader.SentimentAnalyzer
import groovy.json.JsonBuilder
import groovy.json.JsonOutput
import groovy.json.JsonSlurper
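These imports suggest a NiFi ExecuteScript body that reads FlowFile content (`IOUtils`, `StreamCallback`), parses it as JSON (`JsonSlurper`), scores it with a VADER sentiment analyzer (`com.vader.SentimentAnalyzer`), and writes JSON back (`JsonBuilder`, `JsonOutput`). The script itself is not shown, so the Python sketch below only illustrates that read, parse, score, write shape. The `text` input field, the `sentiment` output field, and the word list are hypothetical; the word list is a toy stand-in, not VADER.

```python
import json

# Toy word scores standing in for a real sentiment model (hypothetical values).
TOY_LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
               "bad": -1.0, "awful": -2.0, "hate": -2.0}


def score_text(text: str) -> dict:
    """Sum toy word scores and turn the total into a coarse label."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    total = float(sum(TOY_LEXICON.get(w, 0.0) for w in words))
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return {"score": total, "label": label}


def process_flowfile_content(raw: bytes, text_field: str = "text") -> bytes:
    """Read JSON content, add a sentiment object, and return the new content."""
    record = json.loads(raw.decode("utf-8"))
    record["sentiment"] = score_text(str(record.get(text_field, "")))
    return json.dumps(record).encode("utf-8")
```

In an ExecuteScript processor, the equivalent of `process_flowfile_content` would run inside the `StreamCallback`, replacing the FlowFile's content in place.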
tspannhw / config.yml
Created August 19, 2016 15:06 — forked from JPercivall/config.yml
Demo config.yml used for MiNiFi 0.0.1 MeetUp talks
Flow Controller:
    name: MiNiFi S2S temp and humidity
    comment: ''
Core Properties:
    flow controller graceful shutdown period: 10 sec
    flow service write delay interval: 500 ms
    administrative yield duration: 30 sec
    bored yield duration: 10 millis
    max concurrent threads: 1
FlowFile Repository:
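The Core Properties above express durations as a number plus a unit (`10 sec`, `500 ms`, `10 millis`). To sanity-check such values before deploying a config, a small parser like the one below converts them to milliseconds. It is an illustrative sketch, not MiNiFi's or NiFi's own parser; the set of accepted unit spellings is an assumption covering the units used here plus a few common variants.

```python
import re

# Milliseconds per unit. The spellings are an assumption: the units used in the
# config above plus common variants, not NiFi's authoritative list.
_UNIT_MS = {
    "ms": 1, "milli": 1, "millis": 1, "millisecond": 1, "milliseconds": 1,
    "s": 1_000, "sec": 1_000, "secs": 1_000, "second": 1_000, "seconds": 1_000,
    "min": 60_000, "mins": 60_000, "minute": 60_000, "minutes": 60_000,
    "hr": 3_600_000, "hrs": 3_600_000, "hour": 3_600_000, "hours": 3_600_000,
}
_PERIOD = re.compile(r"\s*(\d+)\s*([A-Za-z]+)\s*")


def to_millis(period: str) -> int:
    """Convert a duration such as '10 sec' or '500 ms' to milliseconds."""
    match = _PERIOD.fullmatch(period)
    if match is None:
        raise ValueError(f"unparseable time period: {period!r}")
    value, unit = int(match.group(1)), match.group(2).lower()
    if unit not in _UNIT_MS:
        raise ValueError(f"unknown time unit {unit!r} in {period!r}")
    return value * _UNIT_MS[unit]


# Duration-valued Core Properties copied from the config above.
CORE_PROPERTIES = {
    "flow controller graceful shutdown period": "10 sec",
    "flow service write delay interval": "500 ms",
    "administrative yield duration": "30 sec",
    "bored yield duration": "10 millis",
}
```

Running `{k: to_millis(v) for k, v in CORE_PROPERTIES.items()}` makes the relative sizes obvious: the bored yield (10 ms) is far shorter than the administrative yield (30 s).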
tspannhw / TestInvokeHttpPOST.xml
Created August 22, 2016 20:51 — forked from mattyb149/TestInvokeHttpPOST.xml
A NiFi template to test InvokeHttp's POST capability, sends JSON to a test endpoint and writes the response to a file
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><template><description>This template posts JSON to a test endpoint (http://httpbin.org/post) and writes the response to a file</description><name>TestInvokeHttpPOST</name><snippet><connections><id>db33e68a-c19f-4059-944e-56229ccc550b</id><parentGroupId>60dbaa71-2453-4f82-bafe-6087462ecc47</parentGroupId><backPressureDataSizeThreshold>0 MB</backPressureDataSizeThreshold><backPressureObjectThreshold>0</backPressureObjectThreshold><destination><groupId>60dbaa71-2453-4f82-bafe-6087462ecc47</groupId><id>fe102472-7dbc-4719-9ef1-266992646565</id><type>PROCESSOR</type></destination><flowFileExpiration>0 sec</flowFileExpiration><labelIndex>1</labelIndex><name></name><selectedRelationships>success</selectedRelationships><source><groupId>60dbaa71-2453-4f82-bafe-6087462ecc47</groupId><id>46869aaa-0232-4a8b-9f53-7bd221fdf8b8</id><type>PROCESSOR</type></source><zIndex>0</zIndex></connections><connections><id>d1be0a90-92b0-4320-956f-decd9df513b1</id><parentGroupId>60dba
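Outside NiFi, the round trip this template performs (POST a JSON body to http://httpbin.org/post, then write the response body to a file) looks roughly like the following with Python's standard library. This is a sketch for comparison only; the payload, output path, function names, and timeout are hypothetical and not taken from the template.

```python
import json
import urllib.request

# Test endpoint named in the template description.
TEST_ENDPOINT = "http://httpbin.org/post"


def build_post_request(payload: dict, url: str = TEST_ENDPOINT) -> urllib.request.Request:
    """Build a POST request whose body is `payload` serialized as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def post_json_to_file(payload: dict, out_path: str,
                      url: str = TEST_ENDPOINT, timeout_s: float = 30.0) -> int:
    """POST `payload`, write the raw response body to `out_path`, return its size."""
    request = build_post_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        data = response.read()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return len(data)
```

For example, `post_json_to_file({"hello": "nifi"}, "response.json")` (network access required) writes httpbin's echo of the request, which includes the JSON body it received.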
tspannhw / southwest.json
Created October 7, 2016 17:08 — forked from mapmeld/southwest.json
Southwest Airlines free in-flight tracker data at http://getconnected.southwestwifi.com/current.json
{"wth_fc1_icon": "", "orig_code": "HOU", "headingAbbrStr": "W", "altitude_f": "38 002 ft", "altitude_m": "11 583 m", "wth_fc2_lowf": "61", "wth_fc3_condition": "Sunny", "wth_fc4_condition": "Sunny", "image_white": "night_partly_cloudy_white.png", "wth_fc3_dow": "Mon", "gspeed_m": "457 mph", "wth_tempf": "64", "ttg": "223", "lochist": [["29.645419", "-95.278889"], ["32.573", "-108.072"], ["33.176", "-111.203"], ["33.222", "-111.456"], ["33.279", "-111.772"], ["33.502", "-112.582"], ["33.634", "-113.021"], ["33.689", "-113.206"], ["33.708", "-113.268"], ["33.743", "-113.391"], ["33.780", "-113.515"], ["33.798", "-113.577"], ["33.817", "-113.640"], ["33.852", "-113.764"], ["33.870", "-113.827"], ["33.907", "-113.953"], ["33.943", "-114.079"], ["33.961", "-114.142"], ["33.979", "-114.204"], ["33.997", "-114.268"], ["34.015", "-114.330"], ["34.032", "-114.393"], ["34.050", "-114.456"], ["34.068", "-114.520"]], "gspeed_k": "736 km/h", "wth_fc2_highc": "25", "dest_code": "LAX", "wth_fc4_highf": "94", "wth_fc2_highf"
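Many values in this feed are display strings rather than numbers: altitudes use a space as a thousands separator (`"38 002 ft"`), speeds carry units (`"457 mph"`, `"736 km/h"`), and `lochist` is a list of `[latitude, longitude]` string pairs. The sketch below parses those strings and estimates the distance covered between track points with the haversine formula on a spherical Earth. The sample values are copied from the record above; the helper names and the Earth radius are assumptions, so treat the distances as approximations.

```python
import math
import re

# Number (possibly with space thousands separators) followed by a unit.
_MEASURE = re.compile(r"\s*([\d\s]+?)\s*([A-Za-z/]+)\s*")
EARTH_RADIUS_KM = 6371.0  # mean radius of a spherical Earth (approximation)

# Fields copied from the record above (lochist shortened to three points).
SAMPLE = {
    "altitude_f": "38 002 ft",
    "altitude_m": "11 583 m",
    "gspeed_m": "457 mph",
    "gspeed_k": "736 km/h",
    "lochist": [["29.645419", "-95.278889"], ["32.573", "-108.072"], ["33.176", "-111.203"]],
}


def parse_measure(text: str) -> tuple:
    """Split '38 002 ft' into (38002, 'ft'); any whitespace in the number is dropped."""
    match = _MEASURE.fullmatch(text)
    digits = re.sub(r"\s", "", match.group(1)) if match else ""
    if not digits:
        raise ValueError(f"not a measurement: {text!r}")
    return int(digits), match.group(2)


def haversine_km(p, q) -> float:
    """Great-circle distance between two [lat, lon] pairs given as strings or numbers."""
    lat1, lon1 = (math.radians(float(v)) for v in p)
    lat2, lon2 = (math.radians(float(v)) for v in q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_length_km(lochist) -> float:
    """Sum of great-circle distances between consecutive track points."""
    return sum(haversine_km(a, b) for a, b in zip(lochist, lochist[1:]))
```

As a consistency check, 38,002 ft × 0.3048 ≈ 11,583 m, which matches the record's `altitude_m`.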