Timothy Spann tspannhw

Effective Engineer - Notes

What's an Effective Engineer?

  • They are the people who get things done. Effective Engineers produce results.

Adopt the Right Mindsets

@tspannhw
tspannhw / gist:8e185db22128205496e6dd40cf03b071
Created March 8, 2018 18:06 — forked from gwenshap/gist:11408870
generate table from Avro schema
#!/usr/bin/python
import json
import argparse
def convertType(type):
    if type=="long":
        return "bigint"
    else:
        return type
#!/bin/bash
set -e
CONTENTS=$(tesseract -c language_model_penalty_non_dict_word=0.8 --tessdata-dir /usr/local/share/ "$1" stdout -l eng | xml esc)
hex=$((cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
@tspannhw
tspannhw / peopleCounter.py
Created April 25, 2018 18:19 — forked from jotathebest/peopleCounter.py
Pedestrian detector that sends people counter results to Ubidots. Libraries: OpenCV, requests, imutils
from imutils.object_detection import non_max_suppression
import numpy as np
import imutils
import cv2
import requests
import time
import argparse
'''
MiNiFi Config Version: 3
Flow Controller:
  name: GetFile
  comment: jsjlejgjkelkjgkalsjdgasetg
Core Properties:
  flow controller graceful shutdown period: 10 sec
  flow service write delay interval: 500 ms
  administrative yield duration: 30 sec
  bored yield duration: 10 millis
  max concurrent threads: 1
@tspannhw
tspannhw / randomforest.py
Created July 24, 2018 16:14 — forked from kkravik/randomforest.py
Training Random Forest Model in Spark, Exporting to PMML Using JPMML-SparkML and Evaluating Using Openscoring
# Import packages
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import StringIndexer, VectorIndexer, OneHotEncoder, VectorAssembler, IndexToString
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.sql.functions import *
# Creating Spark SQL environment
from pyspark import SparkContext
from pyspark.sql import SparkSession, HiveContext
SparkContext.setSystemProperty("hive.metastore.uris", "thrift://nn1:9083")
@tspannhw
tspannhw / MDD.xml
Created September 15, 2018 22:15 — forked from pvillard31/MDD.xml
Template for Monitoring Driven Development in NiFi
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template encoding-version="1.2">
<description></description>
<groupId>8927f4c0-0160-1000-597a-ea764ccd81a7</groupId>
<name>MDD</name>
<snippet>
<connections>
<id>a2098494-cce9-3fa4-0000-000000000000</id>
<parentGroupId>a8352767-434f-3321-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
@tspannhw
tspannhw / 00_README.md
Created September 21, 2018 19:25 — forked from ijokarumawak/00_README.md
NiFi example to ingest a set of files only when a complete set of files is ready.

This example flow can be used to process files with the following requirements:

  • A group of files can only be processed when every file for a specific group is ready

  • Each filename has a groupId (e.g. 123_456) and a type name (e.g. ab/cd/ef)

  • Example set of files for group '123_456'

    • file_123_456_ab.ex1
    • file_123_456_cd.ex1
    • file_123_456_ef.ex1
    • file_123_456.ex2
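
The completeness rule above can be sketched outside NiFi in a few lines of Python. This is only an illustration and not the flow from the gist: the set of required type names (`ab`, `cd`, `ef`), the assumption that one `.ex2` file is needed per group, and the helper name `complete_groups` are all inferred from the example filenames.

```python
import re
from collections import defaultdict

# Assumption taken from the example set: a group is complete when it has
# one .ex1 file per type name plus a single .ex2 file.
REQUIRED_TYPES = {"ab", "cd", "ef"}

# file_<groupId>_<type>.ex1, where groupId looks like 123_456
EX1_PATTERN = re.compile(r"^file_(\d+_\d+)_([a-z]+)\.ex1$")
# file_<groupId>.ex2
EX2_PATTERN = re.compile(r"^file_(\d+_\d+)\.ex2$")


def complete_groups(filenames):
    """Return the sorted group IDs whose full set of files is present."""
    types_seen = defaultdict(set)  # groupId -> type names seen so far
    has_ex2 = set()                # groupIds that have their .ex2 file
    for name in filenames:
        match = EX1_PATTERN.match(name)
        if match:
            types_seen[match.group(1)].add(match.group(2))
            continue
        match = EX2_PATTERN.match(name)
        if match:
            has_ex2.add(match.group(1))
    # A group qualifies only if every required type and the .ex2 file exist.
    return sorted(g for g in has_ex2 if REQUIRED_TYPES <= types_seen[g])
```

Calling `complete_groups` with the four example filenames for group `123_456` returns `["123_456"]`; a group that is still missing any of its files is left out until the rest arrive.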

@tspannhw
tspannhw / HiveAcidQuickTest
Created October 8, 2018 15:55 — forked from rajkrrsingh/HiveAcidQuickTest
quick start guide to test ACID functionality in hive
hive> set hive.support.concurrency;
hive.support.concurrency=true
hive> set hive.enforce.bucketing;
hive.enforce.bucketing=true
hive> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=nonstrict
hive> set hive.txn.manager;
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive> set hive.compactor.initiator.on;
hive.compactor.initiator.on=true