Install unixODBC-devel
yum -y install unixODBC-devel
Install python-virtualenv
yum -y install python-virtualenv
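For illustration, once python-virtualenv is installed you can create and activate an isolated environment (the directory name venv here is arbitrary):
virtualenv venv
source venv/bin/activate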
#!/bin/bash
if [ -z "$1" ]; then
  echo "Provide a BQ table spec, ideally fully qualified."
  exit 1
fi
# Require jq, which parses the JSON schema below.
if command -v jq >/dev/null 2>&1; then
  bq --format json show "$1" | jq -j '[.schema.fields[] | .name + ":" + .type] | join(",")'
fi
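A hypothetical invocation, assuming the script above is saved as bq_schema.sh and the table spec points at an existing table:
./bq_schema.sh myproject:mydataset.mytable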
0: jdbc:drill:zk=localhost:2181> -- network performance with rpctest, node to node, round robin
0: jdbc:drill:zk=localhost:2181> select type,count(1) runs,avg(t.rateMBps) avgRateMBPS,avg(t.rpcspersecond) avgrpcspersecond from (select i.type, f.* from dfs.vr.instances_view i join dfs.vr.rpctest_1_1_view f on f.`time` = i.`time` and f.host = i.host) t group by type,`time` order by type;
+--------------+-------+---------------------+---------------------+
|     type     | runs  |     avgRateMBPS     |  avgrpcspersecond   |
+--------------+-------+---------------------+---------------------+
| d2.2xlarge   | 12    | 273.3066660563151   | 2085.1624857584634  |
| d2.2xlarge   | 12    | 233.72499974568686  | 1783.183344523112   |
| d2.2xlarge   | 12    | 195.93666712443033  | 1494.8716837565105  |
| d2.2xlarge   | 12    | 239.15333557128906  | 1824.5950215657551  |
| d2.2xlarge   | 12    | 275.3549982706706   | 2100.788319905599   |
{
  "paragraphs": [
    {
      "text": "%md\n\nFirst thing you need to do to run this notebook is to make sure Zeppelin knows how to pull in MapR Streams maven dependencies.\n\nAdd this artifact to the Spark interpreter (update the version as needed):\n\n`org.apache.kafka:kafka-clients:0.9.0.0-mapr-1607`\n\nAlso make sure that the MapR repository is set up:\n\nhttp://repository.mapr.com/maven/\n\nFollow the [Zeppelin documentation for dependency management](http://zeppelin.apache.org/docs/0.6.1/manual/dependencymanagement.html) for instructions on how to do this.\n",
      "dateUpdated": "2016-09-20T20:53:52-0700",
      "config": {
        "colWidth": 12,
        "graph": {
          "mode": "table",
          "height": 300,
# Requires hadoop_properties: https://github.com/vicenteg/ansible-library
# Clone the repo to a library directory alongside this playbook,
# e.g.:
# mkdir mapr_to_s3 && cd mapr_to_s3 &&\
# curl -L 'https://gist.githubusercontent.com/vicenteg/1b110cfd467d64487a16385ec10bdb42/raw/f20770712d90696e817cb8725181dcbb5c146020/configure_mapr_core_site_xml_for_s3.yml' -o configure_mapr_core_site_xml_for_s3.yml &&\
# git clone https://github.com/vicenteg/ansible-library.git library
#
# Add your access key and secret key.
#
# You may need to change the group to match whatever you've named the
# group in your inventory. See the example run below.
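For illustration, a run might look like this; the inventory file name hosts is an assumption:
ansible-playbook -i hosts configure_mapr_core_site_xml_for_s3.yml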
import itertools
import sys
import time

# Simple terminal spinner; cycles forever until interrupted (Ctrl-C).
c = itertools.cycle(['|', '/', '-', '\\'])
for i in c:
    sys.stdout.write(i)
    sys.stdout.flush()
    time.sleep(.05)
    sys.stdout.write('\r')
This Drill deployment can also work with Hive 1.2.0.
I assume you have successfully installed Ansible (I use version 1.9) and have installed the dependencies for the ec2 modules. Specifically, you should install the boto python module (pip install boto) and the AWS CLI (pip install awscli). Then run aws configure to store your EC2 credentials.
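For illustration, that setup condensed into commands:
pip install boto awscli
aws configure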
export zookeepers=$(maprcli node listzookeepers -noheader)
export bootstrap_servers=$(maprcli node list -columns hostname -noheader -filter csvc==kafka | awk '{ print $1 }' | head -1)
# Producer
# Setup
bin/kafka-topics.sh --zookeeper $zookeepers --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper $zookeepers --create --topic test --partitions 6 --replication-factor 3
# Single thread, no replication
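# The benchmark command itself is truncated above. A sketch using the standard
# Kafka perf-test tool; record count, record size, and acks=1 are assumptions:
bin/kafka-producer-perf-test.sh --topic test-rep-one \
  --num-records 50000000 --record-size 100 --throughput -1 \
  --producer-props bootstrap.servers=$bootstrap_servers acks=1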
#!/bin/bash
# Remove this node's local volumes (volumes named mapr.<hostname>...).
if maprcli node list -columns id; then
  NODEID=$(maprcli node list -columns id -filter hostname==$(hostname -f) -noheader | cut -f 1 -d ' ')
  NODEVOLUMES=$(maprcli volume list -columns volumename | egrep "^mapr.$(hostname -f)")
  for volume in $NODEVOLUMES; do
    maprcli volume remove -name $volume
  done
fi
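For illustration, assuming the script above is saved as remove_local_volumes.sh, run it on the node whose local volumes you want to remove; note that maprcli volume remove is destructive:
bash remove_local_volumes.sh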
#!/bin/bash
# Copyright (c) 2009 & onwards. MapR Tech, Inc., All rights reserved
# Please set all environment variables you want to be used during MapR cluster
# runtime here,
# namely MAPR_HOME, JAVA_HOME, MAPR_SUBNETS.
# Set JAVA_HOME to override the default search:
#export JAVA_HOME=
export MAPR_SUBNETS=
#export MAPR_HOME=
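# Example (an assumption, not from the original file): MAPR_SUBNETS takes a
# comma-separated list of CIDR subnets, e.g. to restrict MapR to one subnet:
#export MAPR_SUBNETS=10.10.100.0/24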