John Benninghoff jbenninghoff

How to generate synthetic data in Hive table format

CSV and HQL Generation

Use the included genHiveTableFromSchema.py Python script to generate the structured CSV data and the associated Hive script locally. The script requires options to specify the schema file, row count, and partition sizes.
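The script's actual options are not shown here, so the following is only a minimal bash sketch of the idea, not the included Python script. Given a hypothetical schema of `name:type` pairs, it writes a small CSV of synthetic rows and a matching Hive `CREATE EXTERNAL TABLE` statement. The function names, schema format, output file names, and table location are all assumptions, and partitioning is left out.

```shell
#!/usr/bin/env bash
# Sketch only: generate synthetic CSV rows plus a matching Hive DDL file.
# The schema format (name:type pairs), function names, and output paths are
# hypothetical; they do not reflect genHiveTableFromSchema.py's real options.
set -o nounset; set -o errexit; set -o pipefail

gen_csv() {  # usage: gen_csv ROWS SCHEMA...  (each SCHEMA item is name:type)
  local rows=$1; shift
  local i col type line
  for ((i = 1; i <= rows; i++)); do
    line=""
    for col in "$@"; do
      type=${col#*:}
      case $type in
        int)    line+="$((RANDOM % 1000))," ;;
        string) line+="row${i}," ;;
        *)      line+="," ;;            # unknown types become empty fields
      esac
    done
    printf '%s\n' "${line%,}"           # drop the trailing comma
  done
}

gen_hql() {  # usage: gen_hql TABLE LOCATION SCHEMA...
  local table=$1 location=$2; shift 2
  local col cols=""
  for col in "$@"; do
    cols+="${col%%:*} ${col#*:}, "
  done
  printf 'CREATE EXTERNAL TABLE %s (%s)\n' "$table" "${cols%, }"
  printf "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
  printf "STORED AS TEXTFILE LOCATION '%s';\n" "$location"
}

outdir=$(mktemp -d)
schema=(id:int name:string)
gen_csv 5 "${schema[@]}" > "${outdir}/synthetic.csv"
gen_hql synthetic /tmp/synthetic "${schema[@]}" > "${outdir}/synthetic.hql"
echo "wrote ${outdir}/synthetic.csv and ${outdir}/synthetic.hql"
```

The generated `.hql` file could then be run with the Hive CLI or Beeline once the CSV has been copied to the table location.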

Keycloak config

Download the OpenJDK distribution of Keycloak from keycloak.org
Unpack
Test
1. ./bin/standalone.sh
2. nc -w1 -i1 -v localhost 8080
Config for systemd
1. Create keycloak.service (see sample at end)
2. sudo cp keycloak.service /usr/lib/systemd/system/
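
The sample unit file mentioned above is not reproduced in this section, so here is a minimal sketch of what creating `keycloak.service` might look like. The unpack location (`/opt/keycloak`), the `keycloak` service user, and the `write_unit` helper are assumptions for illustration; only `bin/standalone.sh` and the file name come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: write a minimal keycloak.service unit to the current directory.
# KEYCLOAK_HOME and the keycloak user are assumptions; adjust to the real install.
set -o nounset; set -o errexit; set -o pipefail

KEYCLOAK_HOME=${KEYCLOAK_HOME:-/opt/keycloak}   # hypothetical unpack location

write_unit() {  # usage: write_unit OUTPUT_FILE
  cat > "$1" <<EOF
[Unit]
Description=Keycloak standalone server
After=network.target

[Service]
Type=simple
User=keycloak
ExecStart=${KEYCLOAK_HOME}/bin/standalone.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

write_unit keycloak.service
# After copying the file into /usr/lib/systemd/system/ (requires root):
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now keycloak
```

Once the service is running, the same `nc` check used in the Test step can confirm that port 8080 is listening.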

EMR with Hue/Presto/TLS/SAML

This package of shell scripts automates the install and configuration of EMR with Hue, Presto, TLS and SAML.

The main script uses the AWS CLI to install EMR, Hue, and Presto. It drives the other 4 scripts.
- emr-install-krb-presto-tls.sh
The actions needed to configure Presto, Kerberos and TLS are in the first bootstrap script
- presto-kerberos-tls.sh
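
As a rough illustration of how a main script could launch the cluster with a bootstrap action, the sketch below assembles an `aws emr create-cluster` command and prints it without running it. The cluster name, release label, instance settings, and S3 bucket are hypothetical placeholders, not values from the actual scripts, and the Kerberos and SAML settings are omitted.

```shell
#!/usr/bin/env bash
# Sketch only: assemble (but do not run) an `aws emr create-cluster` call.
# Cluster name, release label, instance settings, and the S3 bucket are
# hypothetical placeholders, not values taken from the actual scripts.
set -o nounset; set -o errexit; set -o pipefail

BUCKET=${BUCKET:-s3://example-bucket/bootstrap}   # hypothetical location

cmd=(aws emr create-cluster
  --name "emr-hue-presto"
  --release-label "emr-6.10.0"
  --applications Name=Hue Name=Presto
  --instance-type m5.xlarge --instance-count 3
  --use-default-roles
  --bootstrap-actions "Path=${BUCKET}/presto-kerberos-tls.sh"
)

# Print the command for review; run it with "${cmd[@]}" once the values are set.
printf '%q ' "${cmd[@]}"; echo
```

Keeping the command in an array makes it easy to review or log the exact arguments before anything is created in the account.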

	john.benninghoff@LLQ1K7Y1Q9 2: brew leaves |column -c 180 |column -t ~
	ansible ddgr grep mysql-client sysbench
	apache-spark dict groovy neovim thrift
	automake docutils gzip nmap tmux
	awscli eksctl ioping p7zip trash
	bash esolitos/ipa/sshpass iozone pandoc tree
	bash-completion findutils ipcalc parquet-cli unzip
	berkeley-db fio iperf3 pkg-config vim
	bison fortune jemalloc poetry wget
	black gawk jq pylint wiki


	Hadoop job: job_1681245476823_0001
	=====================================
	User: hadoop
	JobName: XmlExtraction
	JobConf: hdfs://ip-10-0-2-30.us-west-2.compute.internal:8020/tmp/hadoop-yarn/staging/hadoop/.staging/job_1681245476823_0001/job.xml
	Submitted At: 11-Apr-2023 20:40:08
	Launched At: 11-Apr-2023 20:40:14 (6sec)
	Finished At: 12-Apr-2023 02:54:52 (6hrs, 14mins, 37sec)
	Status: SUCCEEDED

	#!/usr/bin/env bash
	# jbenninghoff@ 2023-Mar-24
	# Script to run XML extraction job from cron
	# Alternatively use Step Functions instead of cron:
	# https://docs.aws.amazon.com/en_us/step-functions/latest/dg/sample-emr-job.html
	# Or use AWS Data Pipeline:
	# https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-recurring.html
	#set -o nounset; set -o errexit; set -o pipefail
	set -o errexit; set -o pipefail

	#!/bin/bash
	#
	# Use this to capture output into var using read pipeline
	set +m; shopt -s lastpipe

	# Copy JARs and XML locally, files needed as args to job launch
	aws s3 cp s3://jobennin-emr-data/hp-mapr/java_extraction_byteswritable.jar .
	aws s3 cp s3://jobennin-emr-data/hp-mapr/configint.xml .
	aws s3 cp s3://jobennin-emr-data/hp-mapr/commons-lang-2.6.jar .
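
The `set +m; shopt -s lastpipe` line near the top of this script is what lets a `read` at the end of a pipeline set a variable in the current shell. A minimal sketch of that pattern, using a made-up value and variable name in place of real job output:

```shell
#!/usr/bin/env bash
# Sketch: with lastpipe, the last command of a pipeline runs in the current
# shell, so `read` can set a variable that survives after the pipeline ends.
set +m                 # lastpipe only takes effect when job control is off
shopt -s lastpipe      # requires bash 4.2 or later

# The value and the variable name are illustrative placeholders.
printf 'value-from-pipeline\n' | read -r captured
echo "captured: ${captured:-<empty>}"
```

Without `lastpipe`, `read` would run in a subshell and `captured` would be empty after the pipeline finishes.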

	aom git libqalculate oniguruma six
	awscli glib libssh2 openexr snappy
	brotli glow libtiff openjdk snzip
	ca-certificates gmp libvmaf [email protected] sqlite
	cairo gnu-sed libx11 openssl@3 terraform
	colordiff gnuplot libxau pandoc tmux
	coreutils graphite2 libxcb pango tree
	cscope grep libxdmcp pcre utf8proc
	csvkit harfbuzz libxext pcre2 webp
	dateutils highway libxrender pixman xml2

	"""
	Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

	Permission is hereby granted, free of charge, to any person obtaining a copy of
	this software and associated documentation files (the "Software"), to deal in
	the Software without restriction, including without limitation the rights to
	use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
	the Software, and to permit persons to whom the Software is furnished to do so.

	THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR

	#!/bin/bash
	set -o nounset; set -o errexit; set -o pipefail

	err() {
	  echo "[$(date +'%Y-%m-%dT%H:%M:%S')]: $*" >&2
	}

	VERSION=3.6.1
	INSTALL_LOCATION=/usr/local/bin/scalafmt-native
	curl https://raw.githubusercontent.com/scalameta/scalafmt/master/bin/install-scalafmt-native.sh | \