Hakan İlter hakanilter

@hakanilter
hakanilter / flume-kafka-source.properties
Last active May 17, 2018 16:17
Example Flume Configuration For Kafka Source
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
# sources
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.zookeeperConnect = localhost:2181
tier1.sources.source1.topic = network-data
tier1.sources.source1.groupId = flume-kafka-test
tier1.sources.source1.channels = channel1
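The preview stops at the source definition, so channel1 and sink1 are referenced but never defined. A minimal completion (a memory channel and a logger sink; these settings are assumed, not taken from the gist) could look like:

```properties
# channels (assumed: not shown in the preview above)
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000

# sinks (assumed: a logger sink, useful for testing the pipe)
tier1.sinks.sink1.type = logger
tier1.sinks.sink1.channel = channel1
```

Note that Flume sinks bind to a single channel via `channel` (singular), while sources use `channels`.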
@hakanilter
hakanilter / impala-partitions-report.sh
Last active July 20, 2018 10:32
Script for generating CSV partitions report for Impala
#!/bin/bash
# Script for generating CSV partitions report for Impala
IMPALA_DAEMON=localhost
databases=$(impala-shell --quiet -i "$IMPALA_DAEMON" -d default --delimited -q "SHOW DATABASES" | cut -f1 | grep -e dl -e ods_)
for database in $databases
do
    echo "$database"
    directory="partitions/$database"
    mkdir -p "$directory"
@hakanilter
hakanilter / AvroJsonToDf.scala
Created September 25, 2018 20:45
Load Avro files and extract the JSON string as a DataFrame
import org.apache.spark.sql.functions.udf
import spark.implicits._
// read avro
val input = "/Users/hakanilter/dev/workspace/mc/data/avroFiles/*"
val data = spark.read
  .format("com.databricks.spark.avro")
  .load(input)
@hakanilter
hakanilter / elasticsearch.sh
Last active September 26, 2018 23:00
Elasticsearch 6 (ES6) Setup Scripts
#! /bin/sh
### BEGIN INIT INFO
# Provides: elasticsearch
# Required-Start: $all
# Required-Stop: $all
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Starts elasticsearch
# Description: Starts elasticsearch using start-stop-daemon
### END INIT INFO
@hakanilter
hakanilter / CreateSparkDataFrameFromAzureBlobStorage.scala
Last active October 14, 2018 23:15
Create Spark DataFrame from Azure Blob Storage
/*
Add the following dependencies:
com.microsoft.azure:azure-storage:2.0.0
org.apache.hadoop:hadoop-azure:2.7.3
Exclude:
com.fasterxml.jackson.core:*:*
*/
spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")
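With the account key set, files can be read through the `wasbs://` scheme that hadoop-azure provides; the path template (container, account, and path are placeholders) is:

```
wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<path>
```

Passing such a path to `spark.read` then works the same as any other filesystem URL.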
@hakanilter
hakanilter / athena.sql
Created November 8, 2018 13:04
Athena CREATE TABLE AS SELECT (CTAS) query with location
CREATE TABLE sampledb.test_empty_array_parquet
WITH (
format = 'PARQUET',
external_location = 's3://somewhere'
)
AS SELECT *
FROM sampledb.test_empty_array
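The same CTAS statement can also be submitted programmatically through the Athena API. A minimal sketch with boto3 (the result bucket path is a placeholder, and the actual call is left commented out so nothing hits AWS):

```python
# Sketch: submitting the CTAS statement above via the Athena API.
# The OutputLocation below is a placeholder, not a value from the gist.
query = """
CREATE TABLE sampledb.test_empty_array_parquet
WITH (format = 'PARQUET', external_location = 's3://somewhere')
AS SELECT * FROM sampledb.test_empty_array
"""

params = {
    "QueryString": query,
    "QueryExecutionContext": {"Database": "sampledb"},
    "ResultConfiguration": {"OutputLocation": "s3://somewhere/athena-results/"},
}

# import boto3
# athena = boto3.client("athena")
# response = athena.start_query_execution(**params)
```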
@hakanilter
hakanilter / awslogs-setup.sh
Last active November 22, 2018 10:44
Installing the AWS CloudWatch Logs Agent on Debian
sudo su
apt-get install -y libyaml-dev python-dev python3-dev python3-pip
pip3 install awscli-cwlogs
if [ ! -d /var/awslogs/bin ] ; then
mkdir -p /var/awslogs/bin
ln -s /usr/local/bin/aws /var/awslogs/bin/aws
fi
mkdir /opt/awslogs
cd /opt/awslogs
curl https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py -O
@hakanilter
hakanilter / json_hive_definition.py
Last active November 28, 2018 16:41
Fastest way to get a Hive definition for a given JSON file
def json_hive_def(path):
    spark.read.json(path).createOrReplaceTempView("temp_view")
    spark.sql("CREATE TABLE temp_table AS SELECT * FROM temp_view LIMIT 0")
    script = spark.sql("SHOW CREATE TABLE temp_table").take(1)[0].createtab_stmt.replace('\n', '')
    spark.sql("DROP TABLE temp_table")
    return script
@hakanilter
hakanilter / ecs-run-and-wait.sh
Last active December 7, 2023 16:50
AWS ECS run task and wait for the result
# Requires JSON as the AWS CLI output format and the "jq" command-line tool
# If the task runs successfully, the script exits 0
run_result=$(aws ecs run-task \
--cluster ${CLUSTER} \
--task-definition ${TASK_DEFINITION} \
--launch-type EC2 \
--overrides "${OVERRIDES}")
echo "${run_result}"
container_arn=$(echo "${run_result}" | jq -r '.tasks[0].taskArn')
aws ecs wait tasks-stopped \
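The preview cuts off before the exit-code check that the header comment describes. As a sketch of that last step, here is the equivalent of the jq extraction in Python (the field names follow the shape of an ECS describe-tasks response; the sample payload and its values are hypothetical):

```python
import json

# Hypothetical describe-tasks response, trimmed to the fields the check needs
describe_output = json.loads("""
{
  "tasks": [
    {
      "taskArn": "arn:aws:ecs:eu-west-1:123456789012:task/example",
      "containers": [
        {"name": "app", "exitCode": 0}
      ]
    }
  ]
}
""")

# Same extraction jq would do with .tasks[0].containers[0].exitCode
exit_code = describe_output["tasks"][0]["containers"][0]["exitCode"]
print(exit_code)
```

In the shell script, a non-zero value here is what should make the script exit with a failure status.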
@hakanilter
hakanilter / mongodb-setup.sh
Created February 27, 2019 10:22
Amazon Linux Single Node Simple MongoDB Setup
# Update packages
sudo yum update -y
# Mount EBS volume
sudo mkfs -t xfs /dev/xvdb
sudo mkdir /data
sudo mount /dev/xvdb /data
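The mount above does not survive a reboot. A typical /etc/fstab entry for it (not part of the gist preview; `nofail` is a common safeguard so boot does not hang if the volume is detached, and the device name is assumed stable):

```
/dev/xvdb  /data  xfs  defaults,nofail  0  0
```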
# Install MongoDB
echo '