
Hakan İlter (hakanilter)

@hakanilter
hakanilter / athena.sql
Created November 8, 2018 13:04
Athena CREATE TABLE AS SELECT (CTAS) query with an external location
CREATE TABLE sampledb.test_empty_array_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://somewhere'
)
AS SELECT *
FROM sampledb.test_empty_array
@hakanilter
hakanilter / CreateSparkDataFrameFromAzureBlobStorage.scala
Last active October 14, 2018 23:15
Create Spark DataFrame from Azure Blob Storage
/*
  Add the following dependencies:
    com.microsoft.azure:azure-storage:2.0.0
    org.apache.hadoop:hadoop-azure:2.7.3
  Exclude:
    com.fasterxml.jackson.core:*:*
*/
spark.conf.set(
"fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
"<your-storage-account-access-key>")
@hakanilter
hakanilter / elasticsearch.sh
Last active September 26, 2018 23:00
ES6 Setup Scripts
#! /bin/sh
### BEGIN INIT INFO
# Provides: elasticsearch
# Required-Start: $all
# Required-Stop: $all
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Starts elasticsearch
# Description: Starts elasticsearch using start-stop-daemon
### END INIT INFO
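# The preview ends at the LSB header. A minimal body for such a script,
# using start-stop-daemon as the description says (the daemon path and
# user below are assumptions, not taken from the gist):
case "$1" in
  start)
    start-stop-daemon --start --chuid elasticsearch \
      --exec /usr/share/elasticsearch/bin/elasticsearch -- -d
    ;;
  stop)
    start-stop-daemon --stop --user elasticsearch --retry 30
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac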
@hakanilter
hakanilter / AvroJsonToDf.scala
Created September 25, 2018 20:45
Load Avro files and extract a JSON string column as a DataFrame
import org.apache.spark.sql.functions.udf
import spark.implicits._
// read avro
val input = "/Users/hakanilter/dev/workspace/mc/data/avroFiles/*"
val data = spark.read
  .format("com.databricks.spark.avro")
  .option("header", "true")
  .load(input)
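// The preview stops after loading the Avro files. One way to turn an
// embedded JSON string column into a DataFrame (the column name "body" is
// a guess at this gist's schema; requires Spark 2.2+ for json(Dataset)):
val jsonDf = spark.read.json(data.select("body").as[String])
jsonDf.printSchema()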
@hakanilter
hakanilter / impala-partitions-report.sh
Last active July 20, 2018 10:32
Script for generating CSV partitions report for Impala
# Script for generating csv partitions report for Impala
IMPALA_DAEMON=localhost
databases=$(impala-shell --quiet -i $IMPALA_DAEMON -d default --delimited -q "SHOW DATABASES" | cut -f1 | grep -e dl -e ods_)
for database in $databases
do
  echo $database
  directory="partitions/$database"
  mkdir -p $directory
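  # The preview cuts off here; presumably the script then dumps SHOW
  # PARTITIONS per table into one CSV per database directory. A sketch:
  tables=$(impala-shell --quiet -i $IMPALA_DAEMON -d $database --delimited -q "SHOW TABLES" | cut -f1)
  for table in $tables
  do
    impala-shell --quiet -i $IMPALA_DAEMON -d $database --delimited \
      -q "SHOW PARTITIONS $table" > "$directory/$table.csv"
  done
done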
@hakanilter
hakanilter / flume-kafka-source.properties
Last active May 17, 2018 16:17
Example Flume Configuration For Kafka Source
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
# sources
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.zookeeperConnect = localhost:2181
tier1.sources.source1.topic = network-data
tier1.sources.source1.groupId = flume-kafka-test
tier1.sources.source1.channels = channel1
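# channels and sinks (declared above but cut off in the preview); a minimal
# completion using a memory channel and a logger sink, which are assumptions
# rather than the gist's actual settings
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000
tier1.sinks.sink1.type = logger
tier1.sinks.sink1.channel = channel1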
@hakanilter
hakanilter / readme.md
Last active March 16, 2018 04:41
Apache Spark - Apache Cassandra Integration

Create a new instance and edit the following file:

sudo vim /etc/yum.repos.d/cassandra.repo

Add the Cassandra repo:

[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
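# the preview cuts the stanza off at baseurl; the remaining lines, per the
# Apache Cassandra install docs, add GPG verification:
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS

Then install and start Cassandra (the usual next steps):

sudo yum install -y cassandra
sudo service cassandra start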
@hakanilter
hakanilter / setup.sh
Last active April 9, 2021 19:56
AWS EMR Examples Master Setup
#!/bin/bash
# install git
sudo yum install -y git
# maven
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y apache-maven
mvn --version
@hakanilter
hakanilter / kafka-elastic.js
Last active October 2, 2019 07:56
Indexing data from Kafka to ElasticSearch with Node.js in 30 lines :)
var elasticsearch = require('elasticsearch');
var elastic = new elasticsearch.Client({
    host: 'localhost:9200',
    log: 'info'
});
var kafka = require('kafka-node'),
    HighLevelConsumer = kafka.HighLevelConsumer,
    client = new kafka.Client(),
    consumer = new HighLevelConsumer(
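// the preview ends mid-constructor; a plausible completion (topic, groupId
// and index names are placeholders, not taken from the gist):
        client,
        [ { topic: 'my-topic' } ],
        { groupId: 'my-group' }
    );
// index each Kafka message into Elasticsearch as it arrives
consumer.on('message', function (message) {
    elastic.index({
        index: 'my-index',
        type: 'doc',
        body: JSON.parse(message.value)
    });
});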
@hakanilter
hakanilter / JdbcDynamoDbExportJob.java
Created August 9, 2017 23:34
Export data from Jdbc datasource to DynamoDB with Spark
SparkConf sparkConf = new SparkConf()
        .setAppName(JdbcDynamoDbExportJob.class.getSimpleName())
        .setMaster(config.getProperty("spark.master"));
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(jsc);
// read from database
Properties properties = new Properties();
properties.setProperty("user", config.getProperty("jdbc.user"));
properties.setProperty("password", config.getProperty("jdbc.pass"));