tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
# sources
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.zookeeperConnect = localhost:2181
tier1.sources.source1.topic = network-data
tier1.sources.source1.groupId = flume-kafka-test
tier1.sources.source1.channels = channel1
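# channel1 and sink1 are declared above but not configured in this excerpt;
# a minimal sketch, assuming a memory channel and a logger sink:
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000
tier1.sinks.sink1.type = logger
tier1.sinks.sink1.channel = channel1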
# Script for generating a CSV partitions report for Impala
IMPALA_DAEMON=localhost
databases=$(impala-shell --quiet -i $IMPALA_DAEMON -d default --delimited -q "SHOW DATABASES" | cut -f1 | grep -e dl -e ods_)
for database in $databases
do
  echo $database
  directory="partitions/$database"
  mkdir -p $directory
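  # (sketch of the rest of the loop: one CSV of partitions per table;
  #  SHOW PARTITIONS fails on unpartitioned tables, hence the stderr redirect)
  tables=$(impala-shell --quiet -i $IMPALA_DAEMON -d $database --delimited -q "SHOW TABLES" | cut -f1)
  for table in $tables
  do
    impala-shell --quiet -i $IMPALA_DAEMON -d $database --delimited --output_delimiter=',' -q "SHOW PARTITIONS $table" > "$directory/$table.csv" 2>/dev/null
  done
done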
import org.apache.spark.sql.functions.udf
import spark.implicits._
// read avro
val input = "/Users/hakanilter/dev/workspace/mc/data/avroFiles/*"
val data = spark.read
  .format("com.databricks.spark.avro")
  // note: "header" is a CSV reader option and is ignored by the Avro reader
  .option("header", "true")
  .load(input)
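// The udf import above implies a transformation follows; a minimal sketch,
// assuming a hypothetical string column named "name" in the Avro data:
val toUpper = udf((s: String) => Option(s).map(_.toUpperCase).orNull)
data.withColumn("name_upper", toUpper($"name")).show()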
#! /bin/sh
### BEGIN INIT INFO
# Provides:          elasticsearch
# Required-Start:    $all
# Required-Stop:     $all
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Starts elasticsearch
# Description:       Starts elasticsearch using start-stop-daemon
### END INIT INFO
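# (a minimal start/stop body matching the header's start-stop-daemon
#  description; paths, user and options below are illustrative)
DAEMON=/usr/share/elasticsearch/bin/elasticsearch
PID_FILE=/var/run/elasticsearch.pid
case "$1" in
  start)
    start-stop-daemon --start --background --make-pidfile \
      --pidfile "$PID_FILE" --chuid elasticsearch --exec "$DAEMON"
    ;;
  stop)
    start-stop-daemon --stop --pidfile "$PID_FILE" --retry 30
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac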
/*
Add the following dependencies:
  com.microsoft.azure:azure-storage:2.0.0
  org.apache.hadoop:hadoop-azure:2.7.3
Exclude:
  com.fasterxml.jackson.core:*:*
*/
spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")
CREATE TABLE sampledb.test_empty_array_parquet
WITH (
  format = 'PARQUET',
  external_location = 's3://somewhere'
)
AS SELECT *
FROM sampledb.test_empty_array
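-- Optional sanity check: the CTAS target should return the same row count
-- as the source table
SELECT COUNT(*) FROM sampledb.test_empty_array_parquet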
sudo su
apt-get install -y libyaml-dev python-dev python3-dev python3-pip
pip3 install awscli-cwlogs
if [ ! -d /var/awslogs/bin ] ; then
  mkdir -p /var/awslogs/bin
  ln -s /usr/local/bin/aws /var/awslogs/bin/aws
fi
mkdir -p /opt/awslogs
cd /opt/awslogs
curl https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py -O
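# (next step, not shown in the excerpt: run the installer; the region and
#  config file below are illustrative, and older versions of the setup
#  script may require python2 instead of python3)
python3 ./awslogs-agent-setup.py -n -r us-east-1 -c /opt/awslogs/awslogs.conf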
def json_hive_def(path):
    """Infer a Hive CREATE TABLE statement from a JSON file
    (uses the ambient `spark` session)."""
    # Let Spark infer the schema and expose it as a temporary view
    spark.read.json(path).createOrReplaceTempView("temp_view")
    # Materialize an empty table so SHOW CREATE TABLE can produce its DDL
    spark.sql("CREATE TABLE temp_table AS SELECT * FROM temp_view LIMIT 0")
    # Join the DDL onto one line; use a space so tokens don't run together
    script = spark.sql("SHOW CREATE TABLE temp_table").take(1)[0].createtab_stmt.replace('\n', ' ')
    spark.sql("DROP TABLE temp_table")
    return script
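# Example usage (the path is illustrative); prints a one-line
# CREATE TABLE statement inferred from the JSON schema:
print(json_hive_def("/data/sample/events.json"))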
# Requires JSON as the output format and the "jq" command-line tool
# If the task runs successfully, exits 0
run_result=$(aws ecs run-task \
  --cluster ${CLUSTER} \
  --task-definition ${TASK_DEFINITION} \
  --launch-type EC2 \
  --overrides "${OVERRIDES}")
echo ${run_result}
# ARN of the task that was just started (renamed from container_arn:
# the jq path extracts a task ARN, not a container ARN)
task_arn=$(echo $run_result | jq -r '.tasks[0].taskArn')
aws ecs wait tasks-stopped \
  --cluster ${CLUSTER} \
  --tasks ${task_arn}
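# (sketch of the exit-status propagation the header comment promises;
#  assumes a single-container task)
exit_code=$(aws ecs describe-tasks \
  --cluster ${CLUSTER} \
  --tasks ${task_arn} | jq -r '.tasks[0].containers[0].exitCode')
exit ${exit_code}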
# Update packages
sudo yum update -y
# Mount EBS volume
sudo mkfs -t xfs /dev/xvdb
sudo mkdir /data
sudo mount /dev/xvdb /data
# Install MongoDB: write the yum repo file, then install
# (the original snippet cuts off after "echo '"; completed below as a
#  sketch for MongoDB 4.0 on Amazon Linux 2; adjust version and baseurl)
echo '[mongodb-org-4.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2/mongodb-org/4.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.com/static/pgp/server-4.0.asc' | sudo tee /etc/yum.repos.d/mongodb-org-4.0.repo
sudo yum install -y mongodb-org