Jorge belenaj

@belenaj
belenaj / add_partition.sql
Last active September 9, 2020 14:53
[Recreate partitions] #s3 #bash
ALTER TABLE schema_name.table_name
ADD IF NOT EXISTS PARTITION (YEAR=${hiveconf:year}, MONTH=${hiveconf:month}, DAY=${hiveconf:day})
LOCATION 's3://mybucket/some/prefix/${hiveconf:year}/${hiveconf:month}/${hiveconf:day}/';
belenaj / loop_create_partitions.sh
Created September 9, 2020 13:01
recreate all partitions in hive table
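# Inclusive date range to iterate over (ISO dates).
start='2020-04-01'
end='2020-09-09'
# Normalize to YYYYMMDD so the dates compare as integers (GNU date syntax).
start=$(date -d "$start" +%Y%m%d)
end=$(date -d "$end" +%Y%m%d)
iter=$start
while [[ $iter -le $end ]]
do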
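The preview above stops before the loop body, so what the script runs for each day is not shown. As a rough illustration of the same idea, the following Python sketch (not the gist's bash) walks an inclusive date range and renders one statement per day from the add_partition.sql template. The names `daily_dates` and `partition_statements` are hypothetical, and the zero-padding of month and day in the S3 prefix is an assumption.

```python
from datetime import date, timedelta

# Statement template copied from add_partition.sql; schema_name, table_name
# and the S3 prefix are the gist's own placeholders.
TEMPLATE = (
    "ALTER TABLE schema_name.table_name\n"
    "ADD IF NOT EXISTS PARTITION (YEAR={year}, MONTH={month}, DAY={day})\n"
    "LOCATION 's3://mybucket/some/prefix/{year}/{month}/{day}/';"
)


def daily_dates(start, end):
    """Yield every date from start to end, both inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def partition_statements(start, end):
    """Yield one ADD PARTITION statement per day in the range."""
    for day in daily_dates(start, end):
        # Assumption: month and day are zero-padded in the S3 prefix.
        yield TEMPLATE.format(
            year=f"{day.year:04d}", month=f"{day.month:02d}", day=f"{day.day:02d}"
        )


if __name__ == "__main__":
    for statement in partition_statements(date(2020, 4, 1), date(2020, 4, 3)):
        print(statement)
```

Generating the statements as text keeps the sketch independent of any Hive client; how they are submitted is left open.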
belenaj / iterm-backup-history.sh
Last active September 9, 2020 09:07
[iterm-backup-history.sh] Utility to easily backup your iTerm command history #bash #iterm
#!/bin/bash
###################################################################
# Script Name : iterm-backup-history.sh
# Description : Utility to easily backup your iTerm command history
# before a meeting, for example
# Args : clean || restore
# Author : belenaj
###################################################################
ACTION=$1
belenaj / Dockerfile
Last active August 14, 2020 14:09
spark-app-docker-emr-6.0.0
# https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-docker.html
# https://aws.amazon.com/blogs/big-data/run-spark-applications-with-docker-using-amazon-emr-6-0-0-beta/
FROM amazoncorretto:8
RUN yum -y update
RUN yum -y install yum-utils
RUN yum -y groupinstall development
RUN yum list python3*
belenaj / sipsRaw2Jpeg.md
Last active May 18, 2024 16:04
Convert RAW photos to JPG in the Mac OS terminal

Source: https://coderwall.com/p/nhp7mq/convert-raw-photos-to-jpg-in-the-mac-os-terminal

No need for slow and heavy Photoshop scripts for this one; you can easily do this right from your terminal window.

This is possible using "sips", an image editing tool already available on Mac which allows you to do all sorts of image manipulation, including resizing and converting.

So we first grab all RAW files in a folder, then convert them to JPEG (or any other format).
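As a minimal sketch of that two-step flow, the Python wrapper below builds a `sips -s format jpeg <input> --out <output>` command for each RAW file in a folder and runs it. The helper names `sips_command` and `convert_folder`, the output folder, and the list of RAW extensions are assumptions for illustration, not part of the original note; `sips` itself is only available on macOS.

```python
import subprocess
from pathlib import Path

# Assumption: a few common RAW extensions; adjust to your camera's format.
RAW_EXTENSIONS = {".cr2", ".nef", ".arw", ".dng"}


def sips_command(raw_file, out_dir):
    """Build the sips invocation that converts one RAW file to JPEG."""
    target = Path(out_dir) / (Path(raw_file).stem + ".jpg")
    return ["sips", "-s", "format", "jpeg", str(raw_file), "--out", str(target)]


def convert_folder(folder, out_dir):
    """Convert every RAW file found directly inside folder (macOS only)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for raw_file in sorted(Path(folder).iterdir()):
        if raw_file.suffix.lower() in RAW_EXTENSIONS:
            # check=True raises if sips exits with a non-zero status.
            subprocess.run(sips_command(raw_file, out), check=True)
```

Calling `convert_folder("photos", "jpegs")` would write one `.jpg` per RAW file into `jpegs`, leaving the originals untouched.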

Exercises

  1. Select all "Harry Potter" books
  2. Find the book with the most pages
  3. Top 5 authors by number of books written (assume the author is the first element of the array, "key" field, and that each row is a different book)
  4. Top 5 genres by number of books

  1. Average number of pages (needs cleaning)
  2. Per publication year, count the authors who published at least one book
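To make the first and third exercises concrete, here is a small plain-Python sketch (the gist's own solution file uses PySpark, which this does not attempt to reproduce). The sample rows are made up for illustration, and the field names `title` and `authors` are assumptions; only the "key" field and the "first author in the array" rule come from the exercise text.

```python
from collections import Counter

# Hypothetical sample rows; the real dataset and its schema are not shown.
BOOKS = [
    {"title": "Harry Potter and the Goblet of Fire", "authors": [{"key": "/authors/A1"}]},
    {"title": "Harry Potter and the Half-Blood Prince", "authors": [{"key": "/authors/A1"}]},
    {"title": "Example Book", "authors": [{"key": "/authors/A2"}, {"key": "/authors/A1"}]},
    {"title": "Untitled Draft", "authors": []},
]


def titles_containing(rows, phrase):
    """Return the titles that contain phrase, ignoring case."""
    needle = phrase.lower()
    return [row["title"] for row in rows if needle in row["title"].lower()]


def top_authors(rows, n=5):
    """Count books per first-listed author key and return the n most frequent."""
    counts = Counter(
        row["authors"][0]["key"] for row in rows if row["authors"]
    )
    return counts.most_common(n)


if __name__ == "__main__":
    print(titles_containing(BOOKS, "harry potter"))
    print(top_authors(BOOKS))
```

Rows without any author are skipped rather than counted under a placeholder, which is one possible reading of the exercise.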

filter_book.py

from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
import sys

sc = SparkContext('local')
spark = SparkSession(sc)
belenaj / Dockerfile
Last active February 4, 2020 18:01
exasol-db Dockerfile with the exasol/cloud-storage-etl-udfs jar: https://github.com/exasol/cloud-storage-etl-udfs
FROM exasol/docker-db:latest
ENV EXA_BUCKET_PATH="/exa/data/bucketfs/bfsdefault/default"
ENV CLOUD_STORAGE_VERSION="0.6.0"
ENV JAR_FILENAME="cloud-storage-etl-udfs-$CLOUD_STORAGE_VERSION.jar"
ADD https://github.com/exasol/cloud-storage-etl-udfs/releases/download/v$CLOUD_STORAGE_VERSION/$JAR_FILENAME $EXA_BUCKET_PATH/$JAR_FILENAME
RUN chmod 775 $EXA_BUCKET_PATH/$JAR_FILENAME
#RUN chown exadefusr:exausers $EXA_BUCKET_PATH/$JAR_FILENAME
belenaj / Dockerfile
Last active September 9, 2020 08:59
[aws cli in Docker Alpine] #docker #awscli
FROM alpine:3.10.3
ENV AWSCLI_VERSION="1.14.10"
RUN apk add --no-cache \
    openssh \
    python \
    py-pip
# installing aws cli
// ...
def DAYS_BACK = 30
def iterDate = new Date() - DAYS_BACK
def newDateParse
for (i = 0; i < DAYS_BACK; i++) {
    iterDate = iterDate + 1
    newDateParse = iterDate.format("yyyy-MM-dd")
    stage("newDateParsed ${newDateParse}") {