Cristian Vargas (cvargas-xbrein)
# shortform git commands
alias g='git'
# push changes to an empty git repository for the first time
git push --set-upstream origin master
# Remove + and - from start of diff lines
git diff --color | sed "s/^\([^-+ ]*\)[-+ ]/\\1/" | less -r
# clear out git hooks (completion of the truncated preview)
rm -f .git/hooks/*
import json
import boto3

region = 'us-east-2'
ec2 = boto3.client('ec2', region_name=region)

def lambda_handler(event, context):
    instances = event["instances"].split(',')
    action = event["action"]
    # assumed continuation of the truncated preview: start or stop the listed instances
    if action == 'start':
        ec2.start_instances(InstanceIds=instances)
    else:
        ec2.stop_instances(InstanceIds=instances)
#!/usr/bin/env python3
from __future__ import print_function
import os
import sys
from airflow import settings
from airflow.models import Connection
from sqlalchemy.orm import exc
@cvargas-xbrein
cvargas-xbrein / airflow-python3.sh
Created April 13, 2021 19:22 — forked from zacgx/airflow-python3.sh
Installing Airflow with CeleryExecutor, using PostgreSQL as the metadata database and Redis as the Celery message broker
# this script has been tested and works on freshly installed Ubuntu 16.04 and 16.10
# it assumes that you are running airflow in a private network and do not need to worry about outside access
# if that's not the case, the lines for PostgreSQL and Redis in this script need to be updated accordingly
# run as root
sudo su
# initial system updates and installs
apt-get update && apt-get upgrade -y && apt-get autoremove && apt-get autoclean
apt-get -y install build-essential binutils gcc make git htop nethogs tmux
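The preview above stops at the system packages. For the CeleryExecutor, PostgreSQL and Redis combination the description mentions, the remaining pieces are the Airflow extras and a few airflow.cfg settings; a minimal sketch follows, where the extras, connection strings, hosts and database name are illustrative assumptions rather than lines from the forked gist.
# sketch only, not part of the forked script: Airflow with the Celery, Postgres and Redis extras
pip install "apache-airflow[celery,postgres,redis]"
# airflow.cfg settings to match (credentials, hosts and DB name are placeholders;
# exact key names vary slightly between Airflow releases):
#   executor = CeleryExecutor
#   sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
#   broker_url = redis://localhost:6379/0
#   result_backend = db+postgresql://airflow:airflow@localhost:5432/airflow
airflow initdb   # "airflow db init" on Airflow 2.x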
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.functions import *
from awsglue.dynamicframe import DynamicFrame
@cvargas-xbrein
cvargas-xbrein / Install_FFmepg_OpenCV_EMR.md
Created September 15, 2021 16:09 — forked from phonchi/Install_FFmepg_OpenCV_EMR.md
Install FFmpeg and OpenCV on AWS EMR
sudo yum -y update
sudo yum -y groupinstall 'Development Tools'
sudo yum install -y cmake git pkgconfig
sudo yum install -y libpng-devel libjpeg-turbo-devel jasper-devel openexr-devel libtiff-devel libwebp-devel
sudo yum install -y libdc1394-devel libv4l-devel gstreamer-plugins-base-devel
sudo yum install -y gtk2-devel
sudo yum install -y tbb-devel eigen3-devel
wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
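The preview only covers the yum prerequisites and pip; the rest of the forked gist presumably builds FFmpeg and OpenCV from source. A rough sketch of that kind of build follows, where the repository URLs are the official ones but the install prefix, parallelism and cmake flags are illustrative, not taken from the gist.
# rough sketch, not taken from the forked gist: build FFmpeg, then OpenCV, from source
# FFmpeg's configure typically also wants an assembler such as nasm or yasm
git clone https://git.ffmpeg.org/ffmpeg.git && cd ffmpeg
./configure --prefix=/usr/local --enable-shared
make -j"$(nproc)" && sudo make install && cd ..
# numpy is needed for OpenCV's Python bindings
sudo pip install numpy
git clone https://github.com/opencv/opencv.git && cd opencv
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
make -j"$(nproc)" && sudo make install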
@cvargas-xbrein
cvargas-xbrein / install-units.md
Created December 17, 2021 12:45 — forked from slowkow/install-units.md
Install the 'units' R package on Partners

Summary

I had a difficult time installing the units R package on the Partners ERIS servers.

I hope this post helps you to figure out how to work around the errors.

Instructions
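The preview cuts off before the instructions. The usual blocker for the units package is its udunits2 system dependency, and a common workaround on a server without root access is to build udunits2 into your home directory and point the package's configure script at it; the sketch below uses an illustrative version number and paths rather than the gist's actual instructions.
# sketch of a common workaround, not necessarily the gist's exact instructions:
# build udunits2 into $HOME/.local, then tell the units package where to find it.
# download and unpack the udunits-2.x source from Unidata first; the version below is illustrative
cd udunits-2.2.28
./configure --prefix=$HOME/.local
make && make install
cd ..
# then install the R package against the local udunits2 build
Rscript -e "install.packages('units', repos='https://cloud.r-project.org', configure.args='--with-udunits2-include=$HOME/.local/include --with-udunits2-lib=$HOME/.local/lib')"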

@cvargas-xbrein
cvargas-xbrein / pyspark_jdbc_df_count.md
Created May 10, 2022 13:34 — forked from tilakpatidar/pyspark_jdbc_df_count.md
Gist to perform count() on jdbc sources without re-reading the df

Postgres snippet

create database test_db;

create table t_random as select s, md5(random()::text) from generate_Series(1,5000) s;

Pyspark snippet

In [1]: df=spark.read.jdbc(url="jdbc:postgresql://localhost:5432/test_db", table="t_random", properties={"driver": "org.postgresql.Driver"}).repartition(10)
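The preview stops after the read. One way to count a JDBC-backed DataFrame without scanning the source again, reusing the spark session and connection details from the snippet above, is to push the count down to Postgres as a subquery; this is a sketch and not necessarily the approach the forked gist settles on.
# sketch only: let Postgres do the counting instead of calling df.count(),
# which would re-read the JDBC source; not necessarily the forked gist's approach
cnt_df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/test_db",
    table="(SELECT count(*) AS cnt FROM t_random) AS t",
    properties={"driver": "org.postgresql.Driver"},
)
row_count = cnt_df.collect()[0]["cnt"]
print(row_count)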
@cvargas-xbrein
cvargas-xbrein / amazon_athena_create_table.ddl
Created June 14, 2022 16:46 — forked from EngineerLabShimazu/amazon_athena_create_table.ddl
Create a table in Athena from a csv file with header stored in S3.
CREATE EXTERNAL TABLE IF NOT EXISTS default.table
(
  `id` int,
  `name` string,
  `timestamp` string,
  `is_debug` boolean
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'escapeChar'='\\',
  'quoteChar'='"',
  'separatorChar'=','
)
-- completion of the truncated preview; the LOCATION below is a placeholder
LOCATION 's3://your-bucket/path/'
TBLPROPERTIES ('skip.header.line.count'='1');
import json
from pprint import pprint as pp

def jenks_matrices_init(data, n_classes):
    # fill the matrices with len(data)+1 rows of zeros
    lower_class_limits = []
    variance_combinations = []
    for i in range(0, len(data) + 1):
        temp1 = []
        temp2 = []
        # completion of the truncated preview: zero-fill both rows
        for j in range(0, n_classes + 1):
            temp1.append(0.0)
            temp2.append(0.0)
        lower_class_limits.append(temp1)
        variance_combinations.append(temp2)