Julian Alexander Murillo (64lines)

  • Huge Inc.
  • Medellin, Colombia
64lines / copy_command.sql
Last active August 18, 2019 19:50
Copy Command
copy schema.table
(field1, field2, field3, field4, field5)
from 's3://path/to/s3/folder/'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
format as csv;
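
For reference, a minimal Python sketch of how this COPY might be driven from a script. psycopg2, the connection parameters, and the placeholder names are assumptions for illustration, not part of the gist.

import psycopg2

# Hypothetical Redshift connection details; replace with your cluster's endpoint and credentials.
conn = psycopg2.connect(
    host='<cluster-endpoint>',
    port=5439,
    dbname='<database>',
    user='<user>',
    password='<password>',
)

copy_sql = """
copy schema.table
(field1, field2, field3, field4, field5)
from 's3://path/to/s3/folder/'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
format as csv;
"""

# COPY runs on the cluster; the connection context manager commits on success.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)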
64lines / create_statement.sql
Created August 18, 2019 19:41
Create Statement
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
....
);
64lines / unload_command.sql
Last active December 11, 2020 19:43
Unload Command
unload ('select * from schema.table')
to 's3://path/to/s3/folder/'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
allowoverwrite
format as csv;
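
A similar hedged sketch for driving the UNLOAD from Python and then listing the files it produced; psycopg2, boto3, and the bucket/prefix split of the placeholder S3 path are assumptions for illustration.

import boto3
import psycopg2

conn = psycopg2.connect(
    host='<cluster-endpoint>',
    port=5439,
    dbname='<database>',
    user='<user>',
    password='<password>',
)

unload_sql = """
unload ('select * from schema.table')
to 's3://path/to/s3/folder/'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
allowoverwrite
format as csv;
"""

with conn, conn.cursor() as cur:
    cur.execute(unload_sql)

# Redshift writes one file per slice by default; list what landed in S3.
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='<bucket>', Prefix='<prefix>/')
for obj in resp.get('Contents', []):
    print(obj['Key'])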
64lines / mac.md
Created March 29, 2019 01:05
Connect to a WiFi network on Mac OSX

1. Turn off wifi on your macbook from the Mac OSX terminal command line:

networksetup -setairportpower en0 off

2. Turn on wifi on your macbook from the Mac OSX terminal command line:

networksetup -setairportpower en0 on

3. List available wifi networks from the Mac OSX terminal command line:

/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport -s

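The same commands are easy to wrap when scripting this; a minimal Python sketch, assuming macOS with the WiFi interface named en0 (the function names are illustrative, not from the gist):

import subprocess

def set_wifi_power(on: bool, interface: str = 'en0') -> None:
    # Wraps: networksetup -setairportpower en0 on|off
    state = 'on' if on else 'off'
    subprocess.run(['networksetup', '-setairportpower', interface, state], check=True)

def list_wifi_networks() -> str:
    # Wraps the airport scan command from step 3.
    airport = '/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport'
    return subprocess.run([airport, '-s'], check=True, capture_output=True, text=True).stdout

print(list_wifi_networks())
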
# Job to load data from the platform events db to parquet
# Based on Ky's script
#
# Parameters:
# --MONTHS: number of months of data to overwrite; if the value is "ALL", load all data.
import os
import sys
import math
from datetime import datetime

# Register the DataFrame as a temp view so it can be queried with Spark SQL.
fact_df.createOrReplaceTempView('fact_df')
fact_df = spark.sql('select * from fact_df where id > 42')
fact_df = fact_df.alias('fact_df')

from pyspark.sql.functions import rand, row_number
from pyspark.sql.window import Window

# Generate continuous ids over rows in random order.
fact_df = fact_df.withColumn('id', row_number().over(Window.orderBy(rand()))).alias('fact_df')
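
As a self-contained check of the id-generation pattern above (the local SparkSession and the sample rows are assumptions for the demo):

from pyspark.sql import SparkSession
from pyspark.sql.functions import rand, row_number
from pyspark.sql.window import Window

spark = SparkSession.builder.master('local[*]').appName('row-ids').getOrCreate()
df = spark.createDataFrame([('a',), ('b',), ('c',), ('d',)], ['value'])
# Each row gets a unique, continuous id (1..N) in random order.
df = df.withColumn('id', row_number().over(Window.orderBy(rand())))
df.show()

Note that an unpartitioned window pulls all rows into a single partition, so this pattern is best reserved for tables of modest size.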

from pyspark.sql.functions import coalesce, col

# Replace nulls in columnname with the first non-null value among the fallback columns.
fact_df = fact_df.withColumn('columnname', coalesce(col('columnname'), col('fallback_column_1'), col('fallback_column_2'))).alias('fact_df')

from pyspark.sql.functions import col, regexp_replace

# Remove commas from column_name.
fact_df = fact_df.withColumn('column_name', regexp_replace(col('column_name'), ',', ''))

from pyspark.sql.functions import col, from_utc_timestamp

# Example 1 (recommended): region-based timezone IDs handle daylight saving correctly.
fact_df = fact_df.withColumn('datecolumn', from_utc_timestamp(col('datecolumn'), "America/Los_Angeles"))
# Example 2 (not recommended): three-letter abbreviations like "CST" are ambiguous.
fact_df = fact_df.withColumn('datecolumn', from_utc_timestamp(col('datecolumn'), "CST"))