DANGerous tommydangerous

@tommydangerous
tommydangerous / gist:faca580583db45c517e1f8a07437deab
Created January 17, 2018 04:11
Install RVM & Ruby on Ubuntu
# https://www.digitalocean.com/community/tutorials/how-to-install-ruby-on-rails-with-rvm-on-ubuntu-16-04
$ gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB
$ cd /tmp
$ curl -sSL https://get.rvm.io -o rvm.sh
$ less rvm.sh  # review the installer before piping it to bash
$ cat /tmp/rvm.sh | bash -s stable --rails
$ source /home/ubuntu/.rvm/scripts/rvm
$ rvm list known
$ rvm install 2.5
tommydangerous / gist:ceb38b66c1f8f6303c07d0d2730f0caf
Created January 17, 2018 05:29
Setup Anaconda & Jupyter on Ubuntu
# https://www.anaconda.com/download/#linux
$ ssh jpy
$ mkdir tmp
$ cd tmp
$ wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
$ bash Anaconda3-5.0.1-Linux-x86_64.sh
$ vi ~/.bashrc
# Add the following line to ~/.bashrc:
# export PATH=~/anaconda3/bin:$PATH
# If Docker reports the error below, reload systemd and restart Docker:
# Cannot start service app: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused"
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
tommydangerous / pyspark_load_data_from_s3.py
Last active May 13, 2021 17:27
PySpark load data from S3
from pyspark.sql import SparkSession


def load_data(spark, s3_location):
    """
    spark:
        Spark session
    s3_location:
        S3 bucket name and object prefix
    """
tommydangerous / define_function.py
Created May 13, 2021 03:20
PySpark example part 1
from pyspark.sql.functions import pandas_udf, PandasUDFType


@pandas_udf(
    SCHEMA_COMING_SOON,
    PandasUDFType.GROUPED_MAP,
)
def custom_transformation_function(df):
    pass
tommydangerous / define_schema.py
Last active May 13, 2021 03:31
PySpark example 2
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import (
    IntegerType,
    StringType,
    StructField,
    StructType,
)
"""
tommydangerous / code_logic.py
Created May 13, 2021 03:28
PySpark example 3
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import (
    IntegerType,
    StringType,
    StructField,
    StructType,
)
"""
tommydangerous / all_together.py
Last active May 13, 2021 17:36
PySpark example all together
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import (
    IntegerType,
    StringType,
    StructField,
    StructType,
)
tommydangerous / download_and_split_data.py
Created June 11, 2021 05:22
download_and_split_data
from sklearn.model_selection import train_test_split
import pandas as pd

df = pd.read_csv('/content/titanic_survival.csv')
label_feature_name = 'Survived'

X = df.drop(columns=[label_feature_name])
y = df[label_feature_name]

X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X,
    y,
    stratify=y,
    test_size=0.2,
)
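Passing `stratify=y` asks `train_test_split` to keep the proportion of each `Survived` class roughly equal in the training and test sets. One way to check this is sketched below. `class_ratios` is a hypothetical helper, not part of scikit-learn, and the toy label lists are made-up values standing in for `y_train` and `y_test`.

```python
from collections import Counter


def class_ratios(labels):
    """Return the fraction of samples that belong to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Toy labels standing in for y_train and y_test (made-up values).
toy_train = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
toy_test = [0, 0, 0, 1, 1]

print(class_ratios(toy_train))
print(class_ratios(toy_test))
```

The helper accepts any iterable of labels, so `class_ratios(y_train)` and `class_ratios(y_test)` can be compared directly after the split; with stratification the two results should be close to each other.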