DANGerous tommydangerous

Making data more accessible.

tommydangerous / gist:faca580583db45c517e1f8a07437deab

Created January 17, 2018 04:11

Install RVM & Ruby on Ubuntu

	# https://www.digitalocean.com/community/tutorials/how-to-install-ruby-on-rails-with-rvm-on-ubuntu-16-04
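	# Import the RVM signing keys (keys.gnupg.net is no longer reachable; the Ubuntu keyserver works)
	$ gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB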
	$ cd /tmp
	$ curl -sSL https://get.rvm.io -o rvm.sh
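	# Optionally inspect the installer before running it (curl -o does not make it executable)
	$ less rvm.sh
	# Install the stable RVM release together with Rails
	$ cat /tmp/rvm.sh | bash -s stable --rails
	# Load RVM into the current shell
	$ source ~/.rvm/scripts/rvm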
	$ rvm list known
	$ rvm install 2.5

tommydangerous / gist:ceb38b66c1f8f6303c07d0d2730f0caf

Created January 17, 2018 05:29

Setup Anaconda & Jupyter on Ubuntu

	# https://www.anaconda.com/download/#linux
	$ ssh jpy
	$ mkdir tmp
	$ cd tmp
	$ wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
	$ bash Anaconda3-5.0.1-Linux-x86_64.sh
	$ vi ~/.bashrc

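	# Add this line to ~/.bashrc so Anaconda's python, conda and jupyter come first on PATH:
	export PATH=~/anaconda3/bin:$PATH
	# Then reload the file in the current shell
	$ source ~/.bashrc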
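
To confirm that the `export` line took effect, you can check that the Anaconda `bin` directory comes before the system one on `PATH`. The following is a minimal sketch. It assumes a Linux-style, `:`-separated `PATH` and the `~/anaconda3` location used in the line above. `anaconda_first_on_path` is a hypothetical helper and is not part of Anaconda.

```python
import os


def anaconda_first_on_path(path_value: str, anaconda_bin: str) -> bool:
    """Return True if anaconda_bin appears on PATH before /usr/bin (or /usr/bin is absent)."""
    # Split the Linux-style PATH and normalise trailing slashes.
    entries = [p.rstrip("/") for p in path_value.split(":") if p]
    target = anaconda_bin.rstrip("/")
    if target not in entries:
        return False
    if "/usr/bin" not in entries:
        return True
    # PATH is searched left to right, so the earlier entry wins.
    return entries.index(target) < entries.index("/usr/bin")


# Check the current shell's PATH (assumes the ~/anaconda3 install location).
print(anaconda_first_on_path(os.environ.get("PATH", ""), os.path.expanduser("~/anaconda3/bin")))
```

A `False` result usually means the shell was not reloaded after editing `~/.bashrc`.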

tommydangerous / gist:0d25ccf37b67b5e98e1f655043f16d4d

Created January 23, 2018 07:36

Cannot start service app

	Cannot start service app: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused"

	$ sudo systemctl daemon-reload
	$ sudo systemctl restart docker

tommydangerous / pyspark_load_data_from_s3.py

Last active May 13, 2021 17:27

PySpark load data from S3

	from pyspark.sql import SparkSession


	def load_data(spark, s3_location):
	"""
	spark:
	Spark session
	s3_location:
	S3 bucket name and object prefix
	"""

tommydangerous / define_function.py

Created May 13, 2021 03:20

PySpark example part 1

	from pyspark.sql.functions import pandas_udf, PandasUDFType


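	@pandas_udf(
	    SCHEMA_COMING_SOON,  # placeholder for the output StructType (see PySpark example 2)
	    PandasUDFType.GROUPED_MAP,
	)
	def custom_transformation_function(df):
	    # df holds all rows of one group as a pandas DataFrame;
	    # the real logic must return a pandas DataFrame matching the schema.
	    pass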
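
A grouped-map pandas UDF receives every row of one group as a pandas DataFrame. It must return a DataFrame whose columns match the declared schema.

The sketch below tests a function of that shape locally with plain pandas, so no Spark cluster is needed. Several pieces are hypothetical:

* the function body;
* the `simulate_grouped_map` helper;
* the `name`/`value` columns.

Spark 3.x discourages the `PandasUDFType.GROUPED_MAP` decorator style. It recommends `df.groupBy(...).applyInPandas(func, schema)` instead, which takes an undecorated function like this one.

```python
import pandas as pd


def custom_transformation_function(pdf: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical logic: receives ONE group's rows and adds the group's row count.
    out = pdf.copy()
    out["group_size"] = len(pdf)
    return out


def simulate_grouped_map(pdf: pd.DataFrame, key: str, func) -> pd.DataFrame:
    # Local stand-in for Spark's grouped map: split by key, apply func to each
    # group, then concatenate the per-group results.
    parts = [func(group) for _, group in pdf.groupby(key, sort=True)]
    return pd.concat(parts, ignore_index=True)


data = pd.DataFrame({"name": ["a", "a", "b"], "value": [1, 2, 3]})
result = simulate_grouped_map(data, "name", custom_transformation_function)
print(result)
```

Keeping the transformation as an ordinary pandas function means it can be unit-tested like this before being handed to Spark.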

tommydangerous / define_schema.py

Last active May 13, 2021 03:31

PySpark example 2

	from pyspark.sql.functions import pandas_udf, PandasUDFType
	from pyspark.sql.types import (
	    IntegerType,
	    StringType,
	    StructField,
	    StructType,
	)


	"""

tommydangerous / code_logic.py

Created May 13, 2021 03:28

PySpark example 3

	from pyspark.sql.functions import pandas_udf, PandasUDFType
	from pyspark.sql.types import (
	    IntegerType,
	    StringType,
	    StructField,
	    StructType,
	)


	"""

tommydangerous / all_together.py

Last active May 13, 2021 17:36

PySpark example all together

	from pyspark.sql import SparkSession
	from pyspark.sql.functions import pandas_udf, PandasUDFType
	from pyspark.sql.types import (
	    IntegerType,
	    StringType,
	    StructField,
	    StructType,
	)

tommydangerous / download_and_split_data.py

Created June 11, 2021 05:22

download_and_split_data

	from sklearn.model_selection import train_test_split
	import pandas as pd

	df = pd.read_csv('/content/titanic_survival.csv')
	label_feature_name = 'Survived'

	X = df.drop(columns=[label_feature_name])
	y = df[label_feature_name]
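
`/content/titanic_survival.csv` looks like a Google Colab path, so the snippet above only runs where that file exists. The variant below swaps in a tiny inline CSV and shows the same feature/label separation anywhere pandas is installed. The rows are made-up toy values with assumed Titanic-style column names, not the real dataset.

```python
import io

import pandas as pd

# Made-up toy rows for illustration; the real data lives in titanic_survival.csv.
TOY_CSV = """Pclass,Sex,Age,Survived
3,male,22,0
1,female,38,1
3,female,26,1
1,male,54,0
"""

df = pd.read_csv(io.StringIO(TOY_CSV))
label_feature_name = "Survived"

# Features: every column except the label.
X = df.drop(columns=[label_feature_name])
# Label: the target column as a Series.
y = df[label_feature_name]

print(X.columns.tolist())  # ['Pclass', 'Sex', 'Age']
print(y.tolist())          # [0, 1, 1, 0]
```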

tommydangerous / split_data.py

Created June 11, 2021 05:24

split_data

	X_train_raw, X_test_raw, y_train, y_test = train_test_split(
	    X,
	    y,
	    stratify=y,
	    test_size=0.2,
	)
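
`stratify=y` asks `train_test_split` to keep each class's share of `y` roughly the same in the training and test sets. The standard-library sketch below re-implements that idea by splitting each class separately. It is an illustration only, not scikit-learn's actual algorithm, and it may round per-class counts differently. `stratified_split_indices` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_split_indices(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx), taking about test_size of EACH class for the test set."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Per-class test count, so class proportions carry over to both parts.
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 60 + [1] * 40  # 60/40 class balance
train_idx, test_idx = stratified_split_indices(labels, test_size=0.2)
test_labels = [labels[i] for i in test_idx]
print(len(train_idx), len(test_idx))               # 80 20
print(test_labels.count(0), test_labels.count(1))  # 12 8
```

The test set keeps the original 60/40 ratio (12 vs 8). That matters for imbalanced labels such as `Survived`.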