jiahao87’s gists

jiahao87 / forecast_reconciliation.py

Last active March 13, 2022 08:14

Hierarchical Forecasting Reconciliation using OLS Method

	import numpy as np
	import pandas as pd
	import hts # To install: pip install scikit-hts
	import collections
	from scipy.optimize import lsq_linear


	hts_df = pd.DataFrame([{'total': 14,
	'CA': 5.4, 'TX': 1.8, 'WI': 5.9,
	'CA_1': 0.8, 'CA_2': 0.6, 'CA_3': 0.9, 'CA_4': 0.3,
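
The preview is cut off before the reconciliation step. As a rough illustration of what OLS reconciliation does, here is a minimal NumPy sketch on a hypothetical three-node hierarchy (`total = A + B`). The summing matrix `S`, the base forecasts `y_hat`, and the function name `ols_reconcile` are assumptions made for this example and are not taken from the gist, which imports `scipy.optimize.lsq_linear` and may solve the problem differently.

```python
import numpy as np


def ols_reconcile(S, y_hat):
    """Return coherent forecasts by least-squares projection onto the columns of S.

    S     : summing matrix mapping bottom-level series to every node
    y_hat : base (possibly incoherent) forecasts for every node
    """
    # Find bottom-level forecasts b that minimise ||S b - y_hat||^2
    b, *_ = np.linalg.lstsq(S, y_hat, rcond=None)
    # Aggregate the bottom-level solution back up the hierarchy
    return S @ b


# Hypothetical hierarchy: rows are [total, A, B], columns are bottom series [A, B]
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Hypothetical base forecasts that do not add up (4 + 5 != 10)
y_hat = np.array([10.0, 4.0, 5.0])

reconciled = ols_reconcile(S, y_hat)
print(reconciled)
```

After reconciliation the first entry equals the sum of the other two, which is the coherence property the hierarchy requires.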

jiahao87 / pegasus_fine_tune.py

Last active May 29, 2024 18:00

PyTorch script for fine-tuning the Pegasus Large model

	"""Script for fine-tuning Pegasus
	Example usage:
	# use XSum dataset as example, with first 1000 docs as training data
	from datasets import load_dataset
	dataset = load_dataset("xsum")
	train_texts, train_labels = dataset['train']['document'][:1000], dataset['train']['summary'][:1000]

	# use Pegasus Large model as base for fine-tuning
	model_name = 'google/pegasus-large'
	train_dataset, _, _, tokenizer = prepare_data(model_name, train_texts, train_labels)

jiahao87 / sample_reviews.txt

Last active December 20, 2020 07:09

Sample reviews of top topics

	######################
	### Sample Reviews ###
	######################

	###### Topic 1 ######
	"From the start our experience was bad There was only one person on check in so we had to queue Having been allcated our rooms we had to change them as we had specified adjacent or interconnecting rooms which they failed to do We then had to queue up again for the one person still on reception and 45 minutes later were allocated 2 adjacent rooms But one of the rooms had a smell of drains which I reported and which the very discourteous duty manager Thalia refused to deal with In fact she told me several times that I was wrong The rooms were small the beds very soft and the shower and toilet were part of the bedroom The smell of drains was coming from the shower For such an expensive hotel this was unacceptable especially the way the duty manager treated her customers I don t think I have ever encountered a more unpleasant manner in my many years of travelling"

	"On arrival we only had 30 minutes to get ready We were to

jiahao87 / vaex_iris_sample.py

Created September 6, 2020 11:06

Vaex sample code for Iris data

	import vaex
	import vaex.ml

	# load iris data
	df = vaex.ml.datasets.load_iris()

	# perform train test split
	df_train, df_test = df.ml.train_test_split(test_size=0.2)

	# apply standardization transformation

jiahao87 / vaex_titanic_sample.py

Created September 6, 2020 09:30

Vaex sample code for the Titanic dataset

	import vaex
	import vaex.ml

	# load titanic data
	df_vaex = vaex.ml.datasets.load_titanic()

	# perform train test split
	df_train, df_test = df_vaex.ml.train_test_split(test_size=0.2)

	# One-hot encode some features

jiahao87 / mlflow_full_sample.py

Last active July 25, 2022 20:51

Full sample code for MLflow example

	import os
	import numpy as np
	from scipy.stats import uniform
	from sklearn.datasets import load_iris
	from sklearn.model_selection import train_test_split
	from sklearn.model_selection import cross_validate
	from sklearn import metrics
	from sklearn.model_selection import ParameterSampler
	from sklearn.ensemble import RandomForestClassifier

jiahao87 / mlflow_sample.py

Last active August 26, 2020 11:49

Sample code for MLflow

	X_train, X_test, y_train, y_test = data_processing()

	#################### 1. Setup Experiment ###########################
	# set experiment name to organize runs
	mlflow.set_experiment('New Experiment Name')
	experiment = mlflow.get_experiment_by_name('New Experiment Name')

	# set path to log data, e.g., mlruns local folder
	mlflow.set_tracking_uri('./mlruns')

jiahao87 / values.yaml

Created August 12, 2020 09:45

Configuration file template to update Dask Helm deployment

	# values.yaml to overwrite default values

	scheduler:
	  image:
	    tag: 2.21.0 # Container image tag
	  serviceType: "LoadBalancer"
	  resources:
	    limits:
	      cpu: 1
	      memory: 6G

jiahao87 / exploratory_data_analysis.py

Last active May 26, 2023 08:31

	import pandas as pd
	import numpy as np
	import matplotlib
	import matplotlib.pyplot as plt
	import seaborn as sns
	import missingno
	import warnings
	warnings.filterwarnings("ignore")
	# IPython/Jupyter magic; remove this line when running as a plain Python script
	%matplotlib inline

jiahao87 / text_preprocessing.py

Last active July 27, 2024 14:47

Full code for preprocessing text

	from bs4 import BeautifulSoup
	import spacy
	import unidecode
	from word2number import w2n
	import contractions

	nlp = spacy.load('en_core_web_md')

	# exclude words from spacy stopwords list
	deselect_stop_words = ['no', 'not']
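
The preview ends before it shows how `deselect_stop_words` is applied. The idea of keeping negation words can be sketched without spaCy, as below. The tiny `STOP_WORDS` set and the helper `remove_stop_words` are hypothetical names for illustration only; they are not spaCy's stop-word list or API, and the gist itself may apply the exclusion differently.

```python
# Hypothetical, tiny stop-word set for illustration only (not spaCy's list)
STOP_WORDS = {"the", "is", "a", "was", "no", "not"}

# Words to keep even though they appear in the stop-word set
deselect_stop_words = ["no", "not"]

# Effective stop words: the base set minus the words we want to preserve
effective_stop_words = STOP_WORDS - set(deselect_stop_words)


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop effective stop words."""
    tokens = text.lower().split()
    return [t for t in tokens if t not in effective_stop_words]


result = remove_stop_words("The room was not clean")
print(result)
```

Keeping "no" and "not" matters for tasks such as sentiment analysis, where dropping them would turn "not clean" into "clean".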