Aastha Varma aasthavar
@aasthavar
aasthavar / Python Books.md
Created June 25, 2021 04:12
Python Books

This is a collection of books that I've researched, scanned the TOCs of, and am currently working through. The books are selected based on quality of content, reviews, and recommendations from various 'best of' lists.

The goal of this collection is to promote mastery of generally applicable programming concepts.

Most topics are covered with Python as the primary language due to its conciseness, which is ideal for learning & practicing new concepts with minimal syntactic boilerplate.

JavaScript & Kotlin are listed in the Tooling section, as they allow extension of VS Code and the IntelliJ suite of IDEs, which cover most development needs.

 

@aasthavar
aasthavar / list.txt
Created November 24, 2021 10:10 — forked from shortjared/list.txt
List of AWS Service Principals
a4b.amazonaws.com
acm-pca.amazonaws.com
acm.amazonaws.com
alexa-appkit.amazon.com
alexa-connectedhome.amazon.com
amazonmq.amazonaws.com
apigateway.amazonaws.com
appflow.amazonaws.com
application-autoscaling.amazonaws.com
appstream.application-autoscaling.amazonaws.com

Semantic Commit Messages

See how a minor change to your commit message style can make you a better programmer.

Format: <type>(<scope>): <subject>

<scope> is optional
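
The format above can be checked mechanically. What follows is a minimal sketch, not part of the original gist: the names `COMMIT_RE` and `parse_commit_message` are hypothetical, and the pattern deliberately accepts any word as `<type>` because the text here does not list the allowed types.

```python
import re

# Pattern for "<type>(<scope>): <subject>", where the "(<scope>)" part is optional.
# Any word is accepted as the type; restricting it to a fixed list is left out
# because the allowed types are not stated in the text above.
COMMIT_RE = re.compile(
    r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?: (?P<subject>.+)$"
)


def parse_commit_message(message):
    """Return (type, scope, subject) for a matching first line, else None."""
    first_line = message.splitlines()[0] if message else ""
    match = COMMIT_RE.match(first_line)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("subject")
```

A message such as `fix: handle empty input` (an invented example) parses with a scope of `None`, while a message without the `<type>: ` prefix yields `None`.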

Example

@aasthavar
aasthavar / git-commands.txt
Last active January 24, 2023 12:21
frequently used git commands
git push origin main
git remote add origin https://github.com/zihangdai/xlnet.git
git archive --format zip --output prototype-code.zip main
@aasthavar
aasthavar / .gitignore
Created January 25, 2023 06:55
.gitignore for Jupyter notebooks
### JupyterNotebooks ###
# gitignore template for Jupyter Notebooks
# website: http://jupyter.org/
.ipynb_checkpoints
*/.ipynb_checkpoints/*
# IPython
profile_default/
ipython_config.py
@aasthavar
aasthavar / missing-data.py
Created February 23, 2023 07:03
Data processing - Missing data
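import numpy as np
import pandas as pd

def missing_data(data):
    # Number of missing values in each column
    total = data.isnull().sum()
    # Share of missing values in each column, as a percentage of all rows
    percent = data.isnull().sum() / data.isnull().count() * 100
    tt = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    # Record each column's dtype as a string
    types = []
    for col in data.columns:
        dtype = str(data[col].dtype)
        types.append(dtype)
    tt['Types'] = types
    # One column per original column; rows are Total, Percent, Types
    return np.transpose(tt)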
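
As a usage illustration, here is a small sketch that builds the same three-row report on a toy frame. The function name `missing_summary`, the column names, and the data are all invented for this example. It is a loop-free variant, not the gist's own code, and relies on `isnull().mean()` being the fraction of missing rows.

```python
import numpy as np
import pandas as pd


def missing_summary(data):
    # Hypothetical compact variant: same rows (Total, Percent, Types), no loop.
    summary = pd.DataFrame({
        "Total": data.isnull().sum(),           # missing values per column
        "Percent": data.isnull().mean() * 100,  # missing share per column
        "Types": data.dtypes.astype(str),       # dtype name per column
    })
    return summary.T


# Toy data, invented for illustration.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, np.nan],
    "city": ["Pune", "Delhi", None, "Mumbai"],
})
report = missing_summary(df)
print(report)
```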
@aasthavar
aasthavar / unique-values.py
Created February 23, 2023 07:05
Data Processing - Unique Values
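import numpy as np
import pandas as pd

def unique_values(data):
    # Number of non-null entries in each column
    total = data.count()
    tt = pd.DataFrame(total)
    tt.columns = ['Total']
    # Number of distinct non-null values in each column
    uniques = []
    for col in data.columns:
        unique = data[col].nunique()
        uniques.append(unique)
    tt['Uniques'] = uniques
    # One column per original column; rows are Total, Uniques
    return np.transpose(tt)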
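
One detail worth seeing on real values: both `count()` and `nunique()` skip nulls by default, so a column with missing entries reports fewer rows than the frame has. The sketch below shows this on invented data; `unique_summary` is a hypothetical loop-free variant, not the gist's function.

```python
import pandas as pd


def unique_summary(data):
    # Hypothetical loop-free variant producing the same two rows.
    summary = pd.DataFrame({
        "Total": data.count(),      # non-null entries per column
        "Uniques": data.nunique(),  # distinct non-null values per column
    })
    return summary.T


# Toy data, invented for illustration; "color" has one missing entry.
df = pd.DataFrame({
    "color": ["red", "blue", "red", None],
    "size": [1, 2, 3, 4],
})
report = unique_summary(df)
print(report)
```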
@aasthavar
aasthavar / data-cleaning-steps.txt
Created February 23, 2023 08:31
Helpful and Concise Data Cleaning Steps
# Source: https://www.kaggle.com/getting-started/250322
1. Identify the problematic data
2. Clean the data
3. Remove, encode, fill in any missing data
4. Remove outliers or analyze them separately
5. Purge contaminated data and correct leaking pipelines
6. Standardize inconsistent data
7. Check if your data makes sense (is valid)
8. Deduplicate multiple records of the same data
9. Foresee and prevent type issues (string issues, DateTime issues)
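
A few of the steps in this list (type issues, deduplication, missing data, outliers) can be sketched in pandas. This is one possible reading under stated assumptions, not the source's procedure: the function name `clean`, the median fill, and the 1.5 × IQR outlier rule are choices made for this example only.

```python
import pandas as pd


def clean(df, numeric_col, date_col):
    # Column names are supplied by the caller; nothing here is tied to a real dataset.
    out = df.copy()

    # Prevent type issues: coerce up front so bad strings become NaN/NaT.
    out[numeric_col] = pd.to_numeric(out[numeric_col], errors="coerce")
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")

    # Deduplicate: drop rows that repeat an earlier row exactly.
    out = out.drop_duplicates()

    # Fill in missing data: the median is an assumed choice, one of several options.
    out[numeric_col] = out[numeric_col].fillna(out[numeric_col].median())

    # Remove outliers: keep values within 1.5 * IQR of the quartiles (assumed rule).
    q1, q3 = out[numeric_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out[numeric_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[in_range].reset_index(drop=True)


# Toy data, invented for illustration: one bad string, one extreme value, one duplicate row.
raw = pd.DataFrame({
    "price": ["10", "12", "11", "oops", "1000", "12"],
    "date": ["2023-01-01", "2023-01-02", "2023-01-03",
             "2023-01-04", "2023-01-05", "2023-01-02"],
})
cleaned = clean(raw, "price", "date")
print(cleaned)
```

Coercing types first means the later steps operate on real numbers and timestamps. Whether outliers should be dropped or analyzed separately depends on the dataset.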
@aasthavar
aasthavar / bagging-imputation.py
Created February 28, 2023 16:56
Imputation: Tree Bagging method
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
def tree_imputation(df):
    missing_cols = [col for col in df.columns if df[col].isnull().sum() > 0]
    non_missing_cols = [col for col in df.columns if df[col].isnull().sum() == 0]
    # num_cols = [col for col in missing_cols if df[col].dtype != 'object']
    # df = df[num_cols]
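
The preview above stops before the model is fitted. As a separate illustration of tree-bagging imputation in general, and not a reconstruction of the rest of this gist, the sketch below fits one bagged tree ensemble per incomplete column, using the complete columns as features. It assumes every column is numeric and at least one column has no missing values; `bagging_impute` and its parameters are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor


def bagging_impute(df, n_estimators=10, random_state=0):
    # Assumption: all columns are numeric and at least one column is complete.
    out = df.copy()
    missing_cols = [c for c in df.columns if df[c].isnull().any()]
    complete_cols = [c for c in df.columns if not df[c].isnull().any()]

    for col in missing_cols:
        known = df[col].notnull()
        # The base tree is passed positionally because its keyword name
        # differs between scikit-learn versions.
        model = BaggingRegressor(
            DecisionTreeRegressor(),
            n_estimators=n_estimators,
            random_state=random_state,
        )
        # Train on rows where the target column is observed ...
        model.fit(df.loc[known, complete_cols], df.loc[known, col])
        # ... and predict only the rows where it is missing.
        out.loc[~known, col] = model.predict(df.loc[~known, complete_cols])
    return out


# Toy data, invented for illustration: y = 2 * x with two gaps.
x = np.arange(20.0)
y = 2 * x
y[[3, 7]] = np.nan
toy = pd.DataFrame({"x": x, "y": y})
filled = bagging_impute(toy)
```

Only the missing cells are overwritten; observed values pass through unchanged.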
@aasthavar
aasthavar / likelihood_encoding.py
Created February 28, 2023 17:04
Encoding categorical variables using Likelihood Encoding
def likelihood_encoding(df, cat_cols, target_variable="Status"):
    # cat_cols.remove(target_variable)
    df_temp = df.copy()
    for col in cat_cols:
        effect = {}
        print(col)
        for category in df[col].unique():
            print(category)
            try:
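
The preview is cut off at this point. For readers unfamiliar with the idea, likelihood (target-mean) encoding replaces each category with the mean of the target among rows in that category. The sketch below is a generic version under stated assumptions, not the remainder of this gist: `likelihood_encode` is a hypothetical name, the target is assumed to be numeric already (for example 0/1), and categories with no computed mean fall back to the global target mean.

```python
import pandas as pd


def likelihood_encode(df, cat_cols, target_variable="Status"):
    # Assumption: df[target_variable] is numeric (e.g. 0/1).
    out = df.copy()
    global_mean = df[target_variable].mean()
    for col in cat_cols:
        # Mean of the target within each category of this column.
        effect = df.groupby(col)[target_variable].mean()
        # Replace each category by its mean; fall back to the global mean
        # for values that have no group mean (for example, missing categories).
        out[col] = df[col].map(effect).fillna(global_mean)
    return out


# Toy data, invented for illustration.
toy = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "Status": [1, 0, 0, 0],
})
encoded = likelihood_encode(toy, ["Gender"])
print(encoded)
```

In practice the category means are usually computed on training data only, since they are derived from the target.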