Skip to content

Instantly share code, notes, and snippets.

View sjtalkar's full-sized avatar
💭
Immersion training in Python and Power BI for DataScience

Simi Talkar sjtalkar

💭
Immersion training in Python and Power BI for DataScience
  • Mondelez Intl.
  • Bellevue, WA
View GitHub Profile
@sjtalkar
sjtalkar / gist:a93a5db94bd8fec6b13e470f0f5fb461
Last active May 5, 2022 21:00
Replace or substitute multiple values in a column in M Query
// Single column replace multiple characters
Function to remove multiple special characters
let
Source = (col_val as text) =>
let
clean_text = if col_val = null then col_val else
List.Accumulate(Text.ToList("^{}®™")
,col_val,
@sjtalkar
sjtalkar / gist:fe84dc1c5e64a62fe5ba55afd8510cbf
Created April 29, 2022 17:56
Dynamically grouping for user input and Excel Formula
let
Source = Excel.Workbook(File.Contents("G:\My Drive\Eric\DynamicPivot-Two products Six Suppliers.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
set_headers = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
set_types = Table.TransformColumnTypes(set_headers,{{"Product No", Int64.Type}, {"Supplier", type text}, {"Month", type text}, {"Amount", Int64.Type}}),
replace_null_amounts = Table.ReplaceValue(set_types,null,0,Replacer.ReplaceValue,{"Amount"}),
group_by_product_supplier_month = Table.Group(replace_null_amounts, {"Product No", "Supplier", "Month"}, {{"Total", each List.Sum([Amount]), type nullable number}}),
pivot_months = Table.Pivot(group_by_product_supplier_month, List.Distinct(group_by_product_supplier_month[Month]), "Month", "Total", List.Sum),
remove_space_from_column_name = Table.RenameColumns(pivot_months,{{"Product No", "Product_No"}}),
@sjtalkar
sjtalkar / gist:3dc7e166db44a811faac38f2666e95ac
Last active May 26, 2023 12:52
Creating Geopandas environment
## Environment creation for power bi ML
conda create -n py37 -c anaconda python=3.7
conda activate py37_env
pip install matplotlib
pip install pandas
pip install sckit-learn
pip install xgboost
Set the environment
C:\Users\username\miniconda3\envs\py37_env
# Define a helper function to fix truncated zeros, with one parameter: the value to be processed
def fix_trunc_zeros(val):
# Use an if statement to check if there are four characters in the string representation of the value
if len(str(val)) == 4:
# If this is the case, return the value with an appended "0" in the front
return "0"+str(val)
# Otherwise...
else:
# Return the value itself
return str(val)
@sjtalkar
sjtalkar / gist:e4f5448f9593e83a458f5d2164985073
Created June 21, 2021 13:56
Creating a Biplot (University of Michigan Masters Unsupervised Learning)
def biplot(score, coeff, maxdim, pcax, pcay, labels=None):
"""
This function uses
score - the transformed data returned by pca - data as expressed according to the new axis
coeff - the loadings from the pca_components_
For the feaures we are interested in, it plots the correlation between the original features and the PCAs.
Use cosine similarity and angle measures between axes.
@sjtalkar
sjtalkar / gist:6857c1efa51217a08bbb8d1808ebed30
Last active November 16, 2022 15:09
Altair functions for EDA
import altair as alt
#Simple Altait plot
import altair as alt
def plotGenericLineChart(x_values, x_type , y_values, y_type, chart_title, x_axis_title, y_axis_title):
"""
This function creates a line chart for two values X and Y passed in along with thier types and titles
"""
df = pd.DataFrame({"x_values" : x_values, 'y_values': y_values})
chart = alt.Chart(df).mark_line(
import altair as alt
from vega_datasets import data
# uses intermediate json files to speed things up
alt.data_transformers.enable('json')
domain_stock_list = ["AAPL", "DIS", "FB", "MSFT", "NFLX", "TSLA"]
def getBaseChart():
"""
@sjtalkar
sjtalkar / gist:61e22d7d971c029c9edc4c33160b739a
Last active March 4, 2021 14:39
Using Regexp to create column from filename and rolling mean
from os import listdir
def getAllStockData(data):
"""
Retrieves data from files downloaded from Marketwatch and stored in a directory with path data
"""
stock_df = pd.DataFrame()
for filename in listdir("data"):
if filename.startswith("Dow"):
@sjtalkar
sjtalkar / gist:2f01a542f5b0f96082875ddc0f517a30
Last active February 13, 2021 16:05
Altair Emoji category chart within PowerBI
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset = pandas.DataFrame(Emoji, Use Percent, Use Number)
# dataset = dataset.drop_duplicates()
# Paste or type your script code here:
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
def getAmenitiesTokens(cell_val):
return regexp_tokenize(cell_val, "([\w\s\d\']+), ")
#Clear out unicode and unnecessary square brackets
amenities_df['amenities'] = amenities_df['amenities'].replace({'\[': '', '\]': '', '"':'', r'\\u2019': r"'", r"\\u2013":"-", r"\\u00a0":""}, regex=True)
# Follow the patttern! Lower case everything to make items similar
amenities_df['amenities'] = amenities_df['amenities'].str.lower()
#Apply the tokenizer
amenities_df["amenities_tokens"] = amenities_df["amenities"].apply(getAmenitiesTokens )