Simi Talkar sjtalkar

💭

Immersion training in Python and Power BI for DataScience

Data all the way.. Certified Power BI (DP-100) Azure Data Science Associate Certified Power BI (DA-100) Data Science Student at University of Michigan

5 followers · 1 following

Mondelez Intl.
Bellevue, WA

View GitHub Profile

Recently created

Least recently created

Recently updated

Least recently updated

sjtalkar / gist:a93a5db94bd8fec6b13e470f0f5fb461

Last active May 5, 2022 21:00

Replace or substitute multiple values in a column in M Query

	// Single column replace multiple characters

	Function to remove multiple special characters
	let
	Source = (col_val as text) =>
	let

	clean_text = if col_val = null then col_val else
	List.Accumulate(Text.ToList("^{}®™")
	,col_val,

sjtalkar / gist:fe84dc1c5e64a62fe5ba55afd8510cbf

Created April 29, 2022 17:56

Dynamically grouping for user input and Excel Formula

	let
	Source = Excel.Workbook(File.Contents("G:\My Drive\Eric\DynamicPivot-Two products Six Suppliers.xlsx"), null, true),
	Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
	set_headers = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
	set_types = Table.TransformColumnTypes(set_headers,{{"Product No", Int64.Type}, {"Supplier", type text}, {"Month", type text}, {"Amount", Int64.Type}}),
	replace_null_amounts = Table.ReplaceValue(set_types,null,0,Replacer.ReplaceValue,{"Amount"}),
	group_by_product_supplier_month = Table.Group(replace_null_amounts, {"Product No", "Supplier", "Month"}, {{"Total", each List.Sum([Amount]), type nullable number}}),
	pivot_months = Table.Pivot(group_by_product_supplier_month, List.Distinct(group_by_product_supplier_month[Month]), "Month", "Total", List.Sum),
	remove_space_from_column_name = Table.RenameColumns(pivot_months,{{"Product No", "Product_No"}}),

sjtalkar / gist:3dc7e166db44a811faac38f2666e95ac

Last active May 26, 2023 12:52

Creating Geopandas environment

	## Environment creation for power bi ML
	conda create -n py37 -c anaconda python=3.7
	conda activate py37_env
	pip install matplotlib
	pip install pandas
	pip install sckit-learn
	pip install xgboost

	Set the environment
	C:\Users\username\miniconda3\envs\py37_env

sjtalkar / gist:22ac2d07b9affa26af6eb41dc76d97c7

Last active October 29, 2021 17:41

ArcGIS Functions

	# Define a helper function to fix truncated zeros, with one parameter: the value to be processed
	def fix_trunc_zeros(val):
	# Use an if statement to check if there are four characters in the string representation of the value
	if len(str(val)) == 4:
	# If this is the case, return the value with an appended "0" in the front
	return "0"+str(val)
	# Otherwise...
	else:
	# Return the value itself
	return str(val)

sjtalkar / gist:e4f5448f9593e83a458f5d2164985073

Created June 21, 2021 13:56

Creating a Biplot (University of Michigan Masters Unsupervised Learning)


	def biplot(score, coeff, maxdim, pcax, pcay, labels=None):
	"""
	This function uses
	score - the transformed data returned by pca - data as expressed according to the new axis
	coeff - the loadings from the pca_components_

	For the feaures we are interested in, it plots the correlation between the original features and the PCAs.
	Use cosine similarity and angle measures between axes.

sjtalkar / gist:6857c1efa51217a08bbb8d1808ebed30

Last active November 16, 2022 15:09

Altair functions for EDA

	import altair as alt

	#Simple Altait plot
	import altair as alt
	def plotGenericLineChart(x_values, x_type , y_values, y_type, chart_title, x_axis_title, y_axis_title):
	"""
	This function creates a line chart for two values X and Y passed in along with thier types and titles
	"""
	df = pd.DataFrame({"x_values" : x_values, 'y_values': y_values})
	chart = alt.Chart(df).mark_line(

sjtalkar / Creating multiline toolip chart in Altair with legend selection

Last active February 26, 2021 15:57

	import altair as alt
	from vega_datasets import data

	# uses intermediate json files to speed things up
	alt.data_transformers.enable('json')

	domain_stock_list = ["AAPL", "DIS", "FB", "MSFT", "NFLX", "TSLA"]

	def getBaseChart():
	"""

sjtalkar / gist:61e22d7d971c029c9edc4c33160b739a

Last active March 4, 2021 14:39

Using Regexp to create column from filename and rolling mean

	from os import listdir

	def getAllStockData(data):
	"""
	Retrieves data from files downloaded from Marketwatch and stored in a directory with path data

	"""
	stock_df = pd.DataFrame()
	for filename in listdir("data"):
	if filename.startswith("Dow"):

sjtalkar / gist:2f01a542f5b0f96082875ddc0f517a30

Last active February 13, 2021 16:05

Altair Emoji category chart within PowerBI

	# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:

	# dataset = pandas.DataFrame(Emoji, Use Percent, Use Number)
	# dataset = dataset.drop_duplicates()

	# Paste or type your script code here:

	# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:

sjtalkar / gist:8183d0bae645d8dbff344e43303671a3

Last active February 2, 2021 21:22

Amenities analysis

	def getAmenitiesTokens(cell_val):
	return regexp_tokenize(cell_val, "([\w\s\d\']+), ")

	#Clear out unicode and unnecessary square brackets
	amenities_df['amenities'] = amenities_df['amenities'].replace({'\[': '', '\]': '', '"':'', r'\\u2019': r"'", r"\\u2013":"-", r"\\u00a0":""}, regex=True)
	# Follow the patttern! Lower case everything to make items similar
	amenities_df['amenities'] = amenities_df['amenities'].str.lower()

	#Apply the tokenizer
	amenities_df["amenities_tokens"] = amenities_df["amenities"].apply(getAmenitiesTokens )

Newer Older