This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Single column replace multiple characters | |
Function to remove multiple special characters | |
let | |
Source = (col_val as text) => | |
let | |
clean_text = if col_val = null then col_val else | |
List.Accumulate(Text.ToList("^{}®™") | |
,col_val, |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
let | |
Source = Excel.Workbook(File.Contents("G:\My Drive\Eric\DynamicPivot-Two products Six Suppliers.xlsx"), null, true), | |
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data], | |
set_headers = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]), | |
set_types = Table.TransformColumnTypes(set_headers,{{"Product No", Int64.Type}, {"Supplier", type text}, {"Month", type text}, {"Amount", Int64.Type}}), | |
replace_null_amounts = Table.ReplaceValue(set_types,null,0,Replacer.ReplaceValue,{"Amount"}), | |
group_by_product_supplier_month = Table.Group(replace_null_amounts, {"Product No", "Supplier", "Month"}, {{"Total", each List.Sum([Amount]), type nullable number}}), | |
pivot_months = Table.Pivot(group_by_product_supplier_month, List.Distinct(group_by_product_supplier_month[Month]), "Month", "Total", List.Sum), | |
remove_space_from_column_name = Table.RenameColumns(pivot_months,{{"Product No", "Product_No"}}), |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
## Environment creation for power bi ML | |
conda create -n py37 -c anaconda python=3.7 | |
conda activate py37_env | |
pip install matplotlib | |
pip install pandas | |
pip install sckit-learn | |
pip install xgboost | |
Set the environment | |
C:\Users\username\miniconda3\envs\py37_env |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Define a helper function to fix truncated zeros, with one parameter: the value to be processed | |
def fix_trunc_zeros(val): | |
# Use an if statement to check if there are four characters in the string representation of the value | |
if len(str(val)) == 4: | |
# If this is the case, return the value with an appended "0" in the front | |
return "0"+str(val) | |
# Otherwise... | |
else: | |
# Return the value itself | |
return str(val) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def biplot(score, coeff, maxdim, pcax, pcay, labels=None): | |
""" | |
This function uses | |
score - the transformed data returned by pca - data as expressed according to the new axis | |
coeff - the loadings from the pca_components_ | |
For the feaures we are interested in, it plots the correlation between the original features and the PCAs. | |
Use cosine similarity and angle measures between axes. | |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import altair as alt | |
#Simple Altait plot | |
import altair as alt | |
def plotGenericLineChart(x_values, x_type , y_values, y_type, chart_title, x_axis_title, y_axis_title): | |
""" | |
This function creates a line chart for two values X and Y passed in along with thier types and titles | |
""" | |
df = pd.DataFrame({"x_values" : x_values, 'y_values': y_values}) | |
chart = alt.Chart(df).mark_line( |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import altair as alt | |
from vega_datasets import data | |
# uses intermediate json files to speed things up | |
alt.data_transformers.enable('json') | |
domain_stock_list = ["AAPL", "DIS", "FB", "MSFT", "NFLX", "TSLA"] | |
def getBaseChart(): | |
""" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from os import listdir | |
def getAllStockData(data): | |
""" | |
Retrieves data from files downloaded from Marketwatch and stored in a directory with path data | |
""" | |
stock_df = pd.DataFrame() | |
for filename in listdir("data"): | |
if filename.startswith("Dow"): |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: | |
# dataset = pandas.DataFrame(Emoji, Use Percent, Use Number) | |
# dataset = dataset.drop_duplicates() | |
# Paste or type your script code here: | |
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: | |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def getAmenitiesTokens(cell_val): | |
return regexp_tokenize(cell_val, "([\w\s\d\']+), ") | |
#Clear out unicode and unnecessary square brackets | |
amenities_df['amenities'] = amenities_df['amenities'].replace({'\[': '', '\]': '', '"':'', r'\\u2019': r"'", r"\\u2013":"-", r"\\u00a0":""}, regex=True) | |
# Follow the patttern! Lower case everything to make items similar | |
amenities_df['amenities'] = amenities_df['amenities'].str.lower() | |
#Apply the tokenizer | |
amenities_df["amenities_tokens"] = amenities_df["amenities"].apply(getAmenitiesTokens ) |