def unify_victoria_secret(df):
    """
    Map every brand name related to Victoria's Secret to the single
    canonical value `victoria's secret`, replacing whatever variant
    the row currently has.
    """
    df = df.copy()
    new_string = "victoria's secret"
    vs_variants = ["Victorias-Secret", "Victoria's Secret", "Victoria's Secret Pink"]
    df.loc[df["brand_name"].isin(vs_variants), "brand_name"] = new_string
    return df
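A quick usage sketch of the function above (the sample rows are made up for illustration):

```python
import pandas as pd

def unify_victoria_secret(df):
    """Map all Victoria's Secret brand variants to one canonical name."""
    df = df.copy()
    vs_variants = ["Victorias-Secret", "Victoria's Secret", "Victoria's Secret Pink"]
    df.loc[df["brand_name"].isin(vs_variants), "brand_name"] = "victoria's secret"
    return df

# Made-up sample data
df = pd.DataFrame({"brand_name": ["Victoria's Secret Pink", "Nike", "Victorias-Secret"]})
out = unify_victoria_secret(df)
print(out["brand_name"].tolist())
```

Note that the function copies the frame and returns the modified copy, so the original `df` is left untouched.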
- install the ProjectEnv plugin
- install direnv
- create a `.env` file and configure the environment variables
- create a `.envrc` file and write `dotenv` in it
- configure it in PyCharm > Settings > Build ... > ProjectEnv > add files (`.env`)
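The two files from the steps above might look like this (the variable names are made-up examples, not required names):

```shell
# .envrc — tells direnv to load the variables from .env
dotenv

# .env — example contents (MY_API_KEY and DATABASE_URL are hypothetical)
MY_API_KEY=changeme
DATABASE_URL=postgres://localhost/mydb
```

After creating `.envrc`, run `direnv allow` in the project directory so direnv is permitted to load it.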
The table below shows some examples of heuristic benchmarks that can be used to compare the performance of a machine learning model when no previous solution exists. The original version of the table can be found in the Machine Learning Design Patterns book (pattern 28).

| Scenario | Heuristic benchmark | Example task | Implementation for example task |
| --- | --- | --- | --- |
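As a minimal illustration of the idea (not taken from the book), a common heuristic benchmark for a regression task is to always predict the mean of the training targets; any model we build should beat the error this baseline yields:

```python
# Heuristic benchmark for regression: always predict the training mean,
# then measure the mean absolute error (MAE) of that constant prediction.

def mean_baseline_mae(y_train, y_test):
    """MAE of predicting the mean of y_train for every test point."""
    baseline = sum(y_train) / len(y_train)
    return sum(abs(y - baseline) for y in y_test) / len(y_test)

# Toy data, made up for illustration
y_train = [10.0, 12.0, 14.0]   # mean = 12.0
y_test = [11.0, 15.0]          # abs errors vs 12.0: 1.0 and 3.0
print(mean_baseline_mae(y_train, y_test))  # 2.0
```

A trained model whose test MAE is not clearly below this number adds no value over the trivial baseline.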
- https://medium.datadriveninvestor.com/how-to-build-a-recommendation-system-for-purchase-data-step-by-step-d6d7a78800b6
- https://www.kaggle.com/c/santander-product-recommendation
- https://www.kaggle.com/retailrocket/ecommerce-dataset/home
- https://www.kaggle.com/dschettler8845/recsys-2020-ecommerce-dataset/tasks?taskId=3124
- https://www.kaggle.com/sohamohajeri/recommendation-system-for-electronic-dataset
- https://towardsdatascience.com/extreme-deep-factorization-machine-xdeepfm-1ba180a6de78
- https://medium.com/building-creative-market/word2vec-inspired-recommendations-in-production-f2c6a6b5b0bf
- https://medium.com/shoprunner/fetching-better-beer-recommendations-with-collie-part-1-18c73ab30fbd
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "tr3cks/3LabelsSentimentAnalysisSpanish"
tokenizer_sent_esp = AutoTokenizer.from_pretrained(model_name)
model_sent_esp = AutoModelForSequenceClassification.from_pretrained(model_name)

# The output is ['ja', '##ja', '##ja', 'que', 'risa', 'me', 'da']
tokenizer_sent_esp.tokenize('jajaja que risa me da')