- https://medium.datadriveninvestor.com/how-to-build-a-recommendation-system-for-purchase-data-step-by-step-d6d7a78800b6
- https://www.kaggle.com/c/santander-product-recommendation
- https://www.kaggle.com/retailrocket/ecommerce-dataset/home
- https://www.kaggle.com/dschettler8845/recsys-2020-ecommerce-dataset/tasks?taskId=3124
- https://www.kaggle.com/sohamohajeri/recommendation-system-for-electronic-dataset
- https://towardsdatascience.com/extreme-deep-factorization-machine-xdeepfm-1ba180a6de78
- https://medium.com/building-creative-market/word2vec-inspired-recommendations-in-production-f2c6a6b5b0bf
- https://medium.com/shoprunner/fetching-better-beer-recommendations-with-collie-part-1-18c73ab30fbd
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "tr3cks/3LabelsSentimentAnalysisSpanish"
tokenizer_sent_esp = AutoTokenizer.from_pretrained(model_name)
model_sent_esp = AutoModelForSequenceClassification.from_pretrained(model_name)

# The output is ['ja', '##ja', '##ja', 'que', 'risa', 'me', 'da']
tokenizer_sent_esp.tokenize('jajaja que risa me da')
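Once the model is loaded, inference is a forward pass over the tokenized text. A minimal sketch follows; the negative/neutral/positive label order is an assumption about this three-label checkpoint and should be verified against `model_sent_esp.config.id2label`:

```python
import torch

# Tokenize a Spanish sentence and run it through the classifier.
inputs = tokenizer_sent_esp("jajaja que risa me da", return_tensors="pt")
with torch.no_grad():
    logits = model_sent_esp(**inputs).logits

# Assumed label order for this 3-label checkpoint; check
# model_sent_esp.config.id2label before relying on it.
labels = ["negative", "neutral", "positive"]
print(labels[int(logits.argmax(dim=-1))])
```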
The table below shows examples of heuristic benchmarks that can serve as a baseline for a machine learning model when no previous solution exists. The original version of the table can be found in the book *Machine Learning Design Patterns* (Pattern 28).

| Scenario | Heuristic benchmark | Example task | Implementation for example task |
| --- | --- | --- | --- |
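As a concrete illustration of the pattern, here is a minimal sketch comparing a trained model against a predict-the-mean heuristic using scikit-learn's `DummyRegressor`; the data is synthetic and for illustration only:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for a real task.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heuristic benchmark: always predict the mean of the training labels.
benchmark = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

print("benchmark MAE:", mean_absolute_error(y_test, benchmark.predict(X_test)))
print("model MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```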
- install the ProjectEnv plugin
- install direnv
- create a `.env` file and configure the environment variables in it (see the sketch below)
- create a `.envrc` file and write `dotenv` in it
- configure in PyCharm > Settings > Build, Execution, Deployment > ProjectEnv > add files (`.env`)
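A minimal sketch of the two files and of how the variables then surface in Python; the variable names are placeholders:

```python
import os

# .env (read by the ProjectEnv plugin in PyCharm, and by direnv through dotenv):
#   API_KEY=your-secret-key
#   DB_HOST=localhost
#
# .envrc (tells direnv to load the .env file):
#   dotenv

# With either mechanism active, the values are ordinary environment variables.
api_key = os.environ["API_KEY"]
db_host = os.getenv("DB_HOST", "localhost")
print(db_host)
```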
def unify_victoria_secret(df):
    """
    We want all brands that are related to Victoria's Secret
    to have `victoria's secret` as their brand instead of what
    they currently have.
    """
    df = df.copy()
    new_string = "victoria's secret"
    # Normalize every known Victoria's Secret variant to a single brand name.
    df.loc[
        df["brand_name"].isin(
            ["Victorias-Secret", "Victoria's Secret", "Victoria's Secret Pink"]
        ),
        "brand_name",
    ] = new_string
    return df
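A quick check on a toy DataFrame (the data is made up, for illustration):

```python
import pandas as pd

df = pd.DataFrame({"brand_name": ["Victorias-Secret", "Victoria's Secret Pink", "Nike"]})
print(unify_victoria_secret(df)["brand_name"].tolist())
# ["victoria's secret", "victoria's secret", 'Nike']
```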
The goal of this challenge is to build a small automated data pipeline that extracts information from an external source (the New York Times API), stores it in an analytical database (BigQuery), and makes it efficiently queryable.

You will develop a Python script that connects to the NYTimes news API and extracts recent articles according to certain parameters. That information must be stored in a table in Google BigQuery for later analysis.
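A minimal sketch of such a script, assuming an NYT Article Search API key in the `NYT_API_KEY` environment variable and an already-existing BigQuery table; the project, dataset, table, and field names are placeholders:

```python
import os

import requests
from google.cloud import bigquery

NYT_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"
TABLE_ID = "my-project.news.nyt_articles"  # placeholder project.dataset.table


def fetch_articles(query: str, api_key: str) -> list[dict]:
    """Pull recent articles matching `query` from the NYT Article Search API."""
    resp = requests.get(
        NYT_URL, params={"q": query, "sort": "newest", "api-key": api_key}
    )
    resp.raise_for_status()
    docs = resp.json()["response"]["docs"]
    # Keep a small, flat subset of fields that maps cleanly to a table schema.
    return [
        {
            "headline": d["headline"]["main"],
            "pub_date": d["pub_date"],
            "section": d.get("section_name"),
            "web_url": d["web_url"],
        }
        for d in docs
    ]


def load_to_bigquery(rows: list[dict]) -> None:
    """Stream the rows into an existing BigQuery table."""
    client = bigquery.Client()
    errors = client.insert_rows_json(TABLE_ID, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")


if __name__ == "__main__":
    articles = fetch_articles("economy", os.environ["NYT_API_KEY"])
    load_to_bigquery(articles)
```

`insert_rows_json` streams rows into an existing table; for batch loads or automatic schema handling, `load_table_from_json` is an alternative worth considering.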