- ID: Unique identifier for a row
- Store_id: Unique id for each store
- Store_Type: Type of the store
- Location_Type: Type of the location where the store is situated
- Region_Code: Code of the region where the store is located
- Date: The date of the record
- Holiday: Whether the given date is a holiday (1: Yes, 0: No)
- Discount: Whether the store offered a discount on the given date (Yes/No)
- Orders: Number of orders received by the store on the given day
- Sales: Total sales for the store on the given day
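Based on this data dictionary, the dataset can be loaded with explicit types. The sketch below uses a couple of made-up sample rows in place of the real file (the values, and the assumption that dates are DD/MM/YYYY, are illustrative):

```python
import io
import pandas as pd

# Illustrative sample rows matching the data dictionary (values are made up)
csv_text = """ID,Store_id,Store_Type,Location_Type,Region_Code,Date,Holiday,Discount,Orders,Sales
1,1,S1,L3,R1,01/01/2023,1,Yes,20,7011.84
2,2,S4,L2,R1,02/01/2023,0,No,4,2247.75
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date"],
    dayfirst=True,  # assuming DD/MM/YYYY, as in the tables below
    dtype={"Holiday": "int8", "Orders": "int64"},
)

# Encode the Yes/No discount flag as 1/0, mirroring the Holiday column
df["Discount"] = df["Discount"].map({"Yes": 1, "No": 0})
```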
from ludwig.api import LudwigModel
import requests
import yaml

# URL of the raw YAML file in the GitHub repository
url = 'https://raw.githubusercontent.com/john-adeojo/womartdata/main/ludwig_model.yaml'

# Send a GET request to the URL and parse the YAML config
response = requests.get(url)
config = yaml.safe_load(response.text)

# Build the Ludwig model from the downloaded config
model = LudwigModel(config=config)
| Store ID | Order_sequence | Order_label_04_01_2023 | Order_label_05_01_2023 |
|---|---|---|---|
| 1 | 20 4 5 | 5 | 5 |
| 2 | 2 3 4 | 5 | 3 |
| Store ID | Orders | Date |
|---|---|---|
| 1 | 20 | 01/01/2023 |
| 1 | 4 | 02/01/2023 |
| 1 | 5 | 03/01/2023 |
| 1 | 5 | 04/01/2023 |
| 1 | 5 | 05/01/2023 |
| 2 | 2 | 01/01/2023 |
| 2 | 3 | 02/01/2023 |
| 2 | 4 | 03/01/2023 |
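The sequence-style table earlier is derived from this long-format table by concatenating each store's order history into a single string. A rough pandas sketch of that reshaping (the cut-off date separating history from labels is an assumption inferred from the label column names):

```python
import pandas as pd

# The long-format daily orders table from above
long_df = pd.DataFrame({
    "Store ID": [1, 1, 1, 1, 1, 2, 2, 2],
    "Orders":   [20, 4, 5, 5, 5, 2, 3, 4],
    "Date": pd.to_datetime(
        ["01/01/2023", "02/01/2023", "03/01/2023", "04/01/2023",
         "05/01/2023", "01/01/2023", "02/01/2023", "03/01/2023"],
        dayfirst=True,
    ),
})

# Everything before the first label date (04/01/2023) becomes the input sequence
history = long_df[long_df["Date"] < "2023-01-04"].sort_values("Date")
sequences = (
    history.groupby("Store ID")["Orders"]
    .apply(lambda s: " ".join(s.astype(str)))
    .rename("Order_sequence")
    .reset_index()
)
```

The label columns (`Order_label_04_01_2023`, `Order_label_05_01_2023`) hold the orders observed on the forecast dates themselves.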
| Stage | Description |
|---|---|
| Raw | The start of the pipeline, containing the sourced data model(s) that should never be changed; this forms your single source of truth to work from. These data models are typically untyped (e.g. csv), but this will vary from case to case |
| Intermediate | Optional data model(s) introduced to type your raw data model(s), e.g. converting string-based values into their correctly typed representation |
| Primary | Domain-specific data model(s) containing cleansed, transformed and wrangled data from either the raw or intermediate layer, forming the layer you feed into feature engineering |
| Feature | Analytics-specific data model(s) containing a set of features defined against the primary data, grouped by feature area of analysis and stored against a common dimension |
| Model input | Analytics-specific data model(s) containing all feature data against a common dimension |
import pandas as pd
import numpy as np
from umap import UMAP
from hdbscan import HDBSCAN
import plotly.express as px
import plotly.graph_objects as go
import gower
from scripts.clustering.cluster import ClusterAnalysis

# Run cluster analysis
ca = ClusterAnalysis(influence_metrics_final, n_neighbors=5, min_cluster_size=5, min_dist=0.09, metric='euclidean')
ca.run()
import pandas as pd
import numpy as np
from umap import UMAP
from hdbscan import HDBSCAN
import plotly.express as px
import plotly.graph_objects as go

class ClusterAnalysis:
    def __init__(self, dataframe, n_neighbors=15, min_cluster_size=5, min_dist=0.1, metric='euclidean'):
        # Store the input data and the UMAP/HDBSCAN hyperparameters
        self.dataframe = dataframe
        self.n_neighbors = n_neighbors
        self.min_cluster_size = min_cluster_size
        self.min_dist = min_dist
        self.metric = metric
# sentiment analyser, specify model
analyzer = SentimentAnalyzer('cardiffnlp/twitter-roberta-base-sentiment-latest')

# Get sentiment analysis
tweets_with_sentiment = analyzer.get_sentiment(tweets_df)

# emotion analyser, specify model
analyzer = SentimentAnalyzer('cardiffnlp/twitter-roberta-base-emotion', emotion=True)

# Get emotion analysis
tweets_with_sentiment = analyzer.get_sentiment(tweets_with_sentiment)
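`SentimentAnalyzer` is a custom wrapper whose implementation is not shown here. One plausible shape for it, sketched with a stubbed scoring function standing in for the real cardiffnlp model (the stub logic and the `text` column name are assumptions, not the author's code):

```python
import pandas as pd

class SentimentAnalyzer:
    """Hypothetical sketch: score each tweet and append a label column."""

    def __init__(self, model_name, emotion=False):
        self.model_name = model_name
        # The real class would load a Hugging Face model named `model_name`
        self.label_col = "emotion" if emotion else "sentiment"

    def _score(self, text):
        # Placeholder logic only; the real model returns learned labels
        return "positive" if "good" in text.lower() else "negative"

    def get_sentiment(self, df):
        out = df.copy()
        out[self.label_col] = out["text"].apply(self._score)
        return out

tweets_df = pd.DataFrame({"text": ["Service was good!", "Late delivery again"]})
result = SentimentAnalyzer(
    "cardiffnlp/twitter-roberta-base-sentiment-latest"
).get_sentiment(tweets_df)
```

Chaining the two passes as in the snippet above then simply appends a `sentiment` column followed by an `emotion` column to the same dataframe.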