John Adeojo john-adeojo

from ludwig.api import LudwigModel
import requests
import yaml

# URL of the raw YAML file in the GitHub repository
url = 'https://raw.githubusercontent.com/john-adeojo/womartdata/main/ludwig_model.yaml'

# Download the model configuration
response = requests.get(url)
response.raise_for_status()

# Parse the YAML into a config dict and build the Ludwig model from it
config = yaml.safe_load(response.text)
model = LudwigModel(config)
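The fetched file is a Ludwig declarative configuration. Its real contents are whatever lives at the URL above; purely as an illustration of the format (field names below are assumptions, not the repository's actual config), a minimal config might look like:

```yaml
# Illustrative only -- the real config is the YAML fetched above
input_features:
  - name: Order_sequence
    type: sequence
output_features:
  - name: Order_label_04_01_2023
    type: number
```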
  • ID: Unique identifier for a row
  • Store_id: Unique ID for each store
  • Store_Type: Type of the store
  • Location_Type: Type of the location where the store is situated
  • Region_Code: Code of the region where the store is located
  • Date: The date of the observation
  • Holiday: Whether the given date is a holiday (1: Yes, 0: No)
  • Discount: Whether the store offered a discount on the given date (Yes/No)
  • Orders: Number of orders received by the store on the given day
  • Sales: Total sales for the store on the given day
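As a quick sketch of the dictionary above (the two rows are made up for illustration), the raw columns can be cast into their intended types with pandas:

```python
import pandas as pd

# Two made-up rows matching the data dictionary above
raw = pd.DataFrame({
    "ID": ["T1", "T2"],
    "Store_id": [1, 2],
    "Store_Type": ["S1", "S4"],
    "Location_Type": ["L3", "L2"],
    "Region_Code": ["R1", "R2"],
    "Date": ["01/01/2023", "01/01/2023"],
    "Holiday": [1, 0],
    "Discount": ["Yes", "No"],
    "Orders": [20, 2],
    "Sales": [7011.84, 51789.12],
})

# Cast the un-typed columns into typed representations
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%Y")  # day-first dates
raw["Holiday"] = raw["Holiday"].astype(bool)                  # 1 -> True, 0 -> False
raw["Discount"] = raw["Discount"].eq("Yes")                   # Yes/No -> bool
```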
| Store ID | Order_sequence | Order_label_04_01_2023 | Order_label_05_01_2023 |
|----------|----------------|------------------------|------------------------|
| 1        | 20, 4, 5       | 5                      | 5                      |
| 2        | 2, 3, 4        | 5                      | 3                      |
| Store ID | Orders | Date       |
|----------|--------|------------|
| 1        | 20     | 01/01/2023 |
| 1        | 4      | 02/01/2023 |
| 1        | 5      | 03/01/2023 |
| 1        | 5      | 04/01/2023 |
| 1        | 5      | 05/01/2023 |
| 2        | 2      | 01/01/2023 |
| 2        | 3      | 02/01/2023 |
| 2        | 4      | 03/01/2023 |
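A reshape of this kind — the history up to a cutoff becoming a sequence column, each later day becoming its own label column — can be sketched with pandas. The cutoff date and column-naming scheme below are assumptions for illustration, not the author's exact code:

```python
import pandas as pd

# Long-format orders, copied from the table above
long_df = pd.DataFrame({
    "Store ID": [1, 1, 1, 1, 1, 2, 2, 2],
    "Orders":   [20, 4, 5, 5, 5, 2, 3, 4],
    "Date": ["01/01/2023", "02/01/2023", "03/01/2023", "04/01/2023",
             "05/01/2023", "01/01/2023", "02/01/2023", "03/01/2023"],
})
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%d/%m/%Y")

history_cutoff = pd.Timestamp("2023-01-03")  # assumed split point

# Everything up to the cutoff becomes the input sequence per store ...
seq = (long_df[long_df["Date"] <= history_cutoff]
       .sort_values("Date")
       .groupby("Store ID")["Orders"]
       .apply(list)
       .rename("Order_sequence"))

# ... and each later day becomes its own label column
labels = (long_df[long_df["Date"] > history_cutoff]
          .pivot(index="Store ID", columns="Date", values="Orders"))
labels.columns = [f"Order_label_{d:%d_%m_%Y}" for d in labels.columns]

wide = seq.to_frame().join(labels)
```

Stores with no rows after the cutoff (store 2 in this excerpt) get NaN labels after the join.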
| Stage | Description |
|-------|-------------|
| Raw | Initial start of the pipeline, containing the sourced data model(s) that should never be changed. This forms the single source of truth to work from. These data models are typically un-typed (e.g. csv), but this will vary from case to case. |
| Intermediate | Optional data model(s), introduced to type your raw data model(s), e.g. converting string-based values into their correctly typed representation. |
| Primary | Domain-specific data model(s) containing cleansed, transformed and wrangled data from either raw or intermediate. This forms the layer fed into your feature engineering. |
| Feature | Analytics-specific data model(s) containing a set of features defined against the primary data, grouped by feature area of analysis and stored against a common dimension. |
| Model input | Analytics-specific data model(s) containing all feature data against a common dimension. |
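These stages commonly map onto numbered data folders, as in the Kedro convention the table echoes. A hypothetical layout:

```
data/
├── 01_raw/            # untouched source files (single source of truth)
├── 02_intermediate/   # typed copies of the raw data
├── 03_primary/        # cleansed, domain-level tables
├── 04_feature/        # engineered features per analysis area
└── 05_model_input/    # all features joined on a common dimension
```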

Data Mining Project Template

1. Introduction

Project Objectives

The main objectives of this project are to:

import pandas as pd
import numpy as np
from umap import UMAP
from hdbscan import HDBSCAN
import plotly.express as px
import plotly.graph_objects as go
import gower

from scripts.clustering.cluster import ClusterAnalysis

# Run the cluster analysis on the influence-metrics DataFrame
# (influence_metrics_final is built in an earlier step, not shown here)
ca = ClusterAnalysis(influence_metrics_final, n_neighbors=5, min_cluster_size=5, min_dist=0.09, metric='euclidean')
ca.run()
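The `gower` import above hints at mixed-type data, where Euclidean distance is a poor fit. As a rough illustration (a hand-rolled sketch, not the `gower` library's implementation or the author's code), Gower distance averages per-feature dissimilarities: range-scaled absolute differences for numeric columns, simple mismatch for categoricals:

```python
import numpy as np
import pandas as pd

def gower_distance(df: pd.DataFrame) -> np.ndarray:
    """Pairwise Gower distance: mean of per-column dissimilarities."""
    parts = []
    for col in df.columns:
        vals = df[col].to_numpy()
        if np.issubdtype(vals.dtype, np.number):
            # Numeric: absolute difference scaled by the column's range
            rng = vals.max() - vals.min()
            d = np.abs(vals[:, None] - vals[None, :]) / (rng if rng else 1.0)
        else:
            # Categorical: 0 if equal, 1 if different
            d = (vals[:, None] != vals[None, :]).astype(float)
        parts.append(d)
    return np.mean(parts, axis=0)

demo = pd.DataFrame({"spend": [0.0, 50.0, 100.0], "segment": ["a", "a", "b"]})
D = gower_distance(demo)
```

A matrix like this could then be fed to UMAP with `metric='precomputed'` instead of `'euclidean'`.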
import pandas as pd
import numpy as np
from umap import UMAP
from hdbscan import HDBSCAN
import plotly.express as px
import plotly.graph_objects as go

class ClusterAnalysis:
    def __init__(self, dataframe, n_neighbors=15, min_cluster_size=5, min_dist=0.1, metric='euclidean'):
        # Store the input data and the UMAP / HDBSCAN hyperparameters for run()
        self.dataframe = dataframe
        self.n_neighbors = n_neighbors
        self.min_cluster_size = min_cluster_size
        self.min_dist = min_dist
        self.metric = metric
# Sentiment analyser: specify the Hugging Face model to use
analyzer = SentimentAnalyzer('cardiffnlp/twitter-roberta-base-sentiment-latest')

# Get sentiment analysis
tweets_with_sentiment = analyzer.get_sentiment(tweets_df)

# Emotion analyser: same wrapper, different model
analyzer = SentimentAnalyzer('cardiffnlp/twitter-roberta-base-emotion', emotion=True)

# Add emotion analysis on top of the sentiment columns
tweets_with_emotion = analyzer.get_sentiment(tweets_with_sentiment)
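`SentimentAnalyzer` is the author's own wrapper and its source is not in this excerpt. A hypothetical skeleton, with the model injected as a plain callable (in practice this would be a Hugging Face `transformers` text-classification pipeline) so the shape of `get_sentiment` is clear:

```python
import pandas as pd

class SentimentAnalyzer:
    """Hypothetical reconstruction: wraps a text-classification callable."""

    def __init__(self, classify, emotion=False):
        # `classify` maps a list of texts -> [{"label": ..., "score": ...}, ...]
        self.classify = classify
        self.prefix = "emotion" if emotion else "sentiment"

    def get_sentiment(self, df: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
        # Score every row's text and append label/score columns
        preds = self.classify(df[text_col].tolist())
        out = df.copy()
        out[f"{self.prefix}_label"] = [p["label"] for p in preds]
        out[f"{self.prefix}_score"] = [p["score"] for p in preds]
        return out

# Stub classifier standing in for the RoBERTa pipeline
fake = lambda texts: [{"label": "positive", "score": 0.9} for _ in texts]
tweets = pd.DataFrame({"text": ["great day", "awful service"]})
result = SentimentAnalyzer(fake).get_sentiment(tweets)
```

Because the prefix changes with `emotion=True`, the second pass adds `emotion_label`/`emotion_score` columns alongside the sentiment ones rather than overwriting them.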