John Adeojo john-adeojo

from ludwig.api import LudwigModel
import requests
import yaml

# URL of the raw YAML file in the GitHub repository
url = 'https://raw.githubusercontent.com/john-adeojo/womartdata/main/ludwig_model.yaml'

# Download the model configuration
response = requests.get(url)
response.raise_for_status()

# Parse the YAML into a config dict and build the Ludwig model from it
config = yaml.safe_load(response.text)
model = LudwigModel(config)
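The fetched file is a Ludwig declarative configuration. Its real contents are whatever lives at the URL above; purely as an illustration of the format (field names below are assumptions, not the repository's actual config), a minimal config might look like:

```yaml
# Illustrative only -- the real config is the YAML fetched above
input_features:
  - name: Order_sequence
    type: sequence
output_features:
  - name: Order_label_04_01_2023
    type: number
```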
  • ID: Unique identifier for a row
  • Store_id: Unique ID for each store
  • Store_Type: Type of the store
  • Location_Type: Type of the location where the store is situated
  • Region_Code: Code of the region where the store is located
  • Date: The date of the observation
  • Holiday: Whether the given date is a holiday (1: Yes, 0: No)
  • Discount: Whether the store offered a discount on the given date (Yes/No)
  • Orders: Number of orders received by the store on the given day
  • Sales: Total sales for the store on the given day
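As a quick sketch of the dictionary above (the two rows are made up for illustration), the raw columns can be cast into their intended types with pandas:

```python
import pandas as pd

# Two made-up rows matching the data dictionary above
raw = pd.DataFrame({
    "ID": ["T1", "T2"],
    "Store_id": [1, 2],
    "Store_Type": ["S1", "S4"],
    "Location_Type": ["L3", "L2"],
    "Region_Code": ["R1", "R2"],
    "Date": ["01/01/2023", "01/01/2023"],
    "Holiday": [1, 0],
    "Discount": ["Yes", "No"],
    "Orders": [20, 2],
    "Sales": [7011.84, 51789.12],
})

# Cast the un-typed columns into typed representations
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%Y")  # day-first dates
raw["Holiday"] = raw["Holiday"].astype(bool)                  # 1 -> True, 0 -> False
raw["Discount"] = raw["Discount"].eq("Yes")                   # Yes/No -> bool
```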
| Store ID | Order_sequence | Order_label_04_01_2023 | Order_label_05_01_2023 |
|----------|----------------|------------------------|------------------------|
| 1        | 20, 4, 5       | 5                      | 5                      |
| 2        | 2, 3, 4        | 5                      | 3                      |
| Store ID | Orders | Date       |
|----------|--------|------------|
| 1        | 20     | 01/01/2023 |
| 1        | 4      | 02/01/2023 |
| 1        | 5      | 03/01/2023 |
| 1        | 5      | 04/01/2023 |
| 1        | 5      | 05/01/2023 |
| 2        | 2      | 01/01/2023 |
| 2        | 3      | 02/01/2023 |
| 2        | 4      | 03/01/2023 |
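A reshape of this kind — the history up to a cutoff becoming a sequence column, each later day becoming its own label column — can be sketched with pandas. The cutoff date and column-naming scheme below are assumptions for illustration, not the author's exact code:

```python
import pandas as pd

# Long-format orders, copied from the table above
long_df = pd.DataFrame({
    "Store ID": [1, 1, 1, 1, 1, 2, 2, 2],
    "Orders":   [20, 4, 5, 5, 5, 2, 3, 4],
    "Date": ["01/01/2023", "02/01/2023", "03/01/2023", "04/01/2023",
             "05/01/2023", "01/01/2023", "02/01/2023", "03/01/2023"],
})
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%d/%m/%Y")

history_cutoff = pd.Timestamp("2023-01-03")  # assumed split point

# Everything up to the cutoff becomes the input sequence per store ...
seq = (long_df[long_df["Date"] <= history_cutoff]
       .sort_values("Date")
       .groupby("Store ID")["Orders"]
       .apply(list)
       .rename("Order_sequence"))

# ... and each later day becomes its own label column
labels = (long_df[long_df["Date"] > history_cutoff]
          .pivot(index="Store ID", columns="Date", values="Orders"))
labels.columns = [f"Order_label_{d:%d_%m_%Y}" for d in labels.columns]

wide = seq.to_frame().join(labels)
```

Stores with no rows after the cutoff (store 2 in this excerpt) get NaN labels after the join.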
| Stage | Description |
|-------|-------------|
| Raw | Initial start of the pipeline, containing the sourced data model(s) that should never be changed. This forms the single source of truth to work from. These data models are typically un-typed (e.g. csv), but this will vary from case to case. |
| Intermediate | Optional data model(s), introduced to type your raw data model(s), e.g. converting string-based values into their correctly typed representation. |
| Primary | Domain-specific data model(s) containing cleansed, transformed and wrangled data from either raw or intermediate. This forms the layer fed into your feature engineering. |
| Feature | Analytics-specific data model(s) containing a set of features defined against the primary data, grouped by feature area of analysis and stored against a common dimension. |
| Model input | Analytics-specific data model(s) containing all feature data against a common dimension. |
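These stages commonly map onto numbered data folders, as in the Kedro convention the table echoes. A hypothetical layout:

```
data/
├── 01_raw/            # untouched source files (single source of truth)
├── 02_intermediate/   # typed copies of the raw data
├── 03_primary/        # cleansed, domain-level tables
├── 04_feature/        # engineered features per analysis area
└── 05_model_input/    # all features joined on a common dimension
```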

Data Mining Project Template

1. Introduction

Project Objectives

The main objectives of this project are to:

import pandas as pd
import numpy as np
from umap import UMAP
from hdbscan import HDBSCAN
import plotly.express as px
import plotly.graph_objects as go
import gower

from scripts.clustering.cluster import ClusterAnalysis

# Run the cluster analysis on the influence-metrics DataFrame
# (influence_metrics_final is built in an earlier step, not shown here)
ca = ClusterAnalysis(influence_metrics_final, n_neighbors=5, min_cluster_size=5, min_dist=0.09, metric='euclidean')
ca.run()
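The `gower` import above hints at mixed-type data, where Euclidean distance is a poor fit. As a rough illustration (a hand-rolled sketch, not the `gower` library's implementation or the author's code), Gower distance averages per-feature dissimilarities: range-scaled absolute differences for numeric columns, simple mismatch for categoricals:

```python
import numpy as np
import pandas as pd

def gower_distance(df: pd.DataFrame) -> np.ndarray:
    """Pairwise Gower distance: mean of per-column dissimilarities."""
    parts = []
    for col in df.columns:
        vals = df[col].to_numpy()
        if np.issubdtype(vals.dtype, np.number):
            # Numeric: absolute difference scaled by the column's range
            rng = vals.max() - vals.min()
            d = np.abs(vals[:, None] - vals[None, :]) / (rng if rng else 1.0)
        else:
            # Categorical: 0 if equal, 1 if different
            d = (vals[:, None] != vals[None, :]).astype(float)
        parts.append(d)
    return np.mean(parts, axis=0)

demo = pd.DataFrame({"spend": [0.0, 50.0, 100.0], "segment": ["a", "a", "b"]})
D = gower_distance(demo)
```

A matrix like this could then be fed to UMAP with `metric='precomputed'` instead of `'euclidean'`.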
import pandas as pd
import numpy as np
from umap import UMAP
from hdbscan import HDBSCAN
import plotly.express as px
import plotly.graph_objects as go

class ClusterAnalysis:
    def __init__(self, dataframe, n_neighbors=15, min_cluster_size=5, min_dist=0.1, metric='euclidean'):
        # Store the input data and the UMAP / HDBSCAN hyperparameters for run()
        self.dataframe = dataframe
        self.n_neighbors = n_neighbors
        self.min_cluster_size = min_cluster_size
        self.min_dist = min_dist
        self.metric = metric
# Sentiment analyser: specify the Hugging Face model to use
analyzer = SentimentAnalyzer('cardiffnlp/twitter-roberta-base-sentiment-latest')

# Get sentiment analysis
tweets_with_sentiment = analyzer.get_sentiment(tweets_df)

# Emotion analyser: same wrapper, different model
analyzer = SentimentAnalyzer('cardiffnlp/twitter-roberta-base-emotion', emotion=True)

# Add emotion analysis on top of the sentiment columns
tweets_with_emotion = analyzer.get_sentiment(tweets_with_sentiment)
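`SentimentAnalyzer` is the author's own wrapper and its source is not in this excerpt. A hypothetical skeleton, with the model injected as a plain callable (in practice this would be a Hugging Face `transformers` text-classification pipeline) so the shape of `get_sentiment` is clear:

```python
import pandas as pd

class SentimentAnalyzer:
    """Hypothetical reconstruction: wraps a text-classification callable."""

    def __init__(self, classify, emotion=False):
        # `classify` maps a list of texts -> [{"label": ..., "score": ...}, ...]
        self.classify = classify
        self.prefix = "emotion" if emotion else "sentiment"

    def get_sentiment(self, df: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
        # Score every row's text and append label/score columns
        preds = self.classify(df[text_col].tolist())
        out = df.copy()
        out[f"{self.prefix}_label"] = [p["label"] for p in preds]
        out[f"{self.prefix}_score"] = [p["score"] for p in preds]
        return out

# Stub classifier standing in for the RoBERTa pipeline
fake = lambda texts: [{"label": "positive", "score": 0.9} for _ in texts]
tweets = pd.DataFrame({"text": ["great day", "awful service"]})
result = SentimentAnalyzer(fake).get_sentiment(tweets)
```

Because the prefix changes with `emotion=True`, the second pass adds `emotion_label`/`emotion_score` columns alongside the sentiment ones rather than overwriting them.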