Jan du Plessis janduplessis883

  • London
@janduplessis883
janduplessis883 / dataset.py
Created November 11, 2025 13:56
SetFit Synthetic Dataset
# Healthcare Feedback Dataset for SetFit
# 90 examples (15 per class) - suitable for few-shot learning
# Optimized label names for ML
LABEL_MAPPING = {
    0: "access_availability",
    1: "information_provision",
    2: "privacy_confidentiality",
    3: "continuity_care",
    4: "clinical_communication",
import spacy
from spacy.matcher import Matcher
import math
import pandas as pd
# ============================================================================
# Installation Requirements:
# 1. pip install spacy
# 2. python -m spacy download en_core_web_sm
# ============================================================================
@janduplessis883
janduplessis883 / data.py
Created November 5, 2025 20:59
Project Noema
import math
import os
import re
from nltk import pos_tag, word_tokenize
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import pairwise_distances
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
@janduplessis883
janduplessis883 / custom_tools.py
Created January 7, 2025 02:14
crewAI Notion Integration Tools
import toml
from crewai_tools import BaseTool
from typing import ClassVar, Union, Dict, Any, List
import requests
# Load the TOML file
with open("notioncrew/config_secrets.toml", "r") as f:
    config_secrets = toml.load(f)
# Load environment variables from streamlit secrets
@janduplessis883
janduplessis883 / Association Rule Mining in Python Tutorial.ipynb
Last active May 8, 2024 14:26
Association Rule Mining in Python Tutorial
@janduplessis883
janduplessis883 / DataPreprocessingTool.py
Created May 6, 2024 21:16 — forked from Cdaprod/DataPreprocessingTool.py
Langchain tool for preprocessing text data. Version one million nine-hundred and fifty two 😂 jk version 1
import spacy
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from langchain.tools import BaseTool
from typing import Optional, Union, List
from langchain.callbacks.manager import CallbackManagerForToolRun, AsyncCallbackManagerForToolRun
class DataPreprocessingTool(BaseTool):
    name = "DataPreprocessingTool"
    description = "A tool for preprocessing and structuring unstructured data."
@janduplessis883
janduplessis883 / README.txt
Last active May 2, 2024 02:51
Pinecone Preprocessing Data for Vector Database
In this walkthrough we will see how to use Pinecone for semantic search.
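The README preview stops before any code appears, so the following is only a minimal sketch of the idea behind semantic search: rank stored vectors by cosine similarity to a query vector. It deliberately does not use the Pinecone client. The names `cosine_similarity`, `top_k`, and `toy_index`, and the three-dimensional toy vectors, are hypothetical stand-ins for real embeddings and a real vector index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (id, score) pairs most similar to query_vec, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "index": in practice the vectors would come from an
# embedding model and live in a vector database.
toy_index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.1, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A hosted vector database replaces the linear scan in `top_k` with an approximate nearest-neighbour index, but the ranking principle is the same.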
@janduplessis883
janduplessis883 / 01_Embedding_Data_From_A_Pandas_DataFrame_Chroma_LangChain_Ollama.py
Last active July 16, 2025 00:18
Embedding Data from a Pandas DataFrame into a Chroma Vector Database using LangChain and Ollama
import pandas as pd
from langchain.schema import Document
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from tqdm import tqdm