Ryan Wesslen wesslen

@wesslen
wesslen / bac-10k-riskfactors.txt
Created March 7, 2024 23:00
bac-10k-riskfactors.txt
The discussion below addresses our material risk factors of which we are aware. Any risk factor, either by itself or together with other risk factors, could materially and adversely affect our businesses, results of operations, cash flows and/or financial condition. References to third parties may include suppliers, service providers, counterparties, financial market utilities, exchanges and clearing houses, data aggregators and other partners and their upstream and downstream service providers (e.g., fourth parties, fifth parties) who may also contribute to our risks. Other factors not currently known to us or that we currently deem immaterial could also adversely affect our businesses, results of operations, cash flows and/or financial condition. Therefore, the risk factors below should not be considered all of the potential risks that we may face. For more information on how we manage risks, see Managing Risk in the MD&A beginning on page 44. For more information about the risks contained in this section,
wesslen / ibis-examples.ipynb
Last active May 24, 2024 14:18
dsba6211-summer2024-lab1-solutions.ipynb
wesslen / duplicates.py
Created February 15, 2024 18:51
Check for duplicate inputs (text) for .jsonl files
import json
import logging
import typer
from pathlib import Path
from typing import List
from prodigy import set_hashes
logging.basicConfig(format='%(message)s', level=logging.INFO)
def process_file(file_path: Path, all_hashes: dict) -> dict:
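# In VS Code, create a project folder
# Check the Python and pip versions
python --version  # make sure it is 3.9-3.12
pip --version
# Create and activate a virtual environment
# (requires the virtualenv package; `python -m venv venv` is the standard-library equivalent)
python -m virtualenv venv
source venv/bin/activate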
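The duplicate check that duplicates.py describes (flagging repeated text inputs across .jsonl files) could be sketched roughly as follows. This is not the gist's implementation: the original imports `set_hashes` from Prodigy, whereas this sketch swaps in `hashlib` so it runs without Prodigy installed. The names `text_hash` and `duplicate_locations`, and the assumption that each record stores its input under a `"text"` key, are hypothetical.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def text_hash(text: str) -> str:
    """Hash the stripped text (a stand-in for Prodigy's input hash, not the same value)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def duplicate_locations(
    sources: Dict[str, Iterable[str]], key: str = "text"
) -> Dict[str, List[str]]:
    """Return {hash: ["name:line", ...]} for every text that occurs more than once.

    `sources` maps a display name (e.g. a file name) to an iterable of JSONL lines.
    """
    seen: Dict[str, List[str]] = {}
    for name, lines in sources.items():
        for line_no, line in enumerate(lines, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            seen.setdefault(text_hash(record[key]), []).append(f"{name}:{line_no}")
    # Keep only hashes that were seen at more than one location
    return {h: locs for h, locs in seen.items() if len(locs) > 1}
```

To run it on real files, one option is to pass `{p.name: p.read_text(encoding="utf-8").splitlines() for p in paths}`, where `paths` is a list of `pathlib.Path` objects pointing at the .jsonl files.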
wesslen / create_data.py
Last active October 17, 2023 03:01
GPT3.5 fine tuning dummy example
import os
import json
import typer
app = typer.Typer()
def process_scripts(input_prompts_file: str, scripts_folder: str, output_file: str):
    output_data = []
    # Read input prompts from the .jsonl file
wesslen / chat1.txt
Last active July 11, 2023 22:34
Python script that processes txt transcript files and outputs .jsonl for Prodigy annotations
AGENT hello this is steve can i ask what's your name
CUSTOMER my name is harry
AGENT thanks harry. how can i help you
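A transcript in the format above, with one turn per line and a leading speaker label, might be converted to JSONL along these lines. This is a minimal sketch, not the gist's script: the function names, the `SPEAKERS` set, and the `meta` fields (`speaker`, `turn`) are assumptions. It relies only on Prodigy's convention of a `"text"` key with optional `"meta"`.

```python
import json
from typing import Dict, Iterable, Iterator

# Speaker labels seen in the sample transcript; extend as needed (assumption)
SPEAKERS = {"AGENT", "CUSTOMER"}


def parse_transcript(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one record per transcript turn, skipping blank or unlabeled lines."""
    turn = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # The first whitespace-delimited token is treated as the speaker label
        speaker, _, text = line.partition(" ")
        if speaker not in SPEAKERS:
            continue
        yield {"text": text.strip(), "meta": {"speaker": speaker, "turn": turn}}
        turn += 1


def to_jsonl(records: Iterable[Dict]) -> str:
    """Serialize records as newline-delimited JSON."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

For a file on disk, `to_jsonl(parse_transcript(open("chat1.txt", encoding="utf-8")))` would produce the JSONL text, which can then be written to an output file.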
wesslen / tokenizer_spacy_german.ipynb
Last active July 7, 2023 19:41
Custom (German) Tokenizer in a SpaCy pipeline
wesslen / mapping.py
Last active December 24, 2023 16:28
Python function to map random user IDs (e.g., crowdsource platform) to a unique Prodigy session name
import random
import string
def generate_random_userids(num_ids):
    passwords = []
    for _ in range(num_ids):
        password = ''.join(random.choices(string.ascii_letters + string.digits, k=12))
        passwords.append(password)
    return passwords
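The preview stops after ID generation, so the mapping step named in the description is not shown. One way it might look is sketched below, under stated assumptions: the `annotator_N` naming scheme and both function names are hypothetical, and `secrets` is swapped in for `random` because the IDs act like access tokens. The resulting session name could then be used with Prodigy's `?session=` URL parameter.

```python
import secrets
import string
from typing import Dict, List

ALPHABET = string.ascii_letters + string.digits


def generate_user_ids(num_ids: int, length: int = 12) -> List[str]:
    """Generate `num_ids` distinct random alphanumeric IDs."""
    ids = set()
    while len(ids) < num_ids:  # a set guarantees uniqueness even after a rare collision
        ids.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return sorted(ids)


def map_to_sessions(user_ids: List[str], prefix: str = "annotator") -> Dict[str, str]:
    """Map each user ID to a unique, zero-padded session name (hypothetical scheme)."""
    if len(set(user_ids)) != len(user_ids):
        raise ValueError("user IDs must be unique")
    width = len(str(len(user_ids)))  # pad so that names sort consistently
    return {uid: f"{prefix}_{i:0{width}d}" for i, uid in enumerate(user_ids, start=1)}
```

Keeping the mapping in a dictionary makes it easy to save as JSON and to look up which session a given platform user annotated under.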
wesslen / prodigy-hf-space.py
Created June 26, 2023 23:06
Create new Prodigy instance on HF Space
from huggingface_hub import HfApi
from huggingface_hub import duplicate_space
from huggingface_hub import hf_hub_download
from dotenv import load_dotenv
import os
# create a local .env file with HF_TOKEN (HF Hub Token)
load_dotenv()
HF_TOKEN = os.environ.get("HF_TOKEN")
wesslen / custom.js
Last active July 5, 2023 19:37
Prodigy hierarchical text classification
function toggle(id) {
  var x = document.getElementById(id);
  if (id == "a") {
    reset("b");
  } else {
    reset("a");
  }
  if (x.style.display === "none") {
    x.style.display = "block";
  } else {