fearnworks fearnworks

@fearnworks
fearnworks / dwg_4_10.md
Created April 10, 2025 11:48
Data Working Group Update 4/10/25

OMI All Hands – DWG Update – 04/10/2025

OMI Data Pipeline

  • We could use front-end developers with JS/TS experience. We are using Svelte 5 / SvelteKit, but prior experience with the framework is not a requirement. DWG members believe the bar for entry should be fairly low; we mostly just need folks who want to contribute dev time!
  • Cheezy has put a good deal of work into updating the getting-started docs to help onboard new folks.
  • Jimmy is working on migrations for Svelte 5.

Merged

  • #186 : Keep additional metadata information
  • #189 : Fix issue with div covering home link and update playwright test with new locator

OMI All Hands – DWG Update – 03/27/2025

OMI Data Pipeline

  • Discussed central repo UX with Kent. Cheezy has begun a refactor of the front page.
  • Jimmy returns! He was without internet for a few weeks.
  • Several PRs are open and need review.
  • We could still use more help on the central repo.
  • Dr. Head is providing some cool resizing utils for the pipelines; integration is currently in progress.

Merged

CDK Stack Documentation

Overview

This document provides an introduction to our AWS Cloud Development Kit (CDK) stack. It explains the various elements that make up our cloud infrastructure, the GitHub Actions workflow used to deploy it, and the key AWS services that are configured.

GitHub Actions Workflow

@fearnworks
fearnworks / grpo_demo.py
Created February 17, 2025 15:52 — forked from willccbb/grpo_demo.py
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
import gradio as gr
import transformers
from torch import bfloat16
# from dotenv import load_dotenv # if you wanted to adapt this for a repo that uses auth
from threading import Thread
from gradio.themes.utils.colors import Color
#HF_AUTH = os.getenv('HF_AUTH')
#model_id = "stabilityai/StableBeluga2" # 70B parm model based off Llama 2 70B
@fearnworks
fearnworks / codellama_auth_gen.py
Created August 31, 2023 00:26
Generated by code llama
from passlib.context import CryptContext
from jose import jwt
import datetime
SECRET_KEY = "YOUR-SECRET-KEY" # Replace this with your secret key
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
class User:
@fearnworks
fearnworks / arxiv_dowloader_open_ai_func.py
Created June 15, 2023 00:58
This file performs a semantic search using NLP and the new OpenAI function-calling API to find answers to user queries
import arxiv
import ast
import concurrent
from csv import writer
import openai
import os
import pandas as pd
from PyPDF2 import PdfReader
import requests
from scipy import spatial
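
The preview above cuts off after the imports, so the gist's actual search logic is not shown. The `scipy.spatial` import suggests that results are ranked by cosine distance between embedding vectors, but that is an assumption. As a rough illustration of that idea only, here is a minimal standard-library sketch; `cosine_similarity`, `rank_by_relatedness`, and the toy vectors are hypothetical names and data, not taken from the gist, and real embeddings would come from an embedding model rather than hand-written lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_relatedness(query_vec, docs, top_n=3):
    """Return the top_n (title, score) pairs, most similar to query_vec first.

    `docs` maps a title to its (hypothetical) embedding vector.
    """
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy 3-dimensional "embeddings", made up purely for illustration.
toy_docs = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.7, 0.7, 0.0],
    "paper_c": [0.0, 0.0, 1.0],
}

ranked = rank_by_relatedness([1.0, 0.1, 0.0], toy_docs, top_n=2)
print([title for title, _ in ranked])
```

With these toy vectors the query points mostly along the first axis, so `paper_a` ranks ahead of `paper_b`, and `paper_c` (orthogonal to the query) is dropped by `top_n=2`.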
OUTPUT :
0.0:
Summary:
{
"subject": "a man",
"characters": ["a man", "a silhouetted figure", "an old friend"],
"locations": ["a dark alley in the heart of the city"],
@fearnworks
fearnworks / falcoln_7b_qlora_axolotl.yml
Created June 2, 2023 20:57
Working falcon 7b qlora w/ Axolotl
base_model: tiiuae/falcon-7b
base_model_config: tiiuae/falcon-7b
trust_remote_code: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
gptq: false
strict: false
push_dataset_to_hub:
@fearnworks
fearnworks / bug.md
Last active June 2, 2023 17:54
Falcon Qlora 7b bug

workspace/llm-playground/notebooks/axolotl/runpod/axolotl-falcon-7b-qlora-gsm8k.ipynb

Steps to reproduce :

1 ) Copy config from #4 run-16: 40*2 + xformer into examples/falcon/qlora.yml

2 ) Run cells 1 & 2

3 ) Run !accelerate launch scripts/finetune.py examples/falcon/qlora.yml