fearnworks

OMI All Hands – DWG Update – 04/10/2025

OMI Data Pipeline

We could use front-end developers with JS/TS experience. We are using Svelte 5 / SvelteKit, but prior experience with the framework is not required. DWG members believe the bar for entry should be fairly low; we mostly need folks who want to contribute dev time!
Cheezy has put a good deal of work into updating the getting-started docs to help onboard new folks.
Jimmy is working on migrations for Svelte 5.

Merged

#186 : Keep additional metadata information
#189 : Fix issue with div covering home link and update playwright test with new locator

OMI All Hands – DWG Update – 03/27/2025

OMI Data Pipeline

Discussion around central repo UX with Kent. Cheezy has begun a refactor of the front page.
Jimmy returns! He was without internet for a few weeks.
Several PRs are out that need review.
We could still use more help on the central repo.
Dr. Head is providing some cool resizing utils for the pipelines; integration is currently in progress.

Merged

CDK Stack Documentation

Overview

This document provides an introduction to our AWS Cloud Development Kit (CDK) stack. It explains the various elements that make up our cloud infrastructure, the GitHub Actions workflow used to deploy it, and the key AWS services that are configured.

GitHub Actions Workflow

workspace/llm-playground/notebooks/axolotl/runpod/axolotl-falcon-7b-qlora-gsm8k.ipynb

Steps to reproduce:

1) Copy config from #4 run-16: 40*2 + xformer into examples/falcon/qlora.yml

2) Run cells 1 & 2

3) Run !accelerate launch scripts/finetune.py examples/falcon/qlora.yml
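
Outside a notebook, the launch in step 3 could be scripted roughly as follows. This is only a sketch: `build_launch_command` and `run_finetune` are hypothetical helper names, and it assumes the working directory is the repository root with `accelerate` available on the PATH.

```python
import subprocess
from typing import List

DEFAULT_CONFIG = "examples/falcon/qlora.yml"  # config path from step 1


def build_launch_command(config_path: str = DEFAULT_CONFIG) -> List[str]:
    """Hypothetical helper: argument list equivalent to the notebook's `!accelerate launch ...` cell."""
    return ["accelerate", "launch", "scripts/finetune.py", config_path]


def run_finetune(config_path: str = DEFAULT_CONFIG) -> None:
    """Hypothetical helper: start the run and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_launch_command(config_path), check=True)
```

Passing the command as a list avoids shell quoting issues, and `check=True` makes a failed run visible instead of silently continuing.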

	# train_grpo.py
	#
	# See https://github.com/willccbb/verifiers for ongoing developments
	#
	import re
	import torch
	from datasets import load_dataset, Dataset
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from peft import LoraConfig
	from trl import GRPOConfig, GRPOTrainer

	import gradio as gr
	import transformers
	from torch import bfloat16
	# from dotenv import load_dotenv # if you wanted to adapt this for a repo that uses auth
	from threading import Thread
	from gradio.themes.utils.colors import Color


	#HF_AUTH = os.getenv('HF_AUTH')
	#model_id = "stabilityai/StableBeluga2" # 70B parm model based off Llama 2 70B

	from passlib.context import CryptContext
	from jose import jwt
	import datetime

	SECRET_KEY = "YOUR-SECRET-KEY" # Replace this with your secret key
	ALGORITHM = "HS256"
	ACCESS_TOKEN_EXPIRE_MINUTES = 30
	pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

	class User:
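
The constants above suggest an access token with a 30-minute lifetime signed with HS256. As a rough illustration of what that signing involves, here is a standard-library sketch. It is not the `jose` API that the snippet imports, and `create_access_token` and `verify_access_token` are hypothetical names introduced only for this example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET_KEY = "YOUR-SECRET-KEY"  # placeholder, as in the snippet above
ACCESS_TOKEN_EXPIRE_MINUTES = 30


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Reverse of _b64url: restore the padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> bytes:
    """HMAC-SHA256 over 'header.payload', which is what HS256 means."""
    return hmac.new(SECRET_KEY.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_access_token(subject: str, now: Optional[float] = None) -> str:
    """Hypothetical helper: build a signed token that expires after the configured minutes."""
    issued = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(issued) + ACCESS_TOKEN_EXPIRE_MINUTES * 60}
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(payload).encode())
    return signing_input + "." + _b64url(_sign(signing_input))


def verify_access_token(token: str, now: Optional[float] = None) -> dict:
    """Hypothetical helper: check signature, then expiry; raise ValueError on failure."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(header_b64 + "." + payload_b64)
    # compare_digest avoids leaking information through comparison timing
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

In real code a maintained JWT library is the safer choice; the sketch only shows why the secret key and the expiry window are the two values that matter.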

	import arxiv
	import ast
	import concurrent
	from csv import writer
	import openai
	import os
	import pandas as pd
	from PyPDF2 import PdfReader
	import requests
	from scipy import spatial

	OUTPUT :

	0.0:

	Summary:

	{
	"subject": "a man",
	"characters": ["a man", "a silhouetted figure", "an old friend"],
	"locations": ["a dark alley in the heart of the city"],

	base_model: tiiuae/falcon-7b
	base_model_config: tiiuae/falcon-7b
	trust_remote_code: true
	model_type: AutoModelForCausalLM
	tokenizer_type: AutoTokenizer
	load_in_8bit: false
	load_in_4bit: true
	gptq: false
	strict: false
	push_dataset_to_hub: