
masta-g3 masta-g3

masta-g3 / md_splitter
Created January 11, 2025 01:22
split markdown by headers
import re

def split_markdown_document(text, target_min=500, target_max=700):
    # Identify headers and their levels: (offset, level, title)
    header_pattern = re.compile(r'^(#+)\s+(.*)', re.MULTILINE)
    headers = [(m.start(), len(m.group(1)), m.group(2)) for m in header_pattern.finditer(text)]
    # Add sentinel header at the end (level 0 marks it as not a real header)
    headers.append((len(text), 0, ""))
    # If no real headers found (other than sentinel), treat entire doc as one big chunk
    real_headers = [h for h in headers if h[1] > 0]
    if not real_headers:
        ...  # body not shown in this preview
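
The preview above stops before the splitting logic, so the following is only a minimal sketch of how the same header pattern and end-of-text sentinel could be used to cut a document into sections. The function name `split_by_headers` and the `(level, title, body)` return shape are assumptions made for illustration and are not taken from the gist. The `target_min` / `target_max` size handling is not reproduced, since its behavior is not visible here.

```python
import re

# Same pattern as in the preview: one or more '#', whitespace, then the title.
HEADER_PATTERN = re.compile(r'^(#+)\s+(.*)', re.MULTILINE)


def split_by_headers(text):
    """Hypothetical helper: return (level, title, body) tuples, one per section."""
    # Record where each header line starts and ends, plus its level and title.
    headers = [
        (m.start(), m.end(), len(m.group(1)), m.group(2))
        for m in HEADER_PATTERN.finditer(text)
    ]
    # No headers at all: treat the entire document as one chunk.
    if not headers:
        return [(0, "", text)]

    sections = []
    # Keep any text that precedes the first header as a level-0 section.
    preamble = text[:headers[0][0]].strip()
    if preamble:
        sections.append((0, "", preamble))

    # Sentinel at the end of the text, so the last real header has a "next" one.
    headers.append((len(text), len(text), 0, ""))
    for (_, end, level, title), (next_start, _, _, _) in zip(headers, headers[1:]):
        # The body runs from the end of the header line to the next header.
        sections.append((level, title, text[end:next_start].strip()))
    return sections


if __name__ == "__main__":
    doc = "intro\n# A\nalpha\n## B\nbeta\n"
    for section in split_by_headers(doc):
        print(section)
```

The sentinel entry removes the special case for the final section: every real header can be paired with the header that follows it, and the slice between the two is that section's body.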
masta-g3 / notes
Last active February 7, 2024 17:12
summary notes and more
%load_ext autoreload
%autoreload 2
# Summarizer
from langchain.chat_models import ChatOpenAI, AzureChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate
from langchain.document_loaders import ArxivLoader
masta-g3 / llm_queue.txt
Last active April 25, 2025 02:51
Updated LLM queue.
2402.20315
3707.84
2405.46012
masta-g3 / llm_papers.txt
Last active April 25, 2025 03:02
Updated 2025-04-24
Cedille: A large autoregressive French language model
The Wisdom of Hindsight Makes Language Models Better Instruction Followers
ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks
Query2doc: Query Expansion with Large Language Models
The Internal State of an LLM Knows When It's Lying
Structured information extraction from complex scientific text with fine-tuned large language models
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Large Language Models Encode Clinical Knowledge
PoET: A generative model of protein families as sequences-of-sequences
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
tz1ZWje5jpfbWrG5iuxz7qUx7eBLLAS5Tv3z,0x10,all
tz1eZv3NVo44c2LJ78pKUKabFUZABemX3QMN,256x256,all
tz1Sqhrj49cW9aTuwQrDa4mycnA7n5dDd7rF,abosch,all
tz1ZBMhTa7gxSpaeXoqyc6bTCrxEHfZYSpPt,aebrer,all
tz1h3Cz9aMM6aiNbfPhybev6o9KCfY8geaPd,alexandrajovanic,all
tz1St3n29AbYXZXV8W1BG41qYzz86J2CFAW7,alexthescott,all
tz2DFJE6jVMobNWzJMXtJG2YXJZUgE5JxRC8,also celador,all
tz1XQBiuYpkx3s9nF49D9HE2pbC3DUNnKZHJ,amit pitaru,all
tz1Z2kdYtxZ6YAYyCTYZUmEwvXJbdjN7mdzs,amy goodchild,all
tz1Roq6end2LFtkpGrmuyRZH82xsWfaRCat1,andreasrau,all
tz1ZWje5jpfbWrG5iuxz7qUx7eBLLAS5Tv3z # 0x10
tz1eZv3NVo44c2LJ78pKUKabFUZABemX3QMN # 256x256
tz1Sqhrj49cW9aTuwQrDa4mycnA7n5dDd7rF # abosch
tz1ZBMhTa7gxSpaeXoqyc6bTCrxEHfZYSpPt # aebrer
tz1eYu9mJo9P4NXogtcisVnjGGnPp4pb4tTm # aj amos
tz1h3Cz9aMM6aiNbfPhybev6o9KCfY8geaPd # alexandrajovanic
tz1St3n29AbYXZXV8W1BG41qYzz86J2CFAW7 # alexthescott
tz2DFJE6jVMobNWzJMXtJG2YXJZUgE5JxRC8 # also celador
tz1XQBiuYpkx3s9nF49D9HE2pbC3DUNnKZHJ # amit pitaru
tz1Z2kdYtxZ6YAYyCTYZUmEwvXJbdjN7mdzs # amy goodchild