masta-g3

Quant fish living in SF.

masta-g3 / md_splitter

Created January 11, 2025 01:22

split markdown by headers

	import re

	def split_markdown_document(text, target_min=500, target_max=700):
	    # Identify headers and their levels: (start offset, level, title)
	    header_pattern = re.compile(r'^(#+)\s+(.*)', re.MULTILINE)
	    headers = [(m.start(), len(m.group(1)), m.group(2)) for m in header_pattern.finditer(text)]
	    # Add sentinel header at the end
	    headers.append((len(text), 0, ""))

	    # If no real headers found (other than sentinel), treat entire doc as one big chunk
	    real_headers = [h for h in headers if h[1] > 0]
	    if not real_headers:
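
The preview stops before the splitting logic itself. As a rough sketch of how the header offsets and the end sentinel can be turned into sections, the helper below pairs each header with the start of the next one. `split_by_headers` is a hypothetical name, not the gist's continuation, and it does not attempt the `target_min`/`target_max` size balancing that the signature above suggests.

```python
import re

HEADER_PATTERN = re.compile(r'^(#+)\s+(.*)', re.MULTILINE)

def split_by_headers(text):
    """Split markdown text into (level, title, body) sections, one per header."""
    # Same tuple layout as the preview: (start offset, header level, title)
    headers = [(m.start(), len(m.group(1)), m.group(2))
               for m in HEADER_PATTERN.finditer(text)]
    if not headers:
        # Assumption: with no headers, the whole document is one section
        return [(0, "", text)]
    # Sentinel marks where the last section ends
    headers.append((len(text), 0, ""))

    sections = []
    # Keep any non-blank text that precedes the first header
    preamble = text[:headers[0][0]]
    if preamble.strip():
        sections.append((0, "", preamble))
    # Each section runs from its header to the start of the next header
    for (start, level, title), (next_start, _, _) in zip(headers, headers[1:]):
        sections.append((level, title, text[start:next_start]))
    return sections
```

Because every section body is a contiguous slice of the input, joining the bodies in order reproduces the document whenever there is no blank-only preamble.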

masta-g3 / 20240322_summary_placement.ipynb

Last active March 22, 2024 22:18

masta-g3 / notes

Last active February 7, 2024 17:12

summary notes and more

masta-g3 / summarize_by_parts.py

Created December 1, 2023 02:02


	%load_ext autoreload
	%autoreload 2

	# Summarizer

	from langchain.chat_models import ChatOpenAI, AzureChatOpenAI
	from langchain.text_splitter import RecursiveCharacterTextSplitter
	from langchain.prompts import ChatPromptTemplate
	from langchain.document_loaders import ArxivLoader

masta-g3 / llm_queue.txt

Last active January 1, 2026 23:37

Updated LLM queue.

masta-g3 / llm_papers.txt

Last active January 2, 2026 01:43

Updated 2026-01-01

This file has been truncated.

	Cedille: A large autoregressive French language model
	The Wisdom of Hindsight Makes Language Models Better Instruction Followers
	ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks
	Query2doc: Query Expansion with Large Language Models
	The Internal State of an LLM Knows When It's Lying
	Structured information extraction from complex scientific text with fine-tuned large language models
	TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
	Large Language Models Encode Clinical Knowledge
	PoET: A generative model of protein families as sequences-of-sequences
	Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

masta-g3 / gist:74b2575f874851a63671f392e646c72e

Last active January 17, 2023 07:09

lv_follow.txt

	tz1ZWje5jpfbWrG5iuxz7qUx7eBLLAS5Tv3z,0x10,all
	tz1eZv3NVo44c2LJ78pKUKabFUZABemX3QMN,256x256,all
	tz1Sqhrj49cW9aTuwQrDa4mycnA7n5dDd7rF,abosch,all
	tz1ZBMhTa7gxSpaeXoqyc6bTCrxEHfZYSpPt,aebrer,all
	tz1h3Cz9aMM6aiNbfPhybev6o9KCfY8geaPd,alexandrajovanic,all
	tz1St3n29AbYXZXV8W1BG41qYzz86J2CFAW7,alexthescott,all
	tz2DFJE6jVMobNWzJMXtJG2YXJZUgE5JxRC8,also celador,all
	tz1XQBiuYpkx3s9nF49D9HE2pbC3DUNnKZHJ,amit pitaru,all
	tz1Z2kdYtxZ6YAYyCTYZUmEwvXJbdjN7mdzs,amy goodchild,all
	tz1Roq6end2LFtkpGrmuyRZH82xsWfaRCat1,andreasrau,all

masta-g3 / follow.txt

Last active July 18, 2022 05:55

	tz1ZWje5jpfbWrG5iuxz7qUx7eBLLAS5Tv3z # 0x10
	tz1eZv3NVo44c2LJ78pKUKabFUZABemX3QMN # 256x256
	tz1Sqhrj49cW9aTuwQrDa4mycnA7n5dDd7rF # abosch
	tz1ZBMhTa7gxSpaeXoqyc6bTCrxEHfZYSpPt # aebrer
	tz1eYu9mJo9P4NXogtcisVnjGGnPp4pb4tTm # aj amos
	tz1h3Cz9aMM6aiNbfPhybev6o9KCfY8geaPd # alexandrajovanic
	tz1St3n29AbYXZXV8W1BG41qYzz86J2CFAW7 # alexthescott
	tz2DFJE6jVMobNWzJMXtJG2YXJZUgE5JxRC8 # also celador
	tz1XQBiuYpkx3s9nF49D9HE2pbC3DUNnKZHJ # amit pitaru
	tz1Z2kdYtxZ6YAYyCTYZUmEwvXJbdjN7mdzs # amy goodchild
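
The two follow lists use different line layouts: `lv_follow.txt` is comma-separated with three fields, while `follow.txt` separates the address from its label with ` # `. A minimal sketch for reading either layout into one record shape follows. `parse_follow_line` and the field names `address`, `label`, and `scope` are hypothetical, and the meaning of the third field (`all` in every preview line) is an assumption, since the listing does not explain it.

```python
def parse_follow_line(line):
    """Parse one line from either follow-list layout into a dict, or None if blank."""
    line = line.strip()
    if not line:
        return None
    if " # " in line:
        # follow.txt layout: "<address> # <label>"
        address, label = line.split(" # ", 1)
        return {"address": address.strip(), "label": label.strip(), "scope": None}
    # lv_follow.txt layout: "<address>,<label>,<scope>"
    # Split the address off the front and the scope off the back,
    # so a label containing a comma would still be kept whole.
    address, rest = line.split(",", 1)
    label, scope = rest.rsplit(",", 1)
    return {"address": address.strip(), "label": label.strip(), "scope": scope.strip()}


def parse_follow_list(text):
    """Parse a whole file's text, skipping blank lines."""
    records = (parse_follow_line(line) for line in text.splitlines())
    return [r for r in records if r is not None]
```

Normalizing both files to the same record shape makes it straightforward to compare them, for example to find addresses present in one list but not the other.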