Save all files directly to the main branch. Do not create Git worktrees.
A personal knowledge base for AI/deep learning papers in biology research. Follows Karpathy's LLM Wiki pattern: Original PDF → LLM markdown summary (sources) → Structured wiki page (wiki).
Language policy: All wiki content is in English. Conversation with Claude can be in Korean or English.
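The purpose of this project is to accumulate knowledge based on the content of papers.
- Base answers only on paper content that is in the wiki (sources/, wiki/).
- Do not use web search (WebSearch, WebFetch), including to fill in information that is missing from the wiki.
- If the wiki content is insufficient, you may supplement it by reading the paper's original PDF (using Bash + opendataloader-pdf).
- If the wiki has no paper at all on the topic, say so and ask the user for the PDF.
- When writing Overview pages, likewise use only papers in the wiki as sources.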
- When creating a new interactive, always add an entry to `interactives/index.html`.
- Entry format: list item with title, date, and a short description. No icons.
- All interactive pages use white background (`background: #fff`).
- Interactive files go in `interactives/{topic-name}/` subdirectories.
- Source markdown: 1,054 files (`sources/`)
- Wiki pages: 1,421 files across 25 categories (`wiki/`)
llm-wiki/
├── CLAUDE.md # This file — schema, workflow, usage
├── index.md # Full page catalog (category + key papers)
├── log.md # Work log
├── scripts/ # Ingest & build scripts
├── papers/ # Original PDF files (canonical storage)
│ └── {author}-{year}-{title-5-words}.pdf
├── sources/ # PDF summaries (all English)
│ └── {author}-{year}-{title-5-words}.md
├── interactives/ # Interactive HTML visualizations
│ └── {topic-name}/
└── wiki/ # Structured wiki pages (all English)
  └── {category}/
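All paper-related files (PDF, source markdown, wiki markdown) follow the same rule:
{first-author last name}-{year}-{first 5 title words joined with -}.{extension}
Detailed rules:
- Use only the first author's last name, lowercase, with special characters removed
- The year is 4 digits
- Use the first 5 words of the title, lowercase, with special characters removed and spaces replaced by `-`
- For consortium papers, use the consortium name (e.g., `1000-genomes-project-2015-...`)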
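The naming rule above can be sketched in Python as follows. This is only an illustration: the function names `slugify_words` and `paper_filename` are hypothetical, and two details are assumptions not spelled out in the rules: accented letters are folded to ASCII, and punctuation inside a word is simply dropped (so "Deep-Learning" becomes "deeplearning").

```python
import re
import unicodedata


def slugify_words(text, max_words=None):
    """Lowercase the text, drop special characters, and join words with '-'."""
    # Assumption: fold accented letters to plain ASCII (e.g. "ü" -> "u").
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    words = []
    for word in ascii_text.lower().split():
        # Assumption: punctuation inside a word is removed, not treated as a separator.
        cleaned = re.sub(r"[^a-z0-9]", "", word)
        if cleaned:
            words.append(cleaned)
    if max_words is not None:
        words = words[:max_words]
    return "-".join(words)


def paper_filename(first_author_last_name, year, title, ext):
    """Build {author}-{year}-{first 5 title words}.{ext} (hypothetical helper)."""
    if not re.fullmatch(r"\d{4}", str(year)):
        raise ValueError("year must be 4 digits")
    author = slugify_words(first_author_last_name)
    title_part = slugify_words(title, max_words=5)
    return f"{author}-{year}-{title_part}.{ext.lstrip('.')}"


# Made-up author and title, for illustration only.
print(paper_filename("Müller", 2024, "A Deep-Learning Model: Predicting Variant Effects in Brain", "md"))
```

For a consortium paper, the consortium name is passed in place of the last name; `slugify_words("1000 Genomes Project")` yields `1000-genomes-project`.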
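Lab papers are distinguished by the `source_collection` and `status` fields:
| State | `source_collection` | `status` | Description |
|---|---|---|---|
| Published | `lab-papers` | `published` | Lab paper already published in a journal |
| Under review | `our-manuscript` | `under-review` | Paper submitted to a journal and under review |
| In preparation | `our-manuscript` | `in-preparation` | Paper not yet submitted |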
In-preparation or under-review papers (when there is no PDF):

source_collection: our-manuscript
status: in-preparation # or under-review
pdf_path: ""
pdf_filename: ""

Published papers:

source_collection: lab-papers
status: published

Note: `anlab` is a separate collection used for the lab's reading list (external papers). Do not use it for papers authored by the lab.
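As a rough sketch of how the state rules above could be checked, the snippet below validates the `source_collection`/`status` pair of a lab-authored paper. The names `LAB_PAPER_STATES` and `check_lab_paper` are hypothetical, and the metadata is assumed to be already parsed into a dict.

```python
# Allowed (source_collection, status) pairs for lab-authored papers.
LAB_PAPER_STATES = {
    ("lab-papers", "published"),
    ("our-manuscript", "under-review"),
    ("our-manuscript", "in-preparation"),
}


def check_lab_paper(meta):
    """Return a list of problems for a lab-authored paper's metadata dict."""
    problems = []
    collection = meta.get("source_collection")
    status = meta.get("status")
    if collection == "anlab":
        # anlab is reserved for the reading list of external papers.
        problems.append("anlab must not be used for lab-authored papers")
    elif (collection, status) not in LAB_PAPER_STATES:
        problems.append(f"invalid source_collection/status pair: {collection}/{status}")
    return problems


print(check_lab_paper({"source_collection": "our-manuscript", "status": "published"}))
```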
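Principle: All PDFs are stored as actual files in the papers/ folder.
- When the user provides a PDF from an external path, always copy it into the `papers/` folder with the `cp` command. Do not create a symlink.
- The filename must follow the filename rule above.
- `pdf_path` must always point to an absolute path inside `papers/`.
- `pdf_filename` must match the basename of `pdf_path`.
- Never put an external path (~/Downloads/, ~/Desktop/, etc.) in `pdf_path`.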
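The storage rules above could be enforced with something like the following sketch. It uses `shutil.copy2` as a Python stand-in for the `cp` step; `store_pdf` and `check_pdf_fields` are hypothetical helper names, and treating an empty `pdf_path`/`pdf_filename` pair as valid (for manuscripts without a PDF) is an assumption based on the manuscript example earlier in this file.

```python
import shutil
from pathlib import Path


def store_pdf(external_pdf, papers_dir, canonical_name):
    """Copy a PDF into papers/ as a real file and return its metadata fields."""
    papers_root = Path(papers_dir).resolve()
    papers_root.mkdir(parents=True, exist_ok=True)
    dest = papers_root / canonical_name
    # copy2 copies file contents (following symlinks), so dest is never a symlink.
    shutil.copy2(external_pdf, dest)
    return {"pdf_path": str(dest), "pdf_filename": dest.name}


def check_pdf_fields(meta, papers_dir):
    """Return a list of rule violations for pdf_path / pdf_filename."""
    pdf_path_str = meta.get("pdf_path", "")
    pdf_filename = meta.get("pdf_filename", "")
    # Assumption: both fields empty is allowed (manuscript without a PDF).
    if pdf_path_str == "" and pdf_filename == "":
        return []
    problems = []
    pdf_path = Path(pdf_path_str)
    papers_root = Path(papers_dir).resolve()
    if not pdf_path.is_absolute():
        problems.append("pdf_path is not an absolute path")
    if pdf_path.parent.resolve() != papers_root:
        problems.append("pdf_path does not point inside papers/")
    if pdf_path.name != pdf_filename:
        problems.append("pdf_filename does not match the basename of pdf_path")
    if pdf_path.is_symlink():
        problems.append("pdf_path is a symlink, not a real file")
    return problems
```

A typical call would be `meta = store_pdf(downloaded_pdf, "papers", canonical_name)`, followed by `check_pdf_fields(meta, "papers")` before the fields are written into the YAML header.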
Step 1: Copy PDF to papers/ and extract text.
Use opendataloader-pdf (requires Java). Fall back to pypdf on failure.
export PATH="/opt/homebrew/opt/openjdk/bin:$PATH"
python3 -c "
import opendataloader_pdf, tempfile, os, re, sys
pdf = sys.argv[1]
with tempfile.TemporaryDirectory() as d:
    # Convert the first 15 pages to markdown inside a temporary directory
    opendataloader_pdf.convert(pdf, output_dir=d, format='markdown', pages='1-15', image_output='off', quiet=True)
    stem = os.path.splitext(os.path.basename(pdf))[0]
    text = open(f'{d}/{stem}.md').read()
    # Drop image placeholder lines and cap the output at 12,000 characters
    lines = [l for l in text.splitlines() if not re.match(r'!\[image \d+\]', l)]
    print('\n'.join(lines)[:12000])
" "/path/to/paper.pdf"

pypdf fallback:
python3 -c "
import pypdf, sys
reader = pypdf.PdfReader(sys.argv[1])
text = ''
for page in reader.pages[:40]:
    t = page.extract_text()
    if t:
        text += t + '\n'
    # Stop once enough text has been collected
    if len(text) > 12000:
        break
print(text[:12000])
" "/path/to/paper.pdf"

Step 2: Write source file to sources/{filename}.md
Step 3: Create wiki page at wiki/{category}/{filename}.md
Step 4: Update index.md
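The files touched by the four steps above can be derived from a single filename stem and a category. The helper below is a minimal sketch under that assumption; `ingest_paths` is a hypothetical name, and the layout follows the directory tree shown earlier.

```python
from pathlib import Path


def ingest_paths(root, stem, category):
    """Map one paper's filename stem to the files the ingest steps touch."""
    root = Path(root)
    return {
        "pdf": root / "papers" / f"{stem}.pdf",            # Step 1: canonical PDF copy
        "source": root / "sources" / f"{stem}.md",         # Step 2: source summary
        "wiki": root / "wiki" / category / f"{stem}.md",   # Step 3: wiki page
        "index": root / "index.md",                        # Step 4: page catalog
    }


# Made-up stem, for illustration only.
for name, path in ingest_paths("llm-wiki", "kim-2024-a-b-c-d-e", "genomic-dl").items():
    print(name, path.as_posix())
```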
---
title: "Paper Title"
authors: Author List
year: YYYY
doi: DOI
category: category_name
pdf_path: /full/path/to/paper.pdf
pdf_filename: filename.pdf
source_collection: collection_name
---
## One-line Summary
## 1. Document Information
## 2. Key Contributions
## 3. Methodology and Architecture
## 4. Key Results and Benchmarks
## 5. Limitations and Future Work
## 6. Related Work
## 7. Glossary

---
title: "Exact English Title"
authors: Author list
year: YYYY
doi: DOI
source: source_filename.md
category: category_name
pdf_path: /full/path.pdf
pdf_filename: filename.pdf
source_collection: collection_name
tags: []
---
## Summary
## Key Contributions
## Methodology and Architecture
## Results
## Related Papers
- [[category/page]] — relationship

---
title: "Topic Title"
tags: [relevant-tags]
---
## Overview
## Timeline / Comparison Table
## Related Pages

| Category | Includes |
|---|---|
| `genomic-dl` | DNA LMs, variant effect prediction, regulatory genomics, sequence models |
| `single-cell-dl` | scRNA-seq DL, cell type annotation, integration, imputation, perturbation |
| `single-cell-foundation` | Geneformer, scGPT, virtual cells, large single-cell foundation models |
| `single-cell-methylation` | Single-cell DNA methylation analysis, epigenomic profiling |
| `protein-ai` | Protein LMs, structure prediction, PTM prediction |
| `gwas` | GWAS, common/rare variant methods, population genetics, LD, variant interpretation |
| `neuroscience` | ASD genetics, schizophrenia genetics, psychiatric genetics, disease gene functional studies |
| `brain-development` | Normal brain development, cortical biology, cerebral organoid methodology, neurogenesis |
| `brain-atlas` | Brain cell atlases, BICCN, spatial transcriptomics |
| `organoid` | Non-brain organoids: lung, kidney, liver, heart, gut, retinal; iPSC differentiation |
| `long-read` | PacBio, Oxford Nanopore, long-read DNA sequencing methods |
| `lrRNA` | Long-read RNA-seq: Iso-seq, MAS-seq, ONT cDNA/dRNA, transcript isoforms |
| `drug-resistance` | Cancer proteogenomics, drug resistance, cancer genomics, immunotherapy |
| `methylation-ai` | DNA methylation AI, epigenetic clocks |
| `methylation` | General DNA methylation biology |
| `statistics` | Statistical methods: FDR, rare variants, batch effects, Bayesian |
| `medical-llm` | Medical/clinical LLMs, NLP for EHR, clinical NLP |
| `sex-differences-biology` | Sex-specific genetic architecture, XWAS, sex-biased disease, X-inactivation |
| `reproductive-biology` | Germline development, PGC reprogramming, meiotic recombination, genomic imprinting |
| `meiosis` | Meiotic recombination, crossover mechanisms, synaptonemal complex |
| `synapse-evolution` | Synapse molecular evolution, postsynaptic density, comparative synaptomics |
| `aging` | Longevity genetics, lifespan QTL, aging biology |
| `other` | Cross-cutting, evolution, networks, benchmarks, misc |
| `concepts` | Key concepts, methods, algorithms explained |
| `overviews` | Synthesis pages, timelines, comparison tables |
- 3-tier architecture: Raw PDF (immutable) → `sources/*.md` (summaries) → `wiki/**/*.md` (final)
- Single wiki: All AI for Biology in one wiki, organized by category
- Obsidian compatible: `[[wikilinks]]`, standard markdown
- English only: All content in English (for paper writing + RAG)
- PDF extraction: Bash + opendataloader-pdf (NOT Claude Read tool). Fallback to pypdf.
- Consistent YAML: Every source file has title, authors, year, doi, category, pdf_path, pdf_filename, source_collection
- Knowledge compounding: Answers to queries saved as overviews/ pages
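
The "Consistent YAML" principle can be spot-checked with a small script. The sketch below uses a deliberately simple line-based reader for the front matter rather than a full YAML parser, which is an assumption that only holds for flat `key: value` headers like the templates in this file; `front_matter_keys` and `missing_source_keys` are hypothetical names.

```python
REQUIRED_SOURCE_KEYS = [
    "title", "authors", "year", "doi", "category",
    "pdf_path", "pdf_filename", "source_collection",
]


def front_matter_keys(markdown_text):
    """Collect top-level keys from a leading '---' ... '---' front matter block."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Only top-level "key: value" lines; indented lines and comments are skipped.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return keys


def missing_source_keys(markdown_text):
    """Return the required source-file keys that are absent, in canonical order."""
    present = front_matter_keys(markdown_text)
    return [key for key in REQUIRED_SOURCE_KEYS if key not in present]


example = """---
title: "Paper Title"
authors: Author List
year: 2024
category: genomic-dl
---
## One-line Summary
"""
print(missing_source_keys(example))
# prints ['doi', 'pdf_path', 'pdf_filename', 'source_collection']
```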