To test long-context LLM understanding of academic materials running locally on <= 24GB VRAM, I downloaded a complex ~450-page Ph.D. dissertation PDF, converted it to text, and prompted two LLMs to generate summaries.
The exact versions of llama.cpp and the GGUFs used for inference are listed below. All tests were performed locally on a 3090 Ti with 24GB VRAM.
Both models support ~128k context in their respective tokenization formats.
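For reference, here is a minimal sketch of the PDF-to-text and token-counting steps. It assumes pypdf for extraction and the llama-cpp-python bindings as a stand-in for the llama.cpp CLI used in the actual runs; the file names, quant, and context size are placeholders, not the exact setup.

```python
# Sketch only: pypdf + llama-cpp-python stand in for the actual
# PDF-to-text step and the llama.cpp CLI; paths and quant are placeholders.
from pypdf import PdfReader
from llama_cpp import Llama

# 1. Extract plain text from the dissertation PDF, page by page.
reader = PdfReader("dissertation.pdf")                  # placeholder path
text = "\n".join(page.extract_text() or "" for page in reader.pages)
with open("dissertation.txt", "w", encoding="utf-8") as f:
    f.write(text)

# 2. Load the GGUF with a context window large enough for the whole document.
llm = Llama(
    model_path="Mistral-Nemo-12B-Instruct-2407-Q4_K_M.gguf",  # placeholder file name
    n_ctx=65536,      # needs to cover the ~52k-token document plus prompt and output
    n_gpu_layers=-1,  # offload all layers to the 24GB GPU
)

# 3. Count how many tokens the document occupies in this model's tokenizer.
tokens = llm.tokenize(text.encode("utf-8"))
print(f"document tokens: {len(tokens)}")
```

Counting tokens first makes it easy to confirm that the document plus the prompt fits inside the chosen context window before kicking off a summarization run.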
- Mistral-Nemo-12B-Instruct-2407
- Tokenizes the document into 51,617 tokens
- Does not fully support an explicit system prompt.