It is good to see LLM speed reports published online for both CPUs and GPUs. To give a more comprehensive picture, this document records LLM inference measurements on an NVIDIA V100 16GB using text-generation-webui.
We test the following prompts:
- From the SQuAD dataset:
  - How many student news papers are found at Notre Dame?
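The text does not say exactly how speed is measured. A common metric is generated tokens per second of wall-clock time. The sketch below assumes that metric. `measure_tokens_per_second` and `fake_generate` are hypothetical helpers, not part of text-generation-webui. `fake_generate` only sleeps to imitate latency, so a real model call would have to be substituted to get actual V100 numbers.

```python
import time
from typing import Callable, List, Tuple


def measure_tokens_per_second(
    generate: Callable[[str], List[int]], prompt: str
) -> Tuple[int, float, float]:
    """Time one generation call; return (new_tokens, seconds, tokens_per_second).

    Assumption: `generate` returns only the newly generated token ids.
    """
    start = time.perf_counter()
    new_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    n = len(new_tokens)
    tps = n / elapsed if elapsed > 0 else float("inf")
    return n, elapsed, tps


def fake_generate(prompt: str) -> List[int]:
    # Hypothetical stand-in for a real model call: sleeps to simulate latency
    # and returns 20 dummy token ids.
    time.sleep(0.05)
    return list(range(20))


prompt = "How many student news papers are found at Notre Dame?"
n, secs, tps = measure_tokens_per_second(fake_generate, prompt)
print(f"{n} tokens in {secs:.3f} s -> {tps:.1f} tokens/s")
```

Timing a single call like this includes prompt processing together with generation. Separating the two, or averaging over several runs, would give more stable figures.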