Last active: September 18, 2024 04:32
Studying "Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward": adapting a table for plotting
| # | Method | Quantization Type | WM (GB) | RM (GB) | Tokens/sec | Perplexity | NVIDIA GPU | AMD GPU | Apple Silicon | CPU | Intel GPU | AWS Inferentia2 | WebGPU | WASM | Adreno/Mali |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Llama.cpp | GGUF K-Quant 2bit | 2.36 | 3.69 | 102.15 | 6.96 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | Llama.cpp | GGUF 4bit (check) | 3.56 | 4.88 | 128.97 | 5.96 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | Llama.cpp | GGUF AWQ 4bit | 3.56 | 4.88 | 129.25 | 5.91 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | Llama.cpp | GGUF K-Quant 4bit | 3.59 | 4.90 | 109.72 | 5.87 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | Llama.cpp | GGUF 8bit | 6.67 | 7.78 | 93.39 | 5.79 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 5 | Llama.cpp | GGUF FP16 | 12.55 | 13.22 | 66.81 | 5.79 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 6 | ExLlama | GPTQ 4bit | 3.63 | 5.35 | 77.10 | 6.08 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | ExLlamav2 | EXL2 2bit | 2.01 | 5.21 | 153.75 | 20.21 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | ExLlamav2 | EXL2 4bit | 3.36 | 6.61 | 131.68 | 6.12 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10 | ExLlamav2 | GPTQ 4bit | 3.63 | 6.93 | 151.30 | 6.03 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11 | ExLlamav2 | EXL2 8bit | 6.37 | 9.47 | 115.81 | 5.76 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 | ExLlamav2 | FP16 | 12.55 | 15.09 | 67.70 | 5.73 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 | vLLM | AWQ GEMM 4bit | 3.62 | 34.55 | 114.43 | 6.02 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 | vLLM | GPTQ 4bit | 3.63 | 36.51 | 172.88 | 6.08 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | vLLM | FP16 | 12.55 | 35.92 | 79.74 | 5.85 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 16 | TensorRT-LLM | AWQ GEMM 4bit | 3.42 | 5.69 | 194.86 | 6.02 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | TensorRT-LLM | GPTQ 4bit | 3.60 | 5.88 | 202.16 | 6.08 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 18 | TensorRT-LLM | INT8 | 6.53 | 8.55 | 143.57 | 5.89 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19 | TensorRT-LLM | FP16 | 12.55 | 14.61 | 83.43 | 5.85 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | TGI | AWQ GEMM 4bit | 3.62 | 7.97 | 30.80 | 6.02 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 21 | TGI | AWQ GEMV 4bit | 3.62 | 7.96 | 34.22 | 6.02 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 22 | TGI | GPTQ 4bit | 3.69 | 39.39 | 34.86 | 6.08 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 23 | TGI | FP4 | 12.55 | 17.02 | 34.38 | 6.15 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 24 | TGI | NF4 | 12.55 | 17.02 | 33.93 | 6.02 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 25 | TGI | INT8 | 12.55 | 11.66 | 5.39 | 5.89 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 26 | TGI | FP16 | 12.55 | 17.02 | 34.23 | 5.85 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 27 | MLC-LLM | OmniQuant 3bit | 3.2 | 5.1 | 83.4 | 6.65 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 28 | MLC-LLM | OmniQuant 4bit | 3.8 | 5.7 | 134.2 | 5.97 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 29 | MLC-LLM | AWQ GEMM 4bit | 3.62 | 6.50 | 23.62 | 6.02 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 30 | MLC-LLM | Q4F16 | 3.53 | 6.50 | 189.07 | – | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 31 | MLC-LLM | Q3F16 | 2.84 | 5.98 | 185.47 | – | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 32 | MLC-LLM | FP16 | 12.55 | 15.38 | 87.37 | 5.85 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |

(WM = weight memory, RM = running memory; hardware columns are 1 = supported, 0 = not supported. Perplexity is not reported for the Q4F16 and Q3F16 rows.)
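Since the point of the gist is to get this table into a plottable form, here is a minimal sketch using only Python's standard `csv` module. The inlined rows are a 4-bit subset copied from the table above; the column shorthands (`WM_GB`, `Tokens_per_sec`, etc.) are my own naming, not from the paper.

```python
import csv
import io

# A 4-bit-quantization subset of the table above, inlined as CSV.
data = """Method,Quantization,WM_GB,RM_GB,Tokens_per_sec,Perplexity
Llama.cpp,GGUF 4bit,3.56,4.88,128.97,5.96
ExLlamav2,EXL2 4bit,3.36,6.61,131.68,6.12
vLLM,GPTQ 4bit,3.63,36.51,172.88,6.08
TensorRT-LLM,GPTQ 4bit,3.60,5.88,202.16,6.08
TGI,GPTQ 4bit,3.69,39.39,34.86,6.08
MLC-LLM,OmniQuant 4bit,3.8,5.7,134.2,5.97
"""

rows = list(csv.DictReader(io.StringIO(data)))

# Highest throughput among the 4-bit configurations in this subset.
fastest = max(rows, key=lambda r: float(r["Tokens_per_sec"]))
print(fastest["Method"])  # TensorRT-LLM

# Lowest running memory in this subset.
leanest = min(rows, key=lambda r: float(r["RM_GB"]))
print(leanest["Method"])  # Llama.cpp

# For an actual plot, feed these columns to e.g. matplotlib:
#   plt.scatter([float(r["Tokens_per_sec"]) for r in rows],
#               [float(r["Perplexity"]) for r in rows])
```

The same `rows` list can also be handed straight to `pandas.DataFrame(rows)` if you prefer DataFrame-based plotting.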
Source: https://arxiv.org/html/2402.01799v1