
@jerryzh168
Created August 9, 2024 16:16
Autoquant benchmark cache. Each key is `(quantized weight class, activation shape, weight shape, bias shape, dtype)` and each value is the measured time for that combination. All entries use `torch.bfloat16` and a bias whose size equals the weight's output dimension, so those two columns are folded into the table; `inf` marks options that were not viable for that shape:

| activation shape | weight shape | AQFloatLinearWeight | AQWeightOnlyQuantizedLinearWeight | AQWeightOnlyQuantizedLinearWeight2 | AQInt8DynamicallyQuantizedLinearWeight |
|:----------------:|:------------:|--------------------:|----------------------------------:|-----------------------------------:|---------------------------------------:|
| [2, 256] | [1152, 256] | 0.014147199876606464 | 0.017664000391960144 | 0.012953600008040666 | 0.013567999936640263 |
| [2, 1152] | [1152, 1152] | 0.017347200028598308 | 0.019680000841617584 | 0.015446400083601475 | 0.018144000321626663 |
| [2, 1152] | [6912, 1152] | 0.0284895995631814 | 0.02287999950349331 | 0.025190400145947936 | 0.03737920122221112 |
| [600, 4096] | [1152, 4096] | 0.04899200052022934 | 0.11779200285673141 | inf | 0.08584239836782218 |
| [600, 1152] | [1152, 1152] | 0.024159999564290047 | 0.05593600124120712 | inf | 0.040407998766750094 |
| [8192, 1152] | [1152, 1152] | 0.11851199939846993 | 0.1762239933013916 | inf | 0.14186751849949358 |
| [8192, 1152] | [4608, 1152] | 0.40998398661613467 | 0.5067200064659119 | inf | 0.42827264964580536 |
| [8192, 4608] | [1152, 4608] | 0.40392961204051975 | 0.42263040840625765 | inf | 0.3295151995122433 |
| [8192, 1152] | [32, 1152] | 0.03182080015540123 | 0.056992001831531525 | inf | 0.06339391991496086 |
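The cache above is what autoquant's shape-wise selection works from: for each observed (activation, weight) shape it picks the candidate weight class with the lowest measured time. A minimal sketch of that selection step, using a hand-copied excerpt of the entries above (the `fastest_per_shape` helper name is hypothetical, not torchao API):

```python
import math

# Excerpt of the benchmark cache above, keyed by
# (candidate class name, activation shape, weight shape) -> measured time.
cache = {
    ("AQFloatLinearWeight",                    (2, 256), (1152, 256)): 0.014147199876606464,
    ("AQWeightOnlyQuantizedLinearWeight2",     (2, 256), (1152, 256)): 0.012953600008040666,
    ("AQFloatLinearWeight",                    (600, 4096), (1152, 4096)): 0.04899200052022934,
    ("AQWeightOnlyQuantizedLinearWeight",      (600, 4096), (1152, 4096)): 0.11779200285673141,
    ("AQWeightOnlyQuantizedLinearWeight2",     (600, 4096), (1152, 4096)): math.inf,  # not viable
    ("AQInt8DynamicallyQuantizedLinearWeight", (600, 4096), (1152, 4096)): 0.08584239836782218,
}

def fastest_per_shape(cache):
    """For each (activation shape, weight shape) pair, keep the candidate
    class with the lowest benchmarked time; inf entries naturally lose."""
    best = {}
    for (cls, act, wt), t in cache.items():
        key = (act, wt)
        if key not in best or t < best[key][1]:
            best[key] = (cls, t)
    return best

best = fastest_per_shape(cache)
# For the small [2, 256] activation the int4 weight-only variant wins;
# for the larger [600, 4096] activation plain bfloat16 remains fastest.
```

This mirrors the pattern visible in the full table: weight-only variants win at tiny batch/sequence sizes, while bfloat16 or dynamic int8 win as the activation grows.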
| ckpt_id | batch_size | fuse | compile | quantization | sparsify | memory | time |
|:--------------------------------------:|-------------:|:------:|:---------:|:--------------:|:----------:|---------:|-------:|
| PixArt-alpha/PixArt-Sigma-XL-2-1024-MS | 1 | False | True | autoquant | False | 10.041 | 2.232 |