@raphlinus
Created February 24, 2020 03:22
mac results on transpose-timing-tests (git hash 781dcf54fc8f32fa2acf54c7a0261defe09ef1be)
compiling kernel transpose-hybrid-shuffle-WGS=(32,1)...
num bms: 4096, num dispatch groups: 4096
GPU results verified!
task name:Vk-HybridShuffle-TG=32
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 32
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 108.47 +/- 8.75 ms
compiling kernel transpose-hybrid-shuffle-WGS=(64,1)...
num bms: 4096, num dispatch groups: 2048
GPU results verified!
task name:Vk-HybridShuffle-TG=64
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 64
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 108.36 +/- 3.29 ms
compiling kernel transpose-hybrid-shuffle-WGS=(128,1)...
num bms: 4096, num dispatch groups: 1024
GPU results verified!
task name:Vk-HybridShuffle-TG=128
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 128
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 108.86 +/- 1.25 ms
compiling kernel transpose-hybrid-shuffle-WGS=(256,1)...
num bms: 4096, num dispatch groups: 512
GPU results verified!
task name:Vk-HybridShuffle-TG=256
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 256
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 110.48 +/- 1.99 ms
compiling kernel transpose-hybrid-shuffle-WGS=(512,1)...
num bms: 4096, num dispatch groups: 256
GPU results verified!
task name:Vk-HybridShuffle-TG=512
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 512
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 127.18 +/- 0.91 ms
transpose-threadgroup-WGS=(1,32) kernel already compiled...
num bms: 4096, num dispatch groups: 4096
GPU results verified!
task name:Vk-Threadgroup-TG=32
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 32
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 43.87 +/- 5.92 ms
transpose-threadgroup-WGS=(2,32) kernel already compiled...
num bms: 4096, num dispatch groups: 2048
GPU results verified!
task name:Vk-Threadgroup-TG=64
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 64
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 37.82 +/- 3.88 ms
transpose-threadgroup-WGS=(4,32) kernel already compiled...
num bms: 4096, num dispatch groups: 1024
GPU results verified!
task name:Vk-Threadgroup-TG=128
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 128
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 98.63 +/- 24.18 ms
transpose-threadgroup-WGS=(8,32) kernel already compiled...
num bms: 4096, num dispatch groups: 512
GPU results verified!
task name:Vk-Threadgroup-TG=256
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 256
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 211.00 +/- 45.09 ms
transpose-threadgroup-WGS=(16,32) kernel already compiled...
num bms: 4096, num dispatch groups: 256
GPU results verified!
task name:Vk-Threadgroup-TG=512
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 512
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 279.47 +/- 11.42 ms
transpose-threadgroup-WGS=(32,32) kernel already compiled...
num bms: 4096, num dispatch groups: 128
GPU results verified!
task name:Vk-Threadgroup-TG=1024
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 1024
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 248.14 +/- 5.85 ms
compiling kernel transpose-threadgroup-WGS=(64,32)...
num bms: 4096, num dispatch groups: 64
thread 'main' panicked at 'GPU result 0 incorrect!