@slp
Created February 11, 2024 08:27
llama.cpp Vulkan tests output for M1 GPU
Log start
main: build = 2085 (87db9adf)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.2.0
main: seed = 1707639415
ggml_vulkan: Using Apple M1 | uma: 1 | fp16: 1 | warp size: 32
llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from ../libkrun/examples/models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = tinyllama_tinyllama-1.1b-chat-v1.0
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 22
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 2
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,61249] = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 45 tensors
llama_model_loader: - type q4_0: 155 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_layer = 22
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 1.10 B
llm_load_print_meta: model size = 606.53 MiB (4.63 BPW)
llm_load_print_meta: general.name = tinyllama_tinyllama-1.1b-chat-v1.0
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '</s>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.08 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/23 layers to GPU
llm_load_tensors: CPU buffer size = 606.53 MiB
.......................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: Vulkan_Host KV buffer size = 11.00 MiB
llama_new_context_with_model: KV self size = 11.00 MiB, K (f16): 5.50 MiB, V (f16): 5.50 MiB
llama_new_context_with_model: Vulkan_Host input buffer size = 5.01 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 73.15 MiB
llama_new_context_with_model: graph splits (measure): 1
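The grouped-query-attention values and the 11.00 MiB KV cache reported above can be derived from the printed hyperparameters. The sketch below assumes that K and V each hold one row of `n_embd_k_gqa` f16 values (2 bytes each, per the `K (f16)` / `V (f16)` labels) per context position per layer. The variable names mirror the log fields and are only illustrative.

```python
# Re-derive the GQA and KV-cache figures printed for tinyllama-1.1b-chat Q4_0.
n_embd = 2048        # llm_load_print_meta: n_embd
n_head = 32          # n_head
n_head_kv = 4        # n_head_kv
n_layer = 22         # n_layer
n_ctx = 512          # llama_new_context_with_model: n_ctx
bytes_per_elem = 2   # f16, as in "K (f16)" and "V (f16)"

n_embd_head = n_embd // n_head          # per-head width, cf. n_embd_head_k = 64
n_gqa = n_head // n_head_kv             # query heads sharing one KV head, cf. n_gqa = 8
n_embd_k_gqa = n_embd_head * n_head_kv  # K row width per token, cf. n_embd_k_gqa = 256

MiB = 1024 * 1024
k_bytes = n_ctx * n_layer * n_embd_k_gqa * bytes_per_elem
v_bytes = k_bytes  # n_embd_v_gqa is also 256 for this model

print(f"n_gqa={n_gqa} n_embd_k_gqa={n_embd_k_gqa}")
print(f"K: {k_bytes / MiB:.2f} MiB, V: {v_bytes / MiB:.2f} MiB, "
      f"total: {(k_bytes + v_bytes) / MiB:.2f} MiB")
```

Because only 4 KV heads are stored for 32 query heads, the cache is 8 times smaller than it would be with full multi-head attention at the same context length.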
TEST TRANSFER 32000 KB to_gpu 10.012ms (3121.25 MB/s) from_gpu 5.132ms (6089.24 MB/s) avg_err=0
TEST TRANSFER 32000 KB to_gpu 3.747ms (8340.01 MB/s) from_gpu 2.564ms (12188 MB/s) avg_err=0
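The MB/s figures on the two TEST TRANSFER lines match the interpretation of 32000 KB as 32000 KiB (31.25 MiB), divided by the elapsed time and reported in MiB per second. This unit interpretation comes from the arithmetic alone and not from the llama.cpp source. The sketch below checks it against the logged values.

```python
def transfer_rate_mib_s(size_kib: float, time_ms: float) -> float:
    """Throughput in MiB/s for size_kib KiB moved in time_ms milliseconds."""
    return (size_kib / 1024.0) / (time_ms / 1000.0)

# (elapsed ms, MB/s as printed in the log)
logged = [
    (10.012, 3121.25),  # first run, to_gpu
    (5.132, 6089.24),   # first run, from_gpu
    (3.747, 8340.01),   # second run, to_gpu
    (2.564, 12188.0),   # second run, from_gpu
]

for time_ms, printed in logged:
    rate = transfer_rate_mib_s(32000, time_ms)
    print(f"{time_ms:7.3f} ms -> {rate:9.2f} MiB/s (log: {printed})")
```

The computed rates agree with the log to within the roughly six significant digits the log prints. For example, 12187.99 appears as `12188`. Note also that the second run is about 2.7 times faster for to_gpu than the first.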
TEST DEQUANT q4_0 time=4.338ms avg_err=0.029362
TEST DEQUANT q4_1 time=3.539ms avg_err=0.0146843
TEST DEQUANT q5_0 time=3.512ms avg_err=0.0146824
TEST DEQUANT q5_1 time=3.787ms avg_err=nan
TEST DEQUANT q8_0 time=3.824ms avg_err=0.00185736
TEST DEQUANT q2_K time=4.736ms avg_err=0.0656603
TEST DEQUANT q3_K time=5.643ms avg_err=0.0533272
TEST DEQUANT q4_K time=4.699ms avg_err=0.0142812
TEST DEQUANT q5_K time=5.291ms avg_err=0.00707707
TEST DEQUANT q6_K time=5.534ms avg_err=0.00638328
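Each TEST DEQUANT line times one dequantization kernel and reports an average error. As a point of reference for what dequantizing one of these formats means, here is a minimal CPU sketch of q4_0. It assumes the ggml q4_0 block layout: 32 weights per block, one fp16 scale `d`, and 16 bytes of packed 4-bit values. In that layout, the low nibbles hold elements 0 to 15, the high nibbles hold elements 16 to 31, and each value decodes as `(q - 8) * d`. The function name is hypothetical. This is not the Vulkan shader the test runs, and it does not explain how the test computes avg_err (including the `nan` reported for q5_1).

```python
import struct
from typing import List

QK4_0 = 32                     # weights per q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2   # fp16 scale + 16 bytes of nibbles = 18 bytes

def dequantize_q4_0_block(block: bytes) -> List[float]:
    """Decode one q4_0 block into 32 floats (assumed ggml layout)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")
    (d,) = struct.unpack("<e", block[:2])  # little-endian IEEE half-precision scale
    qs = block[2:]
    out = [0.0] * QK4_0
    for j in range(QK4_0 // 2):
        out[j] = ((qs[j] & 0x0F) - 8) * d              # low nibble -> first half
        out[j + QK4_0 // 2] = ((qs[j] >> 4) - 8) * d   # high nibble -> second half
    return out

# Scale 0.5, every byte 0xF0: low nibble 0 -> (0 - 8) * 0.5, high nibble 15 -> (15 - 8) * 0.5
vals = dequantize_q4_0_block(struct.pack("<e", 0.5) + bytes([0xF0] * 16))
print(vals[0], vals[16])
```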
TEST F16_F32_S m=8 n=8 k=8 batch=2 split_k=1 matmul 0.388ms avg_err=4.00178e-08
TEST F16_F32_M m=8 n=8 k=8 batch=2 split_k=1 matmul 0.228ms avg_err=3.18978e-08
TEST F16_F32_L m=8 n=8 k=8 batch=2 split_k=1 matmul 0.237ms avg_err=3.31784e-08
TEST F16_F32_S m=8 n=8 k=8 batch=2 split_k=4 matmul 0.26ms avg_err=3.91738e-08
TEST F16_F32_M m=8 n=8 k=8 batch=2 split_k=4 matmul 0.278ms avg_err=3.95521e-08
TEST F16_F32_L m=8 n=8 k=8 batch=2 split_k=4 matmul 0.239ms avg_err=3.38769e-08
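Every TEST line follows a fixed pattern: a variant name, the matrix dimensions, the timing, and an avg_err. The sketch below is a hypothetical helper and not part of llama.cpp. It extracts those fields with regular expressions and flags results whose avg_err is NaN or above a tolerance. The 0.1 tolerance was chosen only for illustration. This helper gives a quick way to summarize which variants pass in a log like this one.

```python
import math
import re

MATMUL_RE = re.compile(
    r"TEST (?P<name>\S+) m=(?P<m>\d+) n=(?P<n>\d+) k=(?P<k>\d+) "
    r"batch=(?P<batch>\d+) split_k=(?P<split_k>\d+) matmul "
    r"(?P<ms>[\d.]+)ms avg_err=(?P<err>\S+)"
)
DEQUANT_RE = re.compile(
    r"TEST DEQUANT (?P<name>\S+) time=(?P<ms>[\d.]+)ms avg_err=(?P<err>\S+)"
)

def check_line(line, tol=0.1):
    """Return (label, avg_err, ok) for a matmul/dequant TEST line, else None."""
    for regex in (MATMUL_RE, DEQUANT_RE):
        match = regex.search(line)
        if match:
            err = float(match["err"])  # float("nan") handles the q5_1 case
            ok = not math.isnan(err) and err <= tol
            label = match["name"]
            if regex is MATMUL_RE:
                label += (f" m={match['m']} n={match['n']} k={match['k']}"
                          f" split_k={match['split_k']}")
            return label, err, ok
    return None

log = """\
TEST DEQUANT q5_1 time=3.787ms avg_err=nan
TEST F16_F32_S m=8 n=8 k=8 batch=2 split_k=1 matmul 0.388ms avg_err=4.00178e-08
TEST F16_F32_ALIGNED_S m=100 n=46 k=576 batch=2 split_k=1 matmul 1.444ms avg_err=7.18192
TEST F16_F32_ALIGNED_M m=100 n=46 k=576 batch=2 split_k=1 matmul 1.114ms avg_err=0.00238037
"""

for line in log.splitlines():
    result = check_line(line)
    if result:
        label, err, ok = result
        print(f"{'PASS' if ok else 'FAIL'} {label} avg_err={err:g}")
```

TEST TRANSFER lines and the matrix dumps return `None`, so the whole log can be fed through line by line.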
TEST F16_F32_ALIGNED_S m=100 n=46 k=576 batch=2 split_k=1 matmul 1.444ms avg_err=7.18192
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -5.43 -2.16 13.85 0.85 8.99 13.77 10.13 7.59 -0.70 0.66
28: -10.98 -7.89 1.87 -9.04 -11.27 2.27 -4.02 2.05 6.66 2.09
29: -20.34 2.80 0.68 -14.36 -0.45 3.55 -14.60 2.71 1.05 -8.92
30: -2.53 9.35 8.93 1.15 1.33 -0.95 0.54 -5.64 -9.37 2.19
31: -14.44 4.75 14.22 -2.75 -0.38 -4.26 -2.87 5.62 -0.10 0.83
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -5.43 -2.16 13.85 0.84 8.99 13.77 10.13 7.59 -0.71 0.66
28: -10.98 -7.89 1.87 -9.04 -11.27 2.27 -4.02 2.05 6.67 2.09
29: -20.34 2.80 0.68 -14.35 -0.45 3.55 -14.60 2.71 1.05 -8.91
30: -2.53 9.35 8.92 1.15 1.32 -0.95 0.55 -5.64 -9.37 2.20
31: -14.44 4.75 14.22 -2.76 -0.38 -4.26 -2.87 5.62 -0.10 0.83
32: -4.06 1.49 -0.36 -6.62 2.80 14.22 -1.22 3.55 -18.15 -9.75
33: 1.54 -2.49 7.15 -5.94 -11.49 -0.22 -5.60 5.02 -0.35 -11.82
34: 15.11 22.72 -1.58 -5.01 -14.87 -2.45 -1.36 0.45 -1.21 -16.04
35: 1.08 -14.98 -18.81 1.05 0.49 -1.45 5.73 31.59 -6.31 12.47
36: 3.57 -0.21 23.36 -10.95 -18.27 8.59 5.16 5.38 -9.41 10.60
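When a matmul check fails, the log prints an `m = … n = … b = …` location followed by 10 rows of actual and expected output. In every dump in this log, the printed rows start five rows above the reported m and are clamped at row 0. For example, m = 32 shows rows 27 to 36, m = 64 shows rows 59 to 68, and m = 1 and m = 0 both show rows 0 to 9. The NumPy sketch below mimics that report. It makes three assumptions that are not taken from the llama.cpp source. First, avg_err is the mean absolute difference between the GPU result and a CPU reference. Second, the reported location is the first element whose error exceeds a threshold. Third, the 0.05 threshold and the row-major scan order are illustrative choices.

```python
import numpy as np

def report_mismatch(actual, expected, threshold=0.05, window=10, before=5):
    """Return (avg_err, first_bad, rows) for 2-D arrays indexed [m, n].

    avg_err is the mean absolute difference; first_bad is the (m, n) of the
    first element (row-major order) whose error exceeds threshold, or None;
    rows is the range of rows a log-style dump would print.
    """
    err = np.abs(actual - expected)
    avg_err = float(err.mean())
    bad = np.argwhere(err > threshold)  # indices come back in row-major order
    if bad.size == 0:
        return avg_err, None, None
    m, n = (int(v) for v in bad[0])
    start = max(0, m - before)          # clamp at row 0, as for m = 1 and m = 0
    return avg_err, (m, n), range(start, start + window)

# Reproduce the pattern above: rows from 32 onward come back as 0.00.
rng = np.random.default_rng(0)
expected = rng.standard_normal((100, 46)).astype(np.float32) * 5
actual = expected.copy()
actual[32:, :] = 0.0
avg_err, first_bad, rows = report_mismatch(actual, expected)
print(f"avg_err={avg_err:.3f} first_bad={first_bad} rows={rows.start}..{rows.stop - 1}")
```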
TEST F16_F32_ALIGNED_M m=100 n=46 k=576 batch=2 split_k=1 matmul 1.114ms avg_err=0.00238037
TEST F16_F32_L m=100 n=46 k=576 batch=2 split_k=1 matmul 1.394ms avg_err=4.74412
m = 64 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
59: -9.13 -8.97 2.46 4.56 -7.07 0.95 3.15 1.22 4.59 1.45
60: 0.95 9.72 -6.72 1.87 2.00 -17.97 5.50 -8.69 3.66 -5.41
61: 0.94 -7.39 8.22 -2.80 -3.14 13.18 -10.19 1.53 10.14 4.40
62: -6.87 -13.28 15.75 -14.88 -3.09 8.90 -4.20 -1.31 -8.89 10.86
63: -0.29 -1.95 -3.36 0.30 16.21 7.13 -4.68 7.55 6.46 -0.49
64: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
65: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
66: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
67: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
68: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
59: -9.13 -8.97 2.46 4.56 -7.07 0.95 3.14 1.22 4.59 1.45
60: 0.95 9.72 -6.72 1.87 2.00 -17.97 5.50 -8.69 3.66 -5.41
61: 0.94 -7.39 8.22 -2.80 -3.14 13.18 -10.19 1.53 10.13 4.40
62: -6.87 -13.28 15.75 -14.88 -3.09 8.90 -4.20 -1.31 -8.89 10.86
63: -0.29 -1.95 -3.36 0.30 16.20 7.13 -4.68 7.55 6.46 -0.49
64: 21.34 4.56 -18.88 8.59 11.00 4.36 2.92 -1.92 2.68 1.26
65: -5.70 -9.88 4.50 -4.35 11.46 -0.37 -12.32 2.72 -1.47 -9.15
66: 2.62 10.30 1.14 -2.43 -1.99 -2.93 -14.89 -8.42 -16.48 4.13
67: -0.59 0.25 -4.49 14.21 3.58 -10.48 -6.24 5.91 -12.14 5.96
68: 22.13 1.38 4.84 7.22 -4.52 -2.85 5.80 3.56 13.79 4.41
TEST F16_F32_ALIGNED_S m=100 n=46 k=576 batch=2 split_k=4 matmul 0.904ms avg_err=9.06041
m = 1 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -9.86 5.64 1.88 1.94 5.24 -1.46 -11.65 -0.51 -5.33 -1.56
1: 6.70 11.93 0.17 1.50 -6.25 0.55 -7.23 0.33 1.02 -12.69
2: 4.78 -7.14 -0.11 -2.59 -1.92 8.85 12.74 0.36 8.37 5.94
3: 5.12 8.25 -2.94 -1.64 -3.30 -2.33 -11.30 -0.12 -1.79 8.53
4: -1.30 2.45 3.57 2.88 9.26 0.06 -4.61 1.10 3.90 -1.88
5: -1.18 12.13 -1.45 2.28 0.22 0.73 -9.41 -0.20 0.60 4.45
6: 5.46 -1.69 3.82 10.18 5.26 1.62 -1.57 -5.02 0.32 4.47
7: -0.28 2.97 5.72 -7.33 0.86 -6.89 0.06 -0.76 -3.94 4.99
8: -8.40 4.59 0.80 -4.08 5.07 -2.61 -7.04 -2.65 0.12 0.13
9: 1.11 0.65 0.06 14.04 -1.61 7.96 -2.27 1.39 8.03 8.29
Expected result:
0 1 2 3 4 5 6 7 8 9
0: -9.86 5.64 1.88 1.94 5.24 -1.46 -11.65 -0.50 -5.33 -1.56
1: 5.12 11.13 -3.85 0.37 -8.46 4.56 -8.76 4.79 2.58 -9.85
2: 9.64 -10.81 -2.29 -2.09 -12.51 13.12 20.11 -3.02 5.31 8.38
3: 15.13 -0.50 -0.32 -13.24 0.97 0.51 -22.62 -6.05 1.47 8.64
4: 6.89 -0.10 7.42 8.47 6.57 0.89 0.37 -5.45 5.35 -1.89
5: -1.48 0.35 2.84 4.79 -7.07 6.88 -12.56 -1.17 -1.74 2.12
6: 3.50 -0.58 4.92 12.80 4.27 3.54 -8.47 -0.65 5.91 7.00
7: -6.54 0.45 -3.40 -5.35 -3.22 -8.81 -8.63 0.65 -6.49 0.78
8: 1.16 -7.98 2.16 -4.73 2.65 -6.58 -4.48 -6.14 0.99 0.32
9: 5.88 -15.35 -3.37 9.80 -4.87 24.00 -0.82 3.93 11.12 10.21
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -6.93 0.66 -3.78 0.89 3.79 -0.45 -3.35 3.25 0.46 -2.68
1: -1.41 3.03 0.01 0.13 -3.46 6.88 -7.08 -2.63 -5.31 -2.92
2: 0.67 -4.62 -1.37 -2.14 -1.24 4.30 3.63 6.09 4.14 4.97
3: -1.93 2.51 -2.31 1.83 4.52 -4.09 -0.61 4.11 4.62 1.20
4: 3.40 0.81 2.22 0.02 1.04 0.80 0.66 4.53 1.23 -4.49
5: -1.21 4.18 -0.39 0.63 0.76 3.61 -4.93 -2.85 -1.88 4.63
6: 4.25 -1.30 -1.41 5.23 6.20 -0.87 0.09 -1.43 -1.16 4.01
7: 2.73 -2.27 -0.02 -1.80 -1.31 -3.04 1.66 5.14 0.43 -1.17
8: 0.34 2.90 -1.42 1.48 1.64 0.82 -4.84 1.72 -2.62 -1.48
9: 2.07 -2.46 -0.01 3.97 -1.75 -0.46 -0.02 -1.33 0.25 0.55
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 3.24 0.63 5.47 4.50 1.32 -2.32 -4.29 2.14 -3.81 0.09
1: 5.44 4.34 0.71 1.17 1.84 -0.06 5.24 -1.23 6.81 -6.60
2: 2.84 1.69 2.81 1.21 -2.73 0.56 3.01 -7.87 3.06 1.88
3: 0.83 6.43 -1.72 -0.55 -0.32 -2.81 -5.31 -2.75 -3.40 6.07
4: 0.34 -2.51 3.71 1.19 6.51 -1.03 -2.15 -1.78 -1.62 -1.12
5: -0.71 -2.04 -1.22 -0.68 0.90 -6.31 -2.18 0.58 0.81 -1.13
6: 0.28 1.25 1.92 0.84 -0.08 1.70 -1.12 -3.41 2.45 -1.23
7: 0.97 0.99 0.16 -0.59 1.29 -1.79 0.25 1.57 -1.80 1.95
8: -2.36 3.36 0.12 1.98 2.46 2.87 -2.00 -3.22 1.26 3.65
9: -0.02 -0.12 1.41 6.32 0.91 3.31 0.15 2.39 2.60 7.31
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -2.65 4.09 -1.76 -0.32 0.41 0.97 -2.25 -0.75 -3.60 0.88
1: 1.28 4.10 0.24 0.49 -2.01 -6.36 -4.67 4.02 0.15 -1.93
2: 2.81 -1.47 -2.50 -2.89 1.54 2.97 3.78 3.38 0.31 -2.98
3: 4.31 -2.30 2.45 0.40 -2.92 2.83 -2.30 -0.46 0.01 -0.91
4: -0.46 0.28 -0.66 1.26 -0.44 0.88 -1.09 0.84 -1.10 2.41
5: 0.45 3.27 -2.10 1.98 -0.07 -0.43 -1.11 0.64 0.15 0.15
6: 0.27 -2.28 2.96 1.89 1.50 -0.29 -3.31 0.09 0.60 0.01
7: -4.23 -0.70 4.50 1.85 2.30 2.86 -2.57 -2.49 -0.54 1.90
8: -1.85 0.05 1.66 -2.91 -1.72 -4.49 2.58 -0.85 0.93 0.39
9: 1.33 2.53 -1.65 0.83 -3.33 2.96 -1.82 -2.31 3.80 -0.23
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: -3.52 0.27 1.95 -3.13 -0.27 0.34 -1.76 -5.15 1.61 0.15
1: 1.40 0.47 -0.79 -0.29 -2.62 0.10 -0.72 0.16 -0.63 -1.24
2: -1.54 -2.74 0.96 1.23 0.52 1.02 2.32 -1.24 0.86 2.06
3: 1.91 1.61 -1.36 -3.31 -4.57 1.75 -3.07 -1.02 -3.02 2.17
4: -4.59 3.87 -1.69 0.41 2.15 -0.59 -2.04 -2.48 5.40 1.32
5: 0.28 6.72 2.26 0.34 -1.36 3.87 -1.19 1.43 1.53 0.79
6: 0.66 0.64 0.35 2.21 -2.35 1.08 2.77 -0.27 -1.56 1.68
7: 0.25 4.95 1.08 -6.79 -1.42 -4.91 0.72 -4.99 -2.03 2.31
8: -4.53 -1.71 0.44 -4.62 2.68 -1.81 -2.78 -0.30 0.54 -2.43
9: -2.28 0.69 0.31 2.93 2.56 2.15 -0.59 2.64 1.39 0.66
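With split_k=4 the log also dumps four partial buffers, d_buf0 to d_buf3. In the dump above, the Actual values equal their element-wise sum up to display rounding. For example, at row 0, column 0, -6.93 + 3.24 - 2.65 - 3.52 = -9.86. At row 1, column 0, -1.41 + 5.44 + 1.28 + 1.40 = 6.71, compared with 6.70 printed. This is the split-K scheme: the shared K dimension is cut into chunks, each chunk yields a partial product, and a final pass adds the partials. The NumPy sketch below illustrates the idea. It uses equal contiguous K slices via `np.array_split`, which is an assumption for illustration and not necessarily how the Vulkan backend divides K. The shapes are borrowed from the m=100 n=46 k=576 test line.

```python
import numpy as np

def matmul_split_k(a, b, split_k):
    """Compute a @ b by splitting the shared K dimension into split_k chunks.

    a has shape (m, k) and b has shape (k, n). Returns (result, partials),
    where partials[i] is the product over the i-th K chunk (cf. d_buf0..d_buf3).
    """
    k = a.shape[1]
    chunks = np.array_split(np.arange(k), split_k)      # contiguous K slices
    partials = [a[:, idx] @ b[idx, :] for idx in chunks]
    result = np.sum(partials, axis=0)                   # final reduction pass
    return result, partials

rng = np.random.default_rng(0)
a = rng.standard_normal((100, 576))
b = rng.standard_normal((576, 46))
result, partials = matmul_split_k(a, b, split_k=4)
print(len(partials), np.max(np.abs(result - a @ b)))
```

In the dump above, the sum of the partials reproduces the Actual rows, while rows from 1 onward already differ from Expected. This suggests that the discrepancy arises in the per-chunk products rather than in the summation step.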
TEST F16_F32_ALIGNED_M m=100 n=46 k=576 batch=2 split_k=4 matmul 0.685ms avg_err=0.00235069
TEST F16_F32_L m=100 n=46 k=576 batch=2 split_k=4 matmul 0.629ms avg_err=12.8436
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -3.81 -2.47 2.89 3.86 10.48 2.97 -7.06 -7.27 -11.65 4.82
1: 8.41 11.03 1.87 -1.42 17.17 3.07 3.99 4.05 -0.15 -13.09
2: 12.03 -5.71 1.21 -1.77 -12.20 -0.23 9.75 1.45 6.43 1.37
3: -9.66 -2.80 4.35 -2.84 -3.69 3.07 -10.06 8.90 -7.20 -1.61
4: 7.63 -9.16 -2.53 11.29 1.65 3.91 2.01 -13.60 2.82 4.73
5: 1.00 -3.58 3.36 -16.38 -7.81 12.13 5.98 -2.12 3.48 6.59
6: 2.28 -19.44 3.13 5.95 -7.99 6.85 -0.94 4.07 10.51 2.08
7: -6.98 8.80 5.68 6.70 -1.00 -16.48 -9.21 0.94 7.87 -9.57
8: 11.58 10.22 -4.97 -2.91 4.11 -11.31 -16.80 -10.62 -7.52 -6.32
9: 5.04 14.55 -4.99 -4.78 1.69 -11.12 7.31 -11.71 -0.66 11.52
Expected result:
0 1 2 3 4 5 6 7 8 9
0: -0.07 13.63 -1.60 -1.28 0.92 -3.01 -7.07 2.62 -3.68 1.00
1: 6.25 7.47 13.07 -2.08 6.91 0.36 -6.92 4.37 19.10 -4.92
2: -1.66 0.99 6.43 -3.23 1.31 -9.70 2.77 10.55 6.83 4.05
3: -10.31 -1.05 -0.13 5.48 -10.30 2.33 -8.27 5.38 -10.09 -0.32
4: 0.23 -1.79 2.78 -0.24 8.81 1.42 -8.60 -7.51 9.84 -2.68
5: 7.15 3.87 -1.59 -12.34 -1.83 17.61 -1.10 8.02 0.77 1.20
6: -2.45 -23.56 1.86 -0.87 -20.27 5.16 13.48 -4.12 3.64 2.06
7: -5.37 4.16 0.04 -13.16 -9.55 -19.43 -7.51 9.69 7.06 2.97
8: 7.39 7.11 1.65 -5.38 1.89 -3.62 -5.60 10.60 -12.97 -19.82
9: -3.23 4.18 11.72 -10.90 3.74 -16.78 -5.53 -3.23 -2.19 7.10
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -1.95 -1.19 0.85 -1.96 -3.09 -4.82 -1.18 1.74 -4.34 4.21
1: -0.16 3.03 -1.94 -0.33 4.78 1.44 1.92 2.00 3.95 5.21
2: 1.36 -3.57 1.06 -1.53 -0.88 -7.40 5.45 3.59 0.41 -0.62
3: -3.88 2.44 -0.84 -0.77 -1.37 1.00 -5.23 2.93 3.49 -2.39
4: 5.16 -1.59 4.75 -1.82 1.67 2.62 -0.54 -11.38 -1.88 2.51
5: 1.09 -1.09 1.13 -8.14 -9.17 9.32 1.38 1.88 1.83 3.15
6: -1.41 -5.30 -0.63 5.23 -7.62 3.71 0.79 2.71 -3.16 -4.96
7: -0.90 3.40 3.75 -2.15 -2.75 -6.13 -2.23 3.49 1.87 5.35
8: 1.74 1.96 -1.33 -3.88 -2.65 -2.85 -5.53 1.58 -5.55 -2.71
9: 6.26 1.13 -4.90 -2.26 -2.20 -7.96 3.25 -2.57 5.22 5.06
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: -2.68 0.51 0.74 -1.43 5.85 3.61 -14.01 -3.83 -1.54 0.51
1: -1.64 2.13 5.11 -1.51 4.94 5.27 -0.71 1.64 7.10 -2.04
2: 2.23 -0.78 3.79 -4.09 -2.34 3.86 -1.39 1.89 0.55 4.08
3: -2.31 -8.93 3.23 3.95 -3.62 4.08 -1.23 2.36 -6.77 0.85
4: -0.92 -4.19 -3.77 0.72 -2.71 -3.69 -5.68 -3.59 2.41 4.28
5: 0.71 -4.59 1.95 -1.62 5.76 4.77 0.56 5.92 -2.73 -2.69
6: 0.52 -6.20 -0.72 -3.01 -1.22 0.52 4.00 0.86 -0.38 9.55
7: -0.23 0.09 -3.87 0.61 -0.88 -6.10 -1.22 0.07 -6.47 -0.95
8: 5.18 -0.46 3.79 -7.33 0.73 -1.35 -1.38 -0.38 -6.60 -7.39
9: -4.26 -4.31 4.49 2.13 5.67 -1.99 -1.70 -0.94 0.08 2.47
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -5.09 -0.45 4.48 4.56 5.41 5.89 1.47 -1.98 -6.98 2.00
1: 9.47 3.31 4.59 -4.03 3.25 2.82 -2.45 5.52 -10.44 -1.69
2: 3.58 0.08 -0.94 0.66 -5.27 4.01 4.95 -5.19 4.86 0.67
3: -5.07 1.40 -2.45 -9.42 -1.60 2.22 0.43 -2.14 1.46 1.04
4: 3.03 0.87 -3.03 7.59 0.65 2.10 0.97 2.63 4.91 0.94
5: 0.64 -1.24 0.29 1.47 1.20 -1.54 -0.79 -7.29 4.75 2.59
6: 2.94 -4.28 -0.34 -0.60 1.62 1.97 -0.39 -6.86 10.30 1.72
7: -6.60 0.88 6.82 6.62 5.12 -4.70 -4.04 -4.41 4.99 -11.36
8: 1.51 4.15 -2.25 5.67 6.64 -6.28 -7.78 -9.44 3.03 0.04
9: 4.54 3.48 -2.16 -0.20 1.72 0.30 5.56 -6.60 0.47 4.66
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 5.91 -1.34 -3.18 2.69 2.31 -1.72 6.66 -3.20 1.21 -1.90
1: 0.74 2.56 -5.90 4.45 4.19 -6.46 5.22 -5.12 -0.76 -14.57
2: 4.86 -1.44 -2.71 3.19 -3.72 -0.70 0.74 1.16 0.61 -2.75
3: 1.60 2.30 4.41 3.39 2.90 -4.24 -4.04 5.74 -5.37 -1.12
4: 0.36 -4.26 -0.48 4.81 2.04 2.87 7.25 -1.27 -2.62 -3.00
5: -1.44 3.34 -0.01 -8.09 -5.60 -0.42 4.83 -2.61 -0.38 3.54
6: 0.23 -3.66 4.82 4.32 -0.76 0.65 -5.34 7.36 3.75 -4.23
7: 0.75 4.42 -1.01 1.62 -2.50 0.45 -1.72 1.80 7.48 -2.61
8: 3.15 4.56 -5.18 2.63 -0.61 -0.83 -2.11 -2.38 1.60 3.75
9: -1.50 14.25 -2.41 -4.45 -3.51 -1.47 0.19 -1.61 -6.44 -0.67
TEST F16_F32_ALIGNED_S m=623 n=111 k=128 batch=2 split_k=1 matmul 0.983ms avg_err=4.26513
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -3.89 -1.31 -5.81 -4.83 -0.43 1.04 -3.96 2.95 -1.19 3.20
28: 4.19 -3.28 -5.57 5.31 -1.24 2.21 5.05 3.77 3.27 4.68
29: 0.06 -1.62 3.80 -6.21 1.99 -4.98 1.68 2.68 0.51 3.95
30: 2.77 -7.81 6.45 0.78 7.09 2.87 -5.36 2.28 -3.14 0.96
31: 2.23 4.80 -3.44 -1.47 -0.34 -4.08 0.68 5.58 -5.58 2.18
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -3.88 -1.31 -5.81 -4.83 -0.43 1.04 -3.96 2.95 -1.19 3.20
28: 4.19 -3.28 -5.57 5.31 -1.24 2.21 5.05 3.77 3.28 4.68
29: 0.06 -1.62 3.80 -6.21 1.99 -4.98 1.68 2.68 0.51 3.95
30: 2.77 -7.81 6.45 0.78 7.09 2.87 -5.36 2.28 -3.14 0.96
31: 2.23 4.80 -3.44 -1.47 -0.34 -4.08 0.68 5.58 -5.58 2.18
32: -4.72 5.36 1.85 -2.39 -0.62 2.79 0.02 -0.45 1.61 -0.22
33: 2.51 -1.42 1.98 5.53 -6.23 -6.46 0.10 7.03 3.77 -1.10
34: 0.45 -4.02 1.70 4.36 -3.79 -2.20 -0.23 -2.85 2.93 7.15
35: 1.98 0.73 0.79 -0.72 -3.05 -5.80 2.28 4.15 -4.14 -0.34
36: -3.17 0.48 -2.64 2.22 -1.13 3.65 -3.05 -1.37 -3.71 1.42
TEST F16_F32_ALIGNED_M m=623 n=111 k=128 batch=2 split_k=1 matmul 1.019ms avg_err=0.00110578
TEST F16_F32_ALIGNED_L m=623 n=111 k=128 batch=2 split_k=1 matmul 0.555ms avg_err=4.25169
m = 320 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
315: -0.84 2.12 2.49 -1.15 7.78 0.48 5.77 -0.54 1.56 3.86
316: -0.37 -1.74 -5.70 -0.34 -2.96 3.91 -6.45 2.34 -10.26 -1.58
317: -1.30 1.87 -6.15 -0.23 3.58 2.33 6.25 1.55 0.37 5.83
318: 2.27 0.93 -6.86 1.00 -0.86 -0.08 1.65 -5.61 -4.50 2.57
319: -2.66 3.88 -4.02 -2.42 2.19 4.02 -4.86 -0.11 3.38 4.81
320: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
321: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
322: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
323: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
324: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
315: -0.84 2.12 2.49 -1.15 7.78 0.48 5.77 -0.54 1.56 3.86
316: -0.37 -1.74 -5.70 -0.34 -2.96 3.91 -6.45 2.34 -10.26 -1.58
317: -1.30 1.87 -6.15 -0.23 3.58 2.33 6.25 1.55 0.37 5.83
318: 2.27 0.94 -6.86 1.00 -0.86 -0.08 1.65 -5.61 -4.50 2.57
319: -2.66 3.88 -4.02 -2.42 2.19 4.02 -4.86 -0.11 3.38 4.81
320: -1.76 7.11 1.02 4.56 -2.44 7.21 2.26 7.61 -6.60 0.03
321: 5.11 -3.21 2.86 4.20 0.22 -1.97 1.55 -2.94 -10.70 -1.02
322: 1.48 1.59 -1.36 -2.01 0.85 0.09 3.06 -0.01 0.10 -4.05
323: -2.87 2.20 -2.53 3.26 -7.28 5.09 0.85 -6.39 -3.38 0.17
324: 2.60 -7.60 -3.14 -2.47 -1.77 1.24 -1.37 -2.20 2.95 2.01
TEST F16_F32_ALIGNED_S m=623 n=111 k=128 batch=2 split_k=4 matmul 1.232ms avg_err=4.25071
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 2.20 -1.70 -6.35 -7.40 -0.43 0.86 -0.11 0.15 -2.98 3.48
28: -0.78 3.22 3.01 0.69 -2.01 -5.74 3.71 -5.10 0.86 1.82
29: -3.85 5.66 3.44 -2.34 -3.85 -6.73 -2.90 -1.91 -3.93 3.16
30: -3.55 1.70 -2.66 6.09 -1.84 2.96 0.74 8.67 2.76 -2.92
31: 2.76 -4.77 0.13 -0.71 4.76 -0.75 -0.37 -2.36 5.07 -3.22
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 2.20 -1.70 -6.35 -7.40 -0.43 0.86 -0.12 0.15 -2.98 3.48
28: -0.79 3.22 3.01 0.69 -2.01 -5.74 3.71 -5.10 0.86 1.82
29: -3.85 5.66 3.44 -2.34 -3.85 -6.73 -2.90 -1.91 -3.93 3.16
30: -3.55 1.70 -2.66 6.09 -1.84 2.96 0.74 8.67 2.76 -2.92
31: 2.76 -4.77 0.13 -0.71 4.76 -0.75 -0.37 -2.36 5.07 -3.22
32: 5.44 -5.70 -0.23 5.34 0.04 -3.21 0.62 0.32 -2.73 1.35
33: -6.10 9.26 2.80 -0.70 0.53 3.72 4.12 0.06 -5.27 4.47
34: 3.54 -2.22 2.93 -3.34 4.18 6.06 -0.51 6.06 8.26 -1.47
35: -0.72 3.52 3.13 10.13 7.26 -1.63 -0.11 0.37 1.99 -7.70
36: 3.21 6.13 -5.71 1.78 -6.09 1.77 3.44 -2.15 5.75 3.44
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 0.04 -2.38 -1.49 0.38 -1.59 0.28 1.57 0.22 -1.44 0.05
28: -1.68 0.01 1.46 0.00 -0.08 -2.89 1.13 -0.56 -0.70 0.69
29: -1.20 1.62 1.64 1.01 1.02 -1.34 -3.30 -1.31 0.58 0.48
30: -0.55 -0.28 1.02 -0.46 -1.12 1.31 -1.12 -1.85 2.14 2.11
31: -1.83 0.55 0.07 -3.33 -2.20 1.35 -0.09 0.04 2.00 -0.26
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -0.67 1.14 -3.49 -5.18 -0.45 2.01 1.70 -1.54 -0.33 2.06
28: -0.83 2.47 0.55 -2.55 -1.36 -0.52 -0.76 -2.28 -0.75 4.53
29: 2.50 0.70 1.91 -0.55 -2.22 -1.83 2.27 -0.60 -4.30 2.08
30: 1.07 -2.59 -2.55 3.10 -0.65 -0.89 0.21 4.03 0.55 -1.32
31: 0.52 -2.32 1.24 4.89 1.02 -4.26 -1.77 -1.13 0.17 -0.80
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 1.32 2.37 -0.83 -2.62 -0.65 -0.41 -3.25 1.91 -0.33 -0.77
28: 1.32 -1.68 -0.28 3.66 -0.12 -1.23 2.72 -1.42 3.03 -3.62
29: -1.25 2.04 0.02 -2.83 -0.82 -2.20 -0.72 -0.52 -1.18 -1.04
30: 0.11 3.13 -1.28 1.93 -1.11 3.44 0.53 5.53 -1.16 -1.68
31: 1.40 -1.01 1.85 -2.30 1.79 1.16 1.94 -1.30 1.77 -0.07
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: 1.52 -2.82 -0.53 0.01 2.26 -1.02 -0.13 -0.44 -0.88 2.14
28: 0.40 2.43 1.28 -0.43 -0.45 -1.11 0.62 -0.84 -0.72 0.23
29: -3.91 1.30 -0.13 0.04 -1.82 -1.36 -1.15 0.51 0.97 1.64
30: -4.18 1.44 0.14 1.51 1.04 -0.90 1.12 0.96 1.23 -2.02
31: 2.67 -2.00 -3.02 0.03 4.15 0.99 -0.45 0.02 1.13 -2.09
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TEST F16_F32_ALIGNED_M m=623 n=111 k=128 batch=2 split_k=4 matmul 2.039ms avg_err=0.00111243
TEST F16_F32_ALIGNED_L m=623 n=111 k=128 batch=2 split_k=4 matmul 0.444ms avg_err=7.11053
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -4.28 1.18 -5.84 1.04 2.30 5.40 2.23 2.14 -2.15 2.08
1: -1.25 2.69 -1.16 3.83 1.73 -3.44 -9.23 -6.40 2.28 -0.87
2: 1.13 -3.91 0.05 0.57 9.21 1.39 2.69 1.70 -0.42 -0.43
3: 3.85 -1.97 5.85 0.22 5.40 0.51 -1.26 -3.12 -6.95 5.70
4: 5.78 6.59 0.32 -4.07 -3.83 1.06 -0.31 -2.36 -1.81 -1.13
5: 4.61 6.82 -3.74 -4.70 -0.03 -3.29 -1.50 3.39 -3.46 -0.98
6: -5.01 1.08 -2.42 3.83 2.37 0.07 -5.17 7.87 -3.71 1.87
7: -1.49 -6.02 5.29 0.42 -1.52 -0.55 0.61 4.96 -4.08 -4.35
8: -1.15 1.97 -7.31 -2.05 3.32 -9.68 1.04 -1.19 -3.13 4.76
9: -9.89 -2.85 2.47 1.78 0.69 0.47 -2.67 -0.97 1.85 4.88
Expected result:
0 1 2 3 4 5 6 7 8 9
0: -2.92 -0.86 -4.84 0.27 2.63 4.01 1.50 3.63 3.69 0.79
1: -6.22 -2.34 -0.51 1.98 0.51 -0.12 -2.53 -0.54 -2.07 0.36
2: 1.77 -2.72 -0.17 0.67 7.38 -1.33 -1.76 2.16 2.10 -1.23
3: 7.59 -6.31 1.96 2.36 2.50 -0.98 -1.22 2.31 -3.21 3.54
4: 4.06 4.08 -2.58 -0.85 5.92 -0.36 -0.05 -1.00 2.16 -2.83
5: 3.43 2.36 -6.78 1.26 1.04 -3.15 0.40 2.23 3.78 0.78
6: 3.55 0.13 -8.26 4.61 2.80 1.30 -3.41 1.31 5.18 -0.47
7: -0.73 -5.73 2.40 3.28 2.12 1.10 -0.12 -0.81 1.33 -2.37
8: -0.09 2.48 -7.30 0.29 7.95 -12.80 -2.42 -0.18 5.96 5.09
9: -7.04 -0.17 -3.95 -0.86 2.20 -3.05 -1.78 3.52 1.58 4.43
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: 2.49 0.05 -1.79 -1.35 1.28 0.69 -0.35 0.58 -0.34 1.64
1: -0.03 -1.34 -0.59 2.00 1.10 0.29 -2.16 -3.12 1.47 0.42
2: -0.28 -2.36 0.44 -0.41 2.34 1.41 -3.01 -0.59 -0.22 -3.29
3: 5.40 -1.36 1.41 1.11 0.91 2.47 0.91 0.30 -4.13 0.74
4: -0.28 0.79 0.09 0.53 0.50 0.52 0.66 -0.99 -0.32 -1.26
5: 2.75 -0.36 -3.58 -0.56 0.37 -2.06 -0.38 -0.06 -1.14 1.07
6: -0.59 0.43 -1.89 -2.59 1.26 -0.16 -3.14 1.31 -0.10 -3.41
7: -1.56 -3.67 0.19 0.84 0.49 -1.16 -2.73 0.29 0.28 -2.43
8: 0.66 0.20 -0.43 -0.19 0.48 -1.45 1.46 -1.20 -0.17 3.88
9: -1.26 0.99 0.64 1.07 -0.14 -3.27 2.58 1.21 1.45 -1.06
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: -5.58 0.69 -1.94 1.51 -2.80 1.51 1.15 1.30 0.74 -0.94
1: -3.58 -0.58 -2.92 1.98 1.28 -2.50 -2.87 0.43 0.81 -1.05
2: 0.86 0.28 0.23 -0.53 3.03 -2.77 1.78 0.90 0.04 2.32
3: -0.20 1.07 0.97 -1.33 3.50 -1.69 -3.10 0.06 -1.40 3.30
4: 2.53 3.92 1.25 -1.64 1.33 1.34 0.25 4.14 -1.84 2.78
5: -3.58 2.94 -0.03 -2.68 1.03 -1.06 1.13 2.05 -0.59 1.73
6: 1.95 0.24 0.94 1.72 -0.16 2.10 -1.21 1.28 -0.24 2.29
7: 2.53 -1.65 4.37 3.29 -0.08 3.58 0.32 1.26 -0.95 -0.76
8: -0.60 -1.28 -5.09 -0.15 3.85 -6.23 -2.54 1.03 2.78 1.28
9: -4.16 -1.65 -0.27 0.44 1.37 1.43 0.10 0.86 1.79 2.64
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -2.37 0.81 0.51 -1.13 4.01 2.50 2.15 1.96 -3.52 1.28
1: 0.19 2.71 4.41 0.77 -1.37 -1.73 0.01 -1.82 2.79 -0.93
2: -1.19 0.41 -2.69 2.02 3.16 0.08 -0.17 -0.84 -3.63 -0.92
3: -0.06 1.32 0.53 -2.62 0.90 2.56 -1.12 -3.10 -0.53 -0.80
4: 2.58 3.83 -2.47 -1.73 -3.23 -1.25 -0.85 -3.63 -1.46 -1.78
5: 3.73 4.43 2.33 0.49 0.09 0.19 -1.61 1.11 -2.10 -3.48
6: -5.36 0.77 -1.60 1.93 0.08 -0.49 -1.21 4.49 0.75 0.57
7: 0.66 1.50 -0.80 -1.03 -2.61 -1.10 1.17 2.70 0.73 -1.50
8: -0.40 2.95 0.03 -0.63 -1.39 -0.21 1.90 -1.82 -2.23 0.01
9: -1.73 -1.40 -0.14 1.99 2.24 2.67 -4.07 1.20 0.16 1.51
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 1.18 -0.37 -2.62 2.01 -0.19 0.70 -0.72 -1.70 0.97 0.10
1: 2.17 1.90 -2.06 -0.92 0.72 0.51 -4.21 -1.89 -2.79 0.69
2: 1.73 -2.25 2.07 -0.51 0.68 2.68 4.10 2.23 3.39 1.46
3: -1.29 -3.01 2.93 3.05 0.09 -2.82 2.04 -0.38 -0.89 2.46
4: 0.95 -1.95 1.46 -1.23 -2.44 0.46 -0.37 -1.87 1.81 -0.87
5: 1.71 -0.19 -2.46 -1.96 -1.52 -0.37 -0.64 0.29 0.36 -0.30
6: -1.01 -0.36 0.13 2.78 1.20 -1.39 0.40 0.79 -4.14 2.42
7: -3.13 -2.20 1.53 -2.68 0.67 -1.88 1.84 0.71 -4.14 0.34
8: -0.81 0.11 -1.82 -1.09 0.39 -1.80 0.22 0.80 -3.52 -0.41
9: -2.75 -0.79 2.24 -1.71 -2.78 -0.36 -1.27 -4.25 -1.55 1.80
TEST F16_F32_S m=100 n=46 k=558 batch=2 split_k=1 matmul 2.422ms avg_err=6.95513
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 7.40 -0.93 -8.29 10.35 7.29 17.67 2.80 0.96 -7.37 11.22
28: -9.55 10.36 15.29 -6.91 -8.63 -9.62 2.93 10.00 3.53 5.59
29: -15.18 -5.24 -4.08 -1.58 -0.74 -0.21 0.71 3.54 -12.93 -0.47
30: 4.44 2.49 -1.89 -0.13 14.39 -4.98 10.24 0.72 6.66 -0.78
31: 4.35 -6.28 4.53 -1.90 -2.88 3.80 6.82 -5.14 -20.85 0.96
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 7.40 -0.93 -8.29 10.35 7.30 17.67 2.79 0.97 -7.37 11.22
28: -9.55 10.36 15.29 -6.91 -8.63 -9.62 2.93 10.00 3.53 5.59
29: -15.18 -5.24 -4.08 -1.58 -0.74 -0.21 0.71 3.54 -12.93 -0.47
30: 4.44 2.49 -1.89 -0.13 14.39 -4.98 10.24 0.72 6.66 -0.78
31: 4.35 -6.28 4.53 -1.90 -2.88 3.79 6.82 -5.14 -20.85 0.96
32: 3.35 9.95 -4.21 -2.07 -6.60 3.70 0.89 2.22 7.91 -2.48
33: 3.56 4.09 6.55 6.28 0.37 -5.10 -13.78 6.93 1.59 4.58
34: 1.55 -5.27 8.44 9.58 4.35 1.86 -12.18 -0.55 9.33 -4.49
35: 21.04 1.96 10.02 2.61 4.52 5.56 14.61 0.96 1.40 5.09
36: -0.71 5.51 6.01 -3.51 -2.54 7.39 9.10 -17.21 3.63 -4.45
TEST F16_F32_M m=100 n=46 k=558 batch=2 split_k=1 matmul 1.401ms avg_err=0.00230593
TEST F16_F32_L m=100 n=46 k=558 batch=2 split_k=1 matmul 1.498ms avg_err=4.37083
m = 64 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
59: -5.81 -7.47 5.99 4.59 5.66 -13.64 -3.31 12.98 2.48 -6.55
60: -18.83 -4.05 -9.31 -4.45 -4.67 -6.46 -3.17 4.88 -4.29 -2.31
61: -5.93 -1.47 -8.19 3.42 3.32 16.60 -2.02 4.34 2.93 -5.60
62: -5.47 -1.39 6.68 -8.46 4.61 -2.57 -11.21 9.68 7.96 -2.50
63: 15.48 3.68 2.38 -16.56 10.12 7.51 3.01 3.36 8.32 12.11
64: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
65: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
66: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
67: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
68: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
59: -5.81 -7.47 5.99 4.59 5.67 -13.65 -3.31 12.98 2.48 -6.55
60: -18.83 -4.05 -9.31 -4.46 -4.67 -6.46 -3.17 4.88 -4.29 -2.31
61: -5.93 -1.46 -8.19 3.42 3.32 16.60 -2.02 4.34 2.93 -5.60
62: -5.48 -1.39 6.68 -8.45 4.61 -2.57 -11.20 9.68 7.96 -2.50
63: 15.48 3.68 2.38 -16.56 10.12 7.51 3.01 3.36 8.32 12.11
64: -4.88 -14.91 -9.18 4.69 2.05 17.86 -3.95 5.21 -3.50 0.65
65: 4.52 1.63 -3.02 9.91 10.73 -7.84 -14.97 -9.08 -5.92 0.83
66: 7.75 -9.05 -3.68 -11.00 -4.10 -7.03 -7.14 -3.88 6.32 6.91
67: -0.01 -10.78 4.60 11.46 -1.85 -11.35 0.84 2.99 -11.01 3.70
68: 1.99 5.95 -4.22 -0.31 -0.52 7.74 8.74 -7.45 -8.60 -0.76
TEST F16_F32_S m=100 n=46 k=558 batch=2 split_k=4 matmul 0.88ms avg_err=7.75108
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 10.42 2.56 2.00 7.39 -12.68 -1.88 -6.62 -9.36 -0.79 1.21
28: 4.29 3.48 -7.23 -8.88 -16.92 6.27 7.31 -5.29 -0.68 -9.71
29: -0.27 -0.57 1.72 -7.94 -6.06 -1.49 -0.88 -3.37 -15.93 -4.07
30: 5.98 -6.78 -3.63 8.85 -1.86 4.18 5.43 5.34 12.94 -2.16
31: -10.27 -14.28 -3.51 -8.07 8.92 -5.14 9.15 -5.71 8.19 -7.30
32: 5.08 2.30 1.31 5.24 0.14 -4.17 2.81 4.07 2.39 1.15
33: -1.73 2.76 1.45 3.56 -3.21 -0.87 4.00 -1.15 0.28 -1.14
34: -5.04 8.30 -4.67 -0.06 -0.10 1.98 0.50 2.78 -4.07 -1.59
35: 4.18 -2.29 -0.87 -3.16 -2.27 -3.26 3.99 1.68 -2.61 -8.06
36: -2.49 4.11 3.89 5.02 4.35 -1.04 -2.35 -2.40 0.48 -2.37
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 10.42 2.56 2.00 7.39 -12.68 -1.88 -6.63 -9.36 -0.79 1.21
28: 4.29 3.48 -7.23 -8.88 -16.92 6.27 7.30 -5.29 -0.68 -9.71
29: -0.27 -0.56 1.72 -7.94 -6.06 -1.49 -0.88 -3.37 -15.93 -4.07
30: 5.98 -6.78 -3.63 8.85 -1.86 4.18 5.43 5.34 12.94 -2.16
31: -10.27 -14.28 -3.51 -8.07 8.92 -5.14 9.15 -5.70 8.19 -7.30
32: 3.41 -6.33 6.20 4.84 -3.23 -2.53 -5.47 -7.82 3.61 -0.98
33: 7.22 9.81 6.22 -3.70 -11.81 4.65 -2.27 5.11 -0.82 2.25
34: 5.89 1.47 -0.37 -4.96 0.18 -8.25 -0.87 -20.63 -12.09 8.28
35: 16.31 6.02 1.49 -12.03 4.83 -0.60 11.90 6.07 0.31 -8.14
36: -0.29 -10.09 -0.96 -4.75 0.75 6.00 16.17 4.99 8.28 11.25
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 7.35 -9.13 -4.56 3.47 0.91 2.91 2.61 -0.78 0.00 2.21
28: 1.36 -0.41 -0.28 -2.60 -2.64 5.96 -0.29 0.64 2.04 -5.15
29: -0.36 -0.23 -0.00 -1.40 -0.46 -4.23 4.26 -1.58 3.35 -4.27
30: -0.11 -6.20 -2.97 3.53 -0.89 -0.18 -2.28 -2.06 2.86 -0.88
31: -2.35 -5.97 -1.45 -4.05 -0.87 -0.13 1.27 1.10 3.13 -1.99
32: 3.94 -0.06 2.51 1.25 0.20 -2.92 0.99 1.11 2.89 -1.09
33: 3.69 1.87 -1.20 -0.11 -0.82 -1.51 1.28 -0.66 1.85 1.00
34: -4.04 1.44 -1.51 1.30 -1.07 0.56 1.17 1.31 -1.50 -0.07
35: 3.44 -0.94 0.18 -0.37 1.59 -2.14 -0.84 1.16 1.73 -4.06
36: -1.28 0.86 3.83 0.20 -0.18 1.03 -3.60 -1.15 0.53 -0.98
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -0.08 7.25 1.72 -2.91 -4.21 -4.31 -1.93 -3.30 -1.41 1.35
28: 2.01 4.11 -7.08 -5.13 -3.64 1.56 5.69 -6.00 -1.10 -2.69
29: 1.94 -4.86 -1.01 6.57 0.14 2.05 3.23 0.57 -5.34 -3.96
30: 5.43 -1.67 0.21 1.98 -3.01 -2.34 1.77 3.63 2.84 0.87
31: -3.28 -5.58 -7.96 -3.62 2.21 -1.76 4.27 -1.85 2.33 -0.90
32: -3.29 3.52 -1.26 1.57 -0.00 1.48 2.33 1.38 0.54 3.23
33: -2.62 -0.64 2.59 1.25 -0.24 1.56 1.23 0.06 0.01 -2.34
34: -0.22 3.09 0.07 2.63 1.17 1.58 -0.12 1.14 -2.23 -2.65
35: -2.15 0.21 2.04 -1.96 -0.22 1.10 0.55 3.06 1.23 -1.73
36: 2.55 2.53 -1.16 0.17 3.09 0.89 1.84 -3.55 -1.21 -1.72
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 4.53 5.17 -0.21 5.77 -3.92 -2.80 -1.23 -2.91 1.82 -3.34
28: 9.06 -3.14 -0.04 3.27 -11.64 -3.24 -2.80 6.95 2.09 4.58
29: -2.35 -4.87 0.17 -15.38 -5.42 2.46 -3.89 -3.82 -3.75 -1.58
30: -1.84 -3.84 1.82 9.68 -1.19 1.33 -0.01 6.61 5.75 -3.00
31: -5.93 -3.43 5.51 1.60 1.66 -5.63 0.28 -4.97 -0.96 -0.14
32: 0.16 -2.90 -0.94 1.46 -1.51 -1.55 -1.23 2.37 -2.45 -0.37
33: -0.95 1.40 -0.98 1.65 -1.20 -0.92 0.87 0.08 0.64 4.47
34: 0.95 2.89 -1.26 -2.20 0.11 0.68 -2.41 1.35 -2.22 -0.72
35: 0.49 -1.78 -1.75 -1.67 -2.51 -1.25 1.57 -2.65 -2.12 -0.73
36: -0.30 -1.03 -0.82 1.71 -1.60 -3.48 0.02 0.15 -1.32 -0.44
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -1.38 -0.73 5.05 1.07 -5.46 2.33 -6.07 -2.36 -1.20 0.99
28: -8.14 2.92 0.17 -4.41 1.01 1.99 4.70 -6.88 -3.70 -6.45
29: 0.51 9.40 2.56 2.27 -0.31 -1.77 -4.49 1.45 -10.19 5.75
30: 2.51 4.92 -2.69 -6.34 3.23 5.37 5.94 -2.83 1.49 0.85
31: 1.29 0.69 0.39 -2.00 5.93 2.37 3.33 0.01 3.69 -4.26
32: 4.27 1.75 1.00 0.97 1.45 -1.18 0.72 -0.78 1.42 -0.62
33: -1.85 0.13 1.04 0.77 -0.96 -0.01 0.63 -0.62 -2.22 -4.27
34: -1.73 0.89 -1.96 -1.80 -0.31 -0.85 1.86 -1.01 1.88 1.85
35: 2.40 0.22 -1.35 0.84 -1.13 -0.97 2.71 0.12 -3.45 -1.54
36: -3.46 1.73 2.05 2.93 3.04 0.51 -0.61 2.16 2.48 0.77
TEST F16_F32_M m=100 n=46 k=558 batch=2 split_k=4 matmul 0.772ms avg_err=0.00230715
TEST F16_F32_L m=100 n=46 k=558 batch=2 split_k=4 matmul 0.622ms avg_err=12.4039
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: 0.08 -10.27 -5.87 0.86 -7.25 0.59 -4.86 2.47 -9.31 -7.34
1: 3.86 5.50 -1.92 -7.35 -0.07 12.21 18.33 -6.08 -9.76 5.89
2: -0.48 -3.21 -9.73 7.01 0.45 0.19 -3.31 -3.82 -1.30 -1.38
3: 15.09 0.90 -1.48 -1.50 -8.18 0.05 9.18 2.13 2.55 3.53
4: -5.29 13.00 -3.72 -21.15 -3.69 5.18 7.23 4.41 -2.96 8.59
5: -3.77 12.32 10.25 -11.68 -9.07 -4.08 -8.82 14.22 2.65 13.73
6: 0.12 12.51 -4.02 -3.77 6.75 -2.78 12.30 -8.02 7.95 -11.16
7: -1.08 6.76 -8.66 0.64 -1.69 3.48 3.25 -4.24 15.85 7.06
8: -1.55 5.26 -2.74 -0.84 -5.12 -5.10 -14.89 -10.08 6.78 2.23
9: -4.06 13.22 7.12 4.14 15.14 2.63 -11.02 8.92 12.17 -1.94
Expected result:
0 1 2 3 4 5 6 7 8 9
0: -5.42 -12.76 -1.11 2.06 -1.93 5.56 -11.34 8.22 -7.95 -11.69
1: 14.35 12.45 0.59 -5.14 -9.61 -3.05 3.76 -7.56 -16.23 -4.24
2: -6.44 -3.46 -12.66 7.78 5.57 4.75 3.81 -0.76 -2.44 -0.05
3: 0.52 -7.54 -4.69 -7.85 -5.63 6.75 16.85 -5.65 1.27 -4.33
4: -3.14 9.52 3.83 -9.28 5.88 2.84 1.05 5.62 -6.40 7.98
5: 8.09 3.36 9.09 -8.79 1.64 1.69 -4.24 7.08 5.10 -2.37
6: -11.98 7.77 0.52 2.90 8.53 -16.45 2.66 7.50 2.46 -11.19
7: -8.77 -3.93 -14.58 1.84 0.30 13.62 -7.93 -2.73 -1.70 4.44
8: 1.27 -2.58 -4.14 13.67 3.20 -2.98 -5.56 3.40 -10.39 6.91
9: 6.53 10.01 4.83 8.51 9.68 -2.87 -10.89 5.15 1.70 5.37
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -0.49 0.38 -0.03 -1.55 -3.39 -3.17 -2.47 2.10 -3.42 -3.04
1: 1.16 6.24 -4.09 -2.48 0.45 0.33 9.38 -3.86 -2.78 -2.38
2: -3.30 3.72 -5.06 2.29 -5.64 -3.47 3.14 -0.22 -1.34 -2.39
3: 3.31 -3.51 0.57 -1.05 -1.36 4.20 0.12 0.93 3.44 2.12
4: -2.43 5.51 3.90 -7.99 0.64 0.64 1.94 1.72 -0.20 -1.35
5: 4.88 1.43 4.01 -2.12 4.22 1.03 -3.11 2.96 -4.25 2.30
6: -1.10 3.83 1.04 3.16 4.22 1.40 6.78 2.06 4.10 -2.45
7: 0.01 -0.22 -8.21 1.87 0.97 -1.00 -0.94 -0.76 -6.06 3.16
8: -3.15 -1.56 -4.95 1.74 -1.38 -4.52 -7.01 3.01 -4.37 6.23
9: 1.35 1.07 3.12 4.56 9.45 2.49 -5.46 8.46 1.03 -1.75
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 1.52 -7.76 -4.32 2.54 -0.46 6.95 0.03 0.52 -2.78 -3.40
1: 3.42 -0.56 2.10 -4.42 -2.63 2.93 0.76 4.49 -7.62 1.64
2: -1.22 -5.54 0.82 5.38 4.17 3.41 -7.95 4.89 0.85 3.20
3: 1.51 -2.66 -1.45 0.52 -5.15 0.15 1.48 0.06 -4.88 0.87
4: -0.56 5.75 -5.18 -9.50 -2.36 -2.32 0.82 5.17 -4.29 7.06
5: -2.95 -0.35 0.14 -7.55 -3.30 0.48 -2.65 10.20 6.58 -0.08
6: 1.70 2.37 2.18 1.24 4.08 -6.71 -2.72 1.34 0.86 -0.37
7: -1.13 -1.27 -4.91 -2.41 -1.28 4.97 -2.41 1.55 7.84 1.28
8: -2.01 1.27 4.37 -0.06 -2.96 1.21 2.22 -2.13 4.50 -2.08
9: -0.85 6.04 -1.78 -0.33 3.25 -5.92 -2.89 -1.11 -0.20 2.67
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: 0.85 -2.77 4.97 2.09 -4.95 -5.07 -2.58 -1.89 -6.37 -0.80
1: -1.44 2.94 6.94 -3.50 2.64 7.30 0.66 -4.39 3.59 3.00
2: 5.33 0.10 0.86 -4.16 4.90 0.24 0.35 -11.51 -0.35 5.63
3: 4.83 1.66 3.07 -2.31 -1.01 -1.60 5.83 0.51 4.95 6.34
4: 1.26 -6.27 0.56 -3.93 1.53 1.68 -1.05 6.73 -3.37 -2.09
5: -4.53 3.59 3.95 -3.56 -5.88 -5.44 3.11 -0.14 0.42 4.85
6: -0.53 -3.45 -2.54 -3.63 1.09 -2.16 0.23 -4.04 2.08 -1.19
7: -1.11 8.86 2.82 -0.79 -0.12 -3.33 5.72 -5.15 11.82 1.81
8: -2.07 4.84 0.04 2.54 1.97 -2.24 -0.69 -4.53 2.71 1.46
9: -4.53 1.02 0.68 -2.65 2.82 3.25 -1.36 -2.10 2.15 -1.64
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: -1.81 -0.12 -6.49 -2.21 1.55 1.87 0.15 1.74 3.26 -0.10
1: 0.72 -3.12 -6.87 3.05 -0.54 1.64 7.53 -2.32 -2.94 3.62
2: -1.30 -1.49 -6.35 3.50 -2.97 0.00 1.16 3.02 -0.46 -7.82
3: 5.44 5.40 -3.67 1.34 -0.66 -2.71 1.75 0.63 -0.96 -5.80
4: -3.56 8.01 -3.01 0.27 -3.50 5.18 5.52 -9.21 4.89 4.98
5: -1.16 7.65 2.15 1.55 -4.11 -0.15 -6.17 1.20 -0.09 6.66
6: 0.04 9.76 -4.70 -4.54 -2.63 4.69 8.01 -7.38 0.92 -7.15
7: 1.15 -0.61 1.65 1.97 -1.27 2.85 0.88 0.12 2.24 0.81
8: 5.68 0.71 -2.21 -5.06 -2.75 0.46 -9.41 -6.43 3.96 -3.37
9: -0.03 5.09 5.11 2.55 -0.38 2.81 -1.30 3.68 9.19 -1.22
TEST F16_F32_ALIGNED_S m=512 n=1 k=256 batch=2 split_k=1 matmul 0.664ms avg_err=4.05371
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 5.60
28: 8.24
29: 3.48
30: 2.52
31: 10.69
32: 0.00
33: 0.00
34: 0.00
35: 0.00
36: 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 5.60
28: 8.24
29: 3.48
30: 2.52
31: 10.70
32: -2.54
33: -4.50
34: -7.27
35: -7.69
36: -7.96
TEST F16_F32_ALIGNED_M m=512 n=1 k=256 batch=2 split_k=1 matmul 1.032ms avg_err=0.00455338
TEST F16_F32_ALIGNED_L m=512 n=1 k=256 batch=2 split_k=1 matmul 0.832ms avg_err=4.31715
m = 256 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
251: -4.98
252: 4.18
253: 7.17
254: -7.30
255: 2.17
256: 0.00
257: 0.00
258: 0.00
259: 0.00
260: 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
251: -4.98
252: 4.18
253: 7.16
254: -7.31
255: 2.17
256: -6.56
257: -8.87
258: 1.68
259: 1.66
260: -6.50
TEST F16_F32_ALIGNED_S m=512 n=1 k=256 batch=2 split_k=4 matmul 0.401ms avg_err=7.36144
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -8.39
28: 4.60
29: -2.80
30: 1.30
31: -0.82
32: 5.01
33: 4.29
34: -1.10
35: 6.29
36: -12.27
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -8.39
28: 4.60
29: -2.81
30: 1.31
31: -0.82
32: -3.07
33: 5.10
34: 5.95
35: 4.57
36: 6.72
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: -5.43
28: 2.92
29: -1.61
30: -0.73
31: -1.36
32: 4.88
33: 2.13
34: 2.15
35: -2.11
36: -0.06
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -1.46
28: 1.30
29: -0.23
30: -0.91
31: -1.23
32: 1.88
33: 0.15
34: -0.93
35: 5.70
36: -0.39
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: -1.93
28: -0.27
29: -2.07
30: -0.88
31: 1.94
32: -6.74
33: -4.82
34: -2.82
35: 4.69
36: -3.23
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: 0.43
28: 0.64
29: 1.11
30: 3.83
31: -0.16
32: 4.99
33: 6.84
34: 0.51
35: -1.99
36: -8.59
TEST F16_F32_ALIGNED_M m=512 n=1 k=256 batch=2 split_k=4 matmul 0.846ms avg_err=0.00460226
TEST F16_F32_ALIGNED_L m=512 n=1 k=256 batch=2 split_k=4 matmul 0.54ms avg_err=8.4641
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -2.93
1: 1.69
2: 3.05
3: -4.39
4: 1.52
5: -6.75
6: -4.41
7: -0.06
8: 7.48
9: -5.00
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 0.80
1: -10.40
2: 2.75
3: 4.27
4: -3.20
5: -3.23
6: 2.76
7: -6.91
8: 12.83
9: 4.48
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: 2.55
1: -1.80
2: 2.94
3: 0.36
4: 2.35
5: -1.97
6: 3.65
7: -1.81
8: 6.98
9: 2.09
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: -3.43
1: 0.23
2: -0.43
3: 1.61
4: 0.05
5: -3.03
6: -2.83
7: -0.27
8: 0.59
9: 0.71
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -3.28
1: -0.29
2: -0.91
3: -5.30
4: 1.72
5: -1.09
6: -2.03
7: 0.04
8: -1.74
9: -2.45
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 1.24
1: 3.54
2: 1.44
3: -1.05
4: -2.60
5: -0.66
6: -3.20
7: 1.99
8: 1.65
9: -5.36
TEST F16_F32_S m=128 n=110 k=622 batch=2 split_k=1 matmul 2.843ms avg_err=9.37018
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 6.37 5.56 15.29 -2.27 -12.70 10.38 -12.16 -1.75 7.77 20.42
28: 12.27 -3.05 6.53 -2.49 -1.81 8.50 -6.89 -0.94 5.53 -2.64
29: 0.92 -9.30 11.63 -3.66 3.67 -8.52 -16.04 -6.59 -6.65 -22.70
30: 1.77 -10.10 -4.82 -0.80 -9.49 -0.02 2.92 2.68 0.57 9.46
31: 2.05 3.63 -18.78 -14.14 -13.25 -0.76 7.68 -12.92 -6.34 -5.49
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 6.37 5.56 15.29 -2.27 -12.70 10.38 -12.16 -1.75 7.77 20.42
28: 12.27 -3.05 6.53 -2.49 -1.81 8.50 -6.89 -0.94 5.53 -2.64
29: 0.92 -9.30 11.63 -3.66 3.67 -8.51 -16.04 -6.59 -6.66 -22.70
30: 1.77 -10.10 -4.82 -0.80 -9.49 -0.02 2.92 2.68 0.57 9.47
31: 2.05 3.63 -18.78 -14.14 -13.25 -0.76 7.68 -12.92 -6.34 -5.49
32: 12.34 -0.76 0.13 -6.14 -1.73 12.58 -4.31 -6.64 0.50 -8.32
33: -0.29 5.68 7.32 -0.45 8.94 4.91 13.78 10.41 12.42 3.22
34: -9.61 -2.45 8.61 0.49 -3.19 10.91 10.11 2.96 11.29 -4.58
35: 0.61 -12.09 -3.50 2.18 -12.67 -3.36 -9.18 -3.66 -5.00 15.30
36: 12.01 15.94 7.31 0.38 -6.81 -1.55 -3.76 1.34 1.63 -9.23
TEST F16_F32_M m=128 n=110 k=622 batch=2 split_k=1 matmul 2.072ms avg_err=0.00245577
TEST F16_F32_L m=128 n=110 k=622 batch=2 split_k=1 matmul 1.603ms avg_err=9.40585
m = 64 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
59: 6.20 12.47 12.35 5.53 -8.56 6.60 2.22 -2.53 9.99 2.85
60: 4.03 -2.92 4.29 -3.98 -2.80 2.00 9.46 7.42 -2.43 -1.63
61: -12.63 -12.45 -1.42 4.81 1.49 -9.37 2.35 -15.00 -21.50 -7.77
62: 3.97 -4.34 8.39 0.55 -7.12 1.29 -2.76 -10.26 2.03 -8.55
63: -12.62 -7.98 9.08 3.05 -12.88 -4.25 9.30 -9.81 2.66 -9.37
64: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
65: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
66: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
67: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
68: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
59: 6.20 12.47 12.35 5.53 -8.56 6.60 2.22 -2.53 9.99 2.85
60: 4.03 -2.92 4.29 -3.98 -2.80 2.00 9.46 7.42 -2.43 -1.63
61: -12.63 -12.44 -1.42 4.81 1.49 -9.37 2.36 -15.00 -21.50 -7.77
62: 3.97 -4.35 8.39 0.55 -7.12 1.29 -2.76 -10.26 2.03 -8.55
63: -12.62 -7.98 9.08 3.05 -12.88 -4.25 9.30 -9.81 2.66 -9.37
64: -3.41 4.86 3.84 12.76 3.95 4.20 -14.95 -1.79 -4.34 0.03
65: -3.75 12.56 -3.16 6.26 1.21 -1.66 -10.86 8.88 -3.05 3.10
66: -3.60 -12.44 -5.72 -8.29 -7.19 2.64 3.27 -2.40 8.94 4.10
67: 5.99 -10.65 -12.93 -6.27 -9.95 -0.65 2.15 6.46 8.80 -8.18
68: 12.85 -0.28 -13.66 -5.66 -1.51 6.17 -0.17 -10.48 0.43 2.80
TEST F16_F32_S m=128 n=110 k=622 batch=2 split_k=4 matmul 1.508ms avg_err=10.9818
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 17.17 7.96 -0.42 3.74 4.48 -4.62 0.99 3.11 -1.07 4.82
28: -4.50 1.72 7.87 -8.35 1.67 -3.65 -6.64 7.56 -8.51 1.61
29: 12.29 -18.74 10.23 -15.72 -1.35 -1.18 16.46 -7.17 -7.71 8.86
30: 12.10 14.12 -13.22 13.34 -1.06 -10.03 3.79 -6.86 5.47 -22.58
31: 12.84 3.89 -4.24 0.80 -4.94 4.99 3.45 -6.00 -1.28 -16.47
32: -8.84 0.48 8.37 -1.37 -3.44 3.61 -0.52 2.62 4.27 9.41
33: 6.27 1.54 0.88 0.12 -5.51 0.18 -1.29 3.29 -1.24 -1.55
34: 3.93 -6.60 -1.65 4.69 8.99 1.10 -6.77 5.79 -2.26 5.27
35: 7.98 4.30 8.22 2.07 -6.22 -8.63 1.76 0.03 1.51 2.54
36: -1.17 0.59 1.68 -5.00 -3.89 7.32 -1.76 1.10 0.69 -7.15
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 17.17 7.96 -0.42 3.74 4.48 -4.61 0.99 3.11 -1.07 4.82
28: -4.50 1.72 7.88 -8.35 1.67 -3.65 -6.64 7.56 -8.51 1.60
29: 12.29 -18.74 10.24 -15.72 -1.35 -1.18 16.45 -7.17 -7.71 8.86
30: 12.10 14.12 -13.21 13.34 -1.06 -10.03 3.79 -6.86 5.47 -22.58
31: 12.84 3.89 -4.24 0.81 -4.94 4.99 3.46 -6.00 -1.27 -16.47
32: 9.23 -1.09 4.44 -4.68 -5.56 -2.96 5.00 -7.10 -3.68 0.82
33: -4.21 9.52 -2.17 3.39 -9.81 -5.20 -0.15 -11.36 -1.45 3.42
34: -10.62 3.40 10.39 5.27 -14.76 5.66 -7.55 -7.45 3.40 -2.53
35: -3.49 1.41 -2.74 9.66 0.80 -9.65 4.25 -16.93 -6.28 7.10
36: -15.34 -11.14 10.23 8.71 -2.94 13.27 12.12 3.35 5.28 6.13
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 2.51 -1.80 2.09 6.75 -3.29 -4.41 -1.19 -4.28 3.47 3.14
28: 4.05 4.14 -0.51 -0.01 -3.00 4.10 -3.83 -2.09 4.38 -8.12
29: 5.79 -2.93 0.50 -5.79 0.74 -6.04 5.41 -2.66 0.47 7.47
30: 12.30 2.66 -1.11 2.10 -4.94 0.59 2.17 -4.08 3.57 -10.55
31: 11.89 8.39 -3.98 0.56 -2.70 0.06 -3.53 -7.86 -2.04 -7.07
32: -2.78 -0.47 -0.86 -0.00 -1.90 -2.56 -1.75 -1.42 0.63 3.08
33: 0.27 0.55 1.82 5.34 -2.80 0.14 0.24 -2.37 3.12 1.43
34: 1.10 -3.15 -2.29 1.21 0.60 1.48 -3.61 -0.23 -2.36 2.51
35: -1.00 -0.02 1.01 1.14 -2.57 -1.73 -3.12 -2.42 2.50 2.27
36: 2.43 2.08 1.51 -0.26 3.33 7.65 -1.42 0.56 2.09 -4.00
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: 5.56 0.64 -3.35 -2.41 0.62 1.83 4.41 2.64 -3.97 -1.81
28: -3.21 1.32 4.82 -4.18 1.28 -7.41 -4.72 -2.87 -8.94 2.36
29: -5.37 -10.09 2.54 -8.79 -8.76 8.25 8.01 -6.29 1.33 5.51
30: 1.83 3.94 -6.79 7.48 -2.03 -1.17 1.61 -2.96 2.95 -6.66
31: 0.33 6.73 -1.93 -0.05 2.13 2.62 7.24 -2.63 0.75 -7.28
32: 0.77 -0.44 11.16 0.34 -1.77 2.53 2.38 -0.98 4.00 6.67
33: 5.79 -3.46 1.99 -3.02 -0.87 1.69 0.38 1.26 -0.85 -1.68
34: 2.00 -2.19 0.58 1.35 4.15 0.12 -0.90 1.81 0.43 -0.60
35: 6.63 2.62 4.41 -1.99 -2.89 -6.89 6.22 2.02 -1.10 -1.94
36: -0.45 -1.10 0.69 -4.83 -8.39 -3.62 0.15 -0.86 0.34 -3.61
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 0.79 7.01 0.62 -0.04 8.33 -4.22 -7.41 2.97 2.99 -3.70
28: -6.07 -0.42 -1.88 -4.90 0.43 5.73 4.02 4.54 -5.74 -0.13
29: 5.80 2.91 4.67 -1.87 6.12 -0.97 3.05 0.85 -5.53 -7.08
30: 1.08 3.25 -0.33 4.38 -0.80 -7.94 0.21 5.27 -3.59 -8.17
31: 4.57 -2.13 -3.38 -2.74 -0.98 -0.37 4.69 -0.45 -2.99 -0.12
32: -3.77 0.00 0.00 -1.59 0.00 0.00 0.00 5.47 0.00 0.00
33: 1.69 0.00 0.00 -2.85 0.00 0.00 0.00 3.98 0.00 0.00
34: 0.60 0.00 0.00 1.90 0.00 0.00 0.00 -0.63 0.00 0.00
35: 0.24 0.00 0.00 1.21 0.00 0.00 0.00 0.06 0.00 0.00
36: -2.72 0.00 0.00 -1.18 0.00 2.11 0.00 -0.65 0.00 0.00
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: 8.31 2.11 0.22 -0.56 -1.18 2.19 5.17 1.78 -3.56 7.19
28: 0.73 -3.32 5.44 0.74 2.96 -6.07 -2.13 7.99 1.79 7.50
29: 6.06 -8.63 2.53 0.73 0.55 -2.42 -0.01 0.93 -3.98 2.96
30: -3.11 4.27 -4.98 -0.62 6.71 -1.52 -0.20 -5.09 2.55 2.80
31: -3.94 -9.10 5.05 3.03 -3.39 2.69 -4.95 4.94 3.01 -1.99
32: -3.06 1.39 -1.92 -0.11 0.23 3.65 -1.15 -0.46 -0.36 -0.33
33: -1.48 4.45 -2.93 0.66 -1.84 -1.65 -1.91 0.42 -3.50 -1.30
34: 0.23 -1.27 0.06 0.23 4.24 -0.51 -2.26 4.84 -0.32 3.36
35: 2.11 1.71 2.80 1.71 -0.76 -0.01 -1.34 0.37 0.11 2.21
36: -0.43 -0.39 -0.52 1.27 1.17 1.17 -0.49 2.05 -1.74 0.46
TEST F16_F32_M m=128 n=110 k=622 batch=2 split_k=4 matmul 1.108ms avg_err=0.00245408
TEST F16_F32_L m=128 n=110 k=622 batch=2 split_k=4 matmul 0.727ms avg_err=15.5968
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: 8.99 -10.38 -4.61 3.08 -4.01 -15.26 -9.21 -6.05 -3.93 -9.36
1: 15.65 -13.44 -2.30 -7.24 -4.78 -2.38 1.87 8.05 6.51 -5.15
2: 10.81 5.91 -8.05 -14.31 3.20 0.25 -12.01 -12.81 3.86 -15.31
3: -4.85 15.62 -4.90 8.69 -6.67 6.86 -9.88 2.72 2.80 -9.54
4: 3.60 -12.09 -0.78 7.70 -5.40 2.28 -4.33 1.73 -5.02 -9.38
5: 16.91 -4.73 -3.43 -8.84 -7.56 -6.92 7.50 -0.62 -17.74 -0.22
6: 6.24 -8.48 -0.20 6.88 3.64 -7.90 -4.34 4.20 2.00 14.21
7: -3.50 -2.74 -4.35 5.91 0.46 -5.35 1.00 7.88 20.35 -10.83
8: -4.22 -7.87 21.40 -1.93 5.13 -11.51 -3.15 12.93 -11.85 -11.51
9: 0.75 7.56 9.22 1.44 4.95 0.55 7.73 -1.14 7.93 -5.69
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 6.42 -7.12 0.16 -1.04 -16.67 -9.29 3.17 9.25 0.28 -11.40
1: 11.32 -18.18 17.85 -10.82 -5.51 -11.37 -7.23 3.74 -0.38 1.16
2: 0.71 -4.72 0.52 -12.68 3.13 1.71 -0.97 -9.34 8.64 -12.50
3: -3.68 3.95 1.92 -0.40 -8.59 14.77 -3.76 -5.11 3.69 -7.42
4: -8.81 -6.51 -4.48 -3.65 3.11 -2.00 -4.62 6.35 -4.02 0.12
5: -0.26 -0.57 3.51 1.85 -18.06 -1.09 3.57 6.48 -17.90 0.86
6: -8.66 -14.67 -3.35 2.74 9.00 -14.00 -1.71 -2.97 2.28 2.76
7: -14.48 -0.07 -1.20 -0.28 17.59 -5.48 14.20 0.35 17.40 -8.41
8: -1.35 -7.28 13.72 4.10 -8.97 -11.43 10.23 -7.39 -17.09 -7.08
9: 4.39 -4.29 4.70 -1.50 4.12 2.81 4.54 6.70 9.62 -13.61
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: 0.34 0.73 -1.29 3.50 1.58 -4.02 -4.23 3.09 0.56 -1.41
1: 3.82 -2.32 1.39 -2.43 -4.53 -1.73 -2.64 -0.96 -3.18 4.00
2: -0.90 -2.27 0.49 -4.25 -0.03 -0.72 -1.52 -3.01 1.05 -6.08
3: 0.02 0.33 -0.68 4.67 -2.58 3.12 -1.80 -3.05 3.93 -7.06
4: -3.35 2.65 -1.60 -4.79 -4.33 -1.36 -4.14 6.73 0.20 0.50
5: -1.36 -2.26 -0.15 3.38 -3.97 4.52 1.66 3.61 -1.46 0.43
6: -2.92 -7.36 -0.38 4.21 3.34 -6.86 0.41 -1.05 0.29 6.78
7: -3.05 -3.04 -3.47 3.54 4.05 2.81 1.09 -5.03 7.61 -2.20
8: -3.72 -2.60 6.86 0.27 2.37 -0.34 5.62 3.80 -1.72 -2.91
9: -1.05 2.35 7.01 -0.28 5.56 5.89 4.97 4.03 5.89 -3.03
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 3.56 -5.93 0.23 -1.51 -5.54 -0.32 1.85 -3.95 3.55 -3.41
1: 1.54 -12.14 4.88 -2.95 -5.98 -1.18 -0.77 -0.02 3.09 0.53
2: 1.49 -1.41 2.14 -0.72 0.99 1.52 -1.47 3.43 0.51 -6.80
3: -1.39 4.37 1.21 1.25 1.84 7.60 -9.55 3.86 -0.50 -2.82
4: 5.93 -6.27 -0.15 1.12 0.34 7.83 -2.94 -3.24 -2.24 -3.69
5: 3.28 -3.33 -0.64 -0.40 -1.32 -0.03 -4.08 -1.32 -6.53 -0.99
6: 1.08 -5.13 0.26 1.68 4.75 -5.09 -0.67 1.22 2.80 3.80
7: -5.24 -1.44 2.81 2.20 2.97 -0.47 2.21 3.25 7.18 -5.69
8: -0.32 -0.86 0.86 -6.42 -1.86 -2.39 3.66 -2.71 -0.87 -1.80
9: 1.21 -0.08 -2.07 0.91 3.64 -0.22 2.52 1.97 4.96 -4.95
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: 4.40 0.15 -0.75 4.08 -0.54 -2.56 -2.41 -2.81 -1.10 -4.51
1: 9.82 0.56 -5.79 -3.68 4.19 0.11 3.96 1.24 0.70 -1.92
2: 4.71 3.58 -0.92 -3.92 1.79 -1.05 -3.13 -7.90 2.99 -1.96
3: 1.45 4.44 -1.79 -2.74 -6.01 -4.90 0.14 7.52 -4.40 4.60
4: -4.84 -11.34 -2.26 5.27 5.81 -1.25 1.77 -6.71 -3.59 -7.82
5: 5.72 5.71 -5.82 -6.74 -0.21 -2.67 5.81 1.71 -2.43 -1.52
6: 4.05 6.47 -0.81 3.22 -0.59 6.52 -1.27 0.83 -7.73 3.14
7: -0.75 2.21 0.70 -4.48 -4.71 -2.98 0.51 5.03 0.96 -1.77
8: 1.49 -2.38 3.68 1.53 -1.23 -3.72 -7.91 2.81 -2.68 -2.28
9: -4.72 2.90 6.84 -0.42 -4.05 -3.17 -1.26 -4.65 0.14 0.66
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 0.69 -5.33 -2.81 -2.99 0.49 -8.37 -4.42 -2.38 -6.94 -0.03
1: 0.47 0.45 -2.78 1.83 1.54 0.42 1.31 7.79 5.90 -7.77
2: 5.51 6.01 -9.75 -5.43 0.45 0.49 -5.89 -5.32 -0.69 -0.47
3: -4.93 6.48 -3.64 5.50 0.08 1.04 1.32 -5.61 3.77 -4.26
4: 5.87 2.87 3.22 6.10 -7.22 -2.94 0.97 4.96 0.61 1.62
5: 9.27 -4.85 3.18 -5.08 -2.06 -8.74 4.10 -4.63 -7.32 1.86
6: 4.03 -2.46 0.73 -2.24 -3.87 -2.47 -2.80 3.20 6.64 0.48
7: 5.54 -0.47 -4.39 4.65 -1.85 -4.71 -2.81 4.64 4.61 -1.17
8: -1.66 -2.04 10.00 2.70 5.85 -5.07 -4.53 9.04 -6.58 -4.51
9: 5.31 2.39 -2.56 1.23 -0.19 -1.95 1.50 -2.49 -3.06 1.63
TEST F16_F32_S m=511 n=511 k=127 batch=2 split_k=1 matmul 2.922ms avg_err=4.49193
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -4.43 5.72 6.58 2.52 -2.06 1.36 4.32 -5.40 3.90 -4.01
28: -2.73 -0.92 2.24 6.94 1.87 -0.94 -0.42 3.82 -0.14 6.52
29: -10.07 1.87 6.94 -0.57 -4.97 -3.63 0.99 2.33 2.36 5.35
30: 0.68 -4.44 -0.48 -2.34 0.09 2.81 -3.19 -0.64 -1.33 -2.34
31: -6.76 5.11 -1.94 -0.08 4.59 0.15 -1.09 1.63 -7.06 2.09
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -4.43 5.73 6.58 2.52 -2.06 1.36 4.32 -5.40 3.90 -4.01
28: -2.73 -0.92 2.24 6.94 1.87 -0.94 -0.42 3.82 -0.14 6.52
29: -10.07 1.87 6.94 -0.57 -4.97 -3.63 0.99 2.33 2.36 5.35
30: 0.68 -4.44 -0.48 -2.34 0.09 2.81 -3.19 -0.64 -1.33 -2.34
31: -6.76 5.11 -1.94 -0.08 4.59 0.15 -1.09 1.63 -7.06 2.09
32: 5.89 -2.46 -2.50 0.34 2.87 -3.08 0.11 -4.64 -0.99 -5.77
33: -0.47 -2.61 -1.06 3.16 -5.72 3.63 0.80 3.51 6.06 0.73
34: -0.29 -3.47 -0.02 0.82 -1.03 -4.21 -3.46 4.58 -4.31 2.45
35: 2.68 -2.14 -2.40 -1.96 3.32 -3.02 0.89 -1.11 -4.39 0.47
36: -1.28 -4.00 2.98 0.20 3.42 0.47 1.65 1.58 2.69 -4.47
TEST F16_F32_M m=511 n=511 k=127 batch=2 split_k=1 matmul 2.813ms avg_err=0.00110745
TEST F16_F32_L m=511 n=511 k=127 batch=2 split_k=1 matmul 1.032ms avg_err=4.49074
m = 256 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
251: 2.43 -1.34 -0.26 1.27 6.22 3.02 -3.74 -1.47 1.88 0.74
252: -1.99 -3.21 0.23 1.08 -2.77 -1.94 -1.50 -7.02 7.14 -2.88
253: 1.10 -3.69 2.70 2.51 -3.36 -1.41 -6.61 -0.81 -6.70 6.44
254: 3.51 -1.33 0.49 6.56 -1.12 1.08 1.11 -1.29 -4.58 -5.34
255: -0.77 -2.85 1.56 5.73 -3.98 0.81 6.62 -7.00 -6.84 1.20
256: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
257: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
258: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
259: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
260: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
251: 2.43 -1.33 -0.26 1.27 6.22 3.02 -3.74 -1.46 1.88 0.74
252: -1.99 -3.21 0.24 1.08 -2.77 -1.94 -1.50 -7.02 7.14 -2.88
253: 1.10 -3.69 2.70 2.51 -3.36 -1.41 -6.61 -0.81 -6.71 6.44
254: 3.51 -1.33 0.48 6.56 -1.12 1.08 1.11 -1.28 -4.58 -5.34
255: -0.77 -2.84 1.56 5.74 -3.98 0.81 6.62 -7.00 -6.84 1.20
256: 2.52 7.32 6.15 -2.92 -0.07 5.32 1.72 2.07 -6.64 -6.66
257: 5.73 -3.39 3.76 -5.46 0.96 3.04 0.07 -1.87 3.70 -4.44
258: 4.40 -2.71 0.72 -1.54 2.42 -0.46 -8.71 2.45 -4.84 -3.66
259: 1.85 1.92 -0.15 -2.03 0.14 -4.90 -7.66 3.21 -2.71 0.12
260: 6.19 1.40 0.14 3.27 -4.12 -4.18 5.63 -1.94 0.22 2.59
TEST F16_F32_S m=511 n=511 k=127 batch=2 split_k=4 matmul 3.714ms avg_err=4.50468
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -2.48 -2.71 -0.14 -0.35 -2.37 5.33 1.56 -1.23 -7.76 -2.37
28: -5.57 -3.74 2.84 1.63 1.47 3.50 -1.70 -4.73 -3.77 -5.53
29: 1.53 0.13 9.07 -1.64 3.07 -7.53 -1.81 -1.16 2.02 -4.40
30: 5.96 -5.66 0.93 1.09 0.79 5.21 3.76 -3.17 -1.43 -4.91
31: -1.64 -8.93 7.63 -4.59 -10.65 5.67 4.25 2.79 1.11 -0.38
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -2.48 -2.71 -0.14 -0.35 -2.37 5.33 1.56 -1.22 -7.76 -2.37
28: -5.57 -3.74 2.84 1.63 1.47 3.50 -1.70 -4.73 -3.77 -5.53
29: 1.53 0.14 9.07 -1.64 3.07 -7.53 -1.81 -1.16 2.02 -4.40
30: 5.96 -5.66 0.93 1.09 0.79 5.21 3.76 -3.17 -1.43 -4.91
31: -1.63 -8.93 7.63 -4.59 -10.65 5.67 4.25 2.79 1.11 -0.38
32: 2.30 5.01 -4.33 -6.22 -5.34 2.89 -0.13 -1.68 1.98 5.75
33: 1.01 6.05 3.50 0.22 -5.91 6.11 0.74 -4.25 -6.52 -5.05
34: 4.74 -1.24 1.27 -2.71 -3.38 0.73 -1.01 3.77 0.05 2.17
35: -0.17 0.09 -5.91 1.43 -5.63 0.47 -0.15 -2.30 -0.13 3.42
36: 1.94 0.64 -3.65 -2.63 -4.84 -3.37 1.99 2.62 8.05 8.72
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: -1.71 -1.48 0.79 -1.20 1.61 2.30 -0.68 -1.53 -1.41 -2.66
28: 0.04 -2.87 -0.64 -1.23 1.51 3.21 -0.60 -4.53 -3.15 -1.84
29: -0.56 -1.68 2.10 0.36 3.13 -0.23 -2.27 -0.66 -1.96 -0.42
30: -2.12 -4.23 -1.17 1.66 0.41 -1.29 1.94 0.78 -1.29 -4.01
31: 0.75 -0.47 -0.16 1.41 -2.87 1.72 0.61 1.90 4.07 0.89
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -1.64 1.11 0.93 0.37 -0.64 0.47 -1.10 -0.04 -2.41 0.83
28: 0.94 -0.04 0.92 -0.61 -1.87 -0.07 1.26 1.76 -0.15 0.13
29: 2.46 2.23 2.33 2.23 2.88 -1.67 0.76 -2.20 2.16 -3.05
30: 2.91 -1.81 3.68 2.55 -1.96 4.27 1.80 -1.59 -1.83 0.49
31: -2.38 -1.89 2.73 0.73 -1.96 1.80 -1.18 1.55 -4.08 -0.29
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 3.06 0.86 -1.68 0.96 1.46 0.55 2.70 -0.60 -4.00 -2.57
28: -2.80 0.78 -1.05 1.48 0.88 -0.89 -2.84 -1.52 -1.17 -4.10
29: -0.65 0.46 3.62 -0.08 -0.77 -2.38 1.19 0.46 0.24 -3.33
30: 2.00 1.83 -2.13 -2.47 -1.90 1.64 -2.02 1.22 -0.03 1.85
31: -0.77 -2.99 3.22 -1.80 -1.07 2.50 2.98 3.20 0.91 -1.33
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -2.19 -3.20 -0.17 -0.48 -4.80 2.00 0.64 0.95 0.06 2.03
28: -3.76 -1.62 3.61 1.99 0.95 1.25 0.49 -0.44 0.71 0.29
29: 0.28 -0.87 1.02 -4.14 -2.17 -3.26 -1.48 1.23 1.58 2.41
30: 3.17 -1.45 0.56 -0.66 4.24 0.60 2.03 -3.58 1.72 -3.24
31: 0.77 -3.58 1.84 -4.93 -4.75 -0.35 1.84 -3.85 0.21 0.35
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TEST F16_F32_M m=511 n=511 k=127 batch=2 split_k=4 matmul 3.012ms avg_err=0.00110588
TEST F16_F32_L m=511 n=511 k=127 batch=2 split_k=4 matmul 1.218ms avg_err=7.21041
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -0.46 -0.67 9.03 -0.22 0.15 1.01 1.68 -0.67 -3.44 10.35
1: 2.93 -2.99 1.14 4.09 2.85 1.43 4.78 5.50 1.95 2.64
2: -5.01 3.05 -0.06 2.26 0.36 -0.59 -4.28 -2.59 3.62 -1.62
3: -7.56 -3.15 2.70 -6.06 6.73 5.32 -5.60 4.33 1.87 -3.79
4: -4.14 2.24 -3.76 -3.80 3.63 5.67 -4.29 10.04 -1.68 0.27
5: -1.91 0.51 -3.05 -3.41 0.60 -9.50 -0.00 2.08 0.10 0.28
6: -4.25 4.49 0.18 5.47 -2.40 -3.62 -0.50 -1.72 4.70 -3.01
7: 1.08 1.48 1.86 1.49 -0.05 -2.78 0.82 -1.17 -5.45 -0.01
8: 4.84 -0.60 -0.39 2.87 1.33 -1.19 0.63 -0.62 -2.25 2.85
9: -5.68 3.28 -1.94 -5.36 4.66 1.47 -2.32 -2.20 4.02 -0.43
Expected result:
0 1 2 3 4 5 6 7 8 9
0: -4.41 -6.46 3.64 2.42 -2.00 -2.69 2.27 3.27 1.73 5.96
1: 0.84 5.87 1.34 -0.21 6.07 -2.07 1.57 -3.81 -0.35 1.78
2: -1.25 3.95 -3.26 1.47 -1.90 -8.31 -3.62 0.39 4.54 3.82
3: -3.90 -4.03 5.85 -2.97 8.75 1.15 -7.71 4.50 5.44 -1.95
4: -1.65 -4.95 -0.49 -4.34 3.07 3.69 -3.03 3.12 -0.13 -8.12
5: -0.02 1.38 -0.98 -3.16 0.38 -1.03 0.72 -0.53 -1.36 -1.62
6: 3.14 4.88 2.97 4.38 -3.43 -4.02 0.33 0.37 1.05 -0.95
7: 2.78 -0.37 -4.00 -0.90 -3.37 -2.45 -0.19 2.97 -0.44 -0.02
8: -0.77 1.70 -0.93 1.59 2.51 2.28 -5.22 -2.20 3.30 2.23
9: 3.40 -0.01 1.01 -4.73 4.53 6.29 0.02 1.27 2.89 2.88
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -0.59 -1.17 4.64 -0.70 -2.63 0.91 2.08 -1.54 -0.60 4.36
1: 0.60 0.16 -2.79 -1.23 3.13 0.31 1.77 -0.30 -0.48 -1.44
2: -2.68 1.62 1.51 0.49 2.53 1.39 -1.84 -1.91 2.52 1.31
3: -0.84 -0.65 1.90 1.14 2.16 0.26 -3.04 2.02 0.85 -1.33
4: -0.91 -3.02 4.38 -0.32 0.25 0.73 0.91 1.46 -2.34 -0.21
5: -0.80 -0.80 0.43 -0.61 -0.73 -1.08 1.54 -0.13 -0.63 -2.23
6: 0.21 -0.16 -1.38 1.69 -1.45 -0.94 -2.06 1.29 2.16 0.15
7: 1.05 0.39 -0.60 2.86 2.33 -1.70 0.26 3.66 -2.97 -1.04
8: -0.09 -0.13 -0.41 1.68 -0.48 0.76 -2.21 -1.61 1.70 1.19
9: 0.35 2.00 1.21 -1.11 2.61 -0.37 -1.37 -1.03 1.34 -2.19
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: -1.08 -0.23 1.64 0.67 2.04 -1.98 2.30 0.34 -1.99 3.40
1: 1.30 0.88 1.59 0.61 -0.71 0.73 -0.58 1.09 -1.24 1.18
2: 0.71 1.32 -1.81 1.31 -1.39 -1.79 0.20 -1.02 2.08 -0.55
3: -3.62 -1.79 2.22 -4.21 2.73 3.19 -4.12 1.41 0.38 -0.36
4: -1.44 2.51 -3.08 -2.03 2.02 1.32 -4.34 2.95 0.83 -2.89
5: -1.66 2.36 -1.52 0.15 2.53 -0.16 -0.61 2.01 0.98 -1.37
6: -0.46 3.48 1.10 3.14 -0.79 -1.21 -0.99 -0.71 0.12 -1.04
7: 0.50 1.72 -2.04 0.30 -1.21 -1.65 -1.19 0.55 0.63 -2.18
8: 0.88 0.05 -2.34 0.22 -0.04 -0.23 -0.24 -0.14 -1.14 0.80
9: 0.55 0.70 -1.36 -0.44 -0.28 2.69 0.33 -1.09 2.23 1.31
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -0.02 1.40 2.73 2.53 0.34 2.90 -1.34 1.46 -0.09 0.55
1: 0.07 -0.96 0.56 1.77 2.00 1.25 -0.95 2.60 1.70 1.44
2: -1.88 1.73 0.75 -0.20 0.96 0.05 -2.43 -1.50 -0.10 -1.40
3: 1.80 -0.10 0.45 -0.25 0.27 0.87 0.42 0.74 -0.61 -2.18
4: -2.97 1.38 -3.97 -0.42 -2.61 0.81 0.50 3.73 -0.79 1.13
5: -0.72 2.02 -2.62 -0.78 2.35 -4.46 -0.45 0.18 1.39 5.19
6: -3.70 1.43 0.84 -2.02 0.88 2.39 -0.21 -1.16 1.23 0.80
7: -2.52 1.26 3.67 -0.94 -1.49 -1.15 -0.60 -1.96 -3.73 -1.88
8: -0.05 0.36 1.80 -0.54 1.87 -0.02 2.96 0.34 -2.25 0.11
9: -1.76 -1.55 2.28 -0.45 2.08 -1.17 -0.42 -0.09 1.22 0.91
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 1.23 -0.67 0.02 -2.72 0.40 -0.82 -1.36 -0.94 -0.76 2.02
1: 0.97 -3.07 1.78 2.94 -1.57 -0.86 4.55 2.11 1.97 1.47
2: -1.16 -1.63 -0.51 0.66 -1.74 -0.24 -0.22 1.84 -0.87 -0.98
3: -4.90 -0.62 -1.87 -2.73 1.57 0.99 1.14 0.16 1.25 0.08
4: 1.18 1.38 -1.08 -1.02 3.97 2.81 -1.35 1.91 0.62 2.24
5: 1.27 -3.07 0.67 -2.17 -3.56 -3.80 -0.48 0.01 -1.64 -1.31
6: -0.30 -0.26 -0.37 2.67 -1.03 -3.86 2.76 -1.13 1.19 -2.91
7: 2.05 -1.88 0.84 -0.73 0.32 1.72 2.35 -3.42 0.63 5.09
8: 4.10 -0.87 0.56 1.50 -0.01 -1.70 0.12 0.79 -0.56 0.75
9: -4.81 2.13 -4.07 -3.37 0.26 0.32 -0.87 0.01 -0.76 -0.45
TEST F16_F32_S m=511 n=511 k=7 batch=2 split_k=1 matmul 1.004ms avg_err=1.06004
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -0.21 -1.10 -0.07 1.56 2.01 -0.35 -1.78 2.65 -0.57 -1.63
28: -0.14 0.97 1.14 1.58 -0.09 0.86 -0.98 0.23 -0.02 -0.74
29: -0.72 -0.65 0.77 0.95 1.49 -0.23 -1.84 1.21 -1.45 0.53
30: 0.18 1.73 1.01 -0.32 -1.96 1.02 0.93 -2.09 0.85 0.78
31: 0.88 1.93 -0.39 -1.04 -1.98 0.20 1.37 -2.08 1.73 0.53
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -0.21 -1.10 -0.07 1.56 2.01 -0.35 -1.78 2.65 -0.57 -1.63
28: -0.14 0.97 1.14 1.58 -0.09 0.86 -0.98 0.23 -0.02 -0.74
29: -0.72 -0.65 0.77 0.95 1.49 -0.23 -1.84 1.21 -1.45 0.53
30: 0.18 1.73 1.01 -0.32 -1.96 1.02 0.93 -2.09 0.85 0.78
31: 0.88 1.93 -0.39 -1.04 -1.98 0.20 1.37 -2.08 1.73 0.53
32: -0.39 -1.44 -1.57 -0.73 1.35 -1.20 0.02 0.65 -0.52 0.90
33: 0.50 0.37 0.06 -1.35 -1.86 1.09 1.19 -2.89 0.72 1.13
34: -0.45 0.30 -0.29 -0.22 0.22 -0.83 0.47 0.36 -0.25 0.70
35: 0.62 -0.06 -0.68 -0.51 -0.21 0.37 -0.44 -1.18 0.84 0.51
36: 0.45 -2.02 -2.50 -1.99 0.69 -1.01 0.76 -0.45 0.30 0.78
TEST F16_F32_M m=511 n=511 k=7 batch=2 split_k=1 matmul 0.692ms avg_err=2.91376e-08
TEST F16_F32_L m=511 n=511 k=7 batch=2 split_k=1 matmul 0.379ms avg_err=1.0297
m = 256 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
251: -0.90 0.17 0.52 1.05 -0.24 1.86 -0.02 0.69 0.28 0.86
252: -0.11 0.63 0.44 -1.81 1.10 -0.25 1.57 0.95 -0.10 -0.03
253: 0.15 -0.01 0.27 -0.37 -0.01 -0.46 -0.50 0.23 -0.63 0.05
254: -0.85 -0.97 0.47 0.86 -0.45 -0.97 -1.35 -0.55 0.47 -0.53
255: -0.43 -0.41 -0.25 -0.72 0.03 -1.07 0.01 0.61 -0.83 -0.14
256: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
257: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
258: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
259: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
260: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
251: -0.90 0.17 0.52 1.05 -0.24 1.86 -0.02 0.69 0.28 0.86
252: -0.11 0.63 0.44 -1.81 1.10 -0.25 1.57 0.95 -0.10 -0.03
253: 0.15 -0.01 0.27 -0.37 -0.01 -0.46 -0.50 0.23 -0.63 0.05
254: -0.85 -0.97 0.47 0.86 -0.45 -0.97 -1.35 -0.55 0.47 -0.53
255: -0.43 -0.41 -0.25 -0.72 0.03 -1.07 0.01 0.61 -0.83 -0.14
256: -0.45 0.67 0.32 0.56 -0.40 1.76 0.08 0.42 0.62 1.41
257: -0.59 0.82 1.57 -1.57 0.24 1.52 1.45 1.38 0.52 0.97
258: 0.02 -0.56 -1.26 1.41 -0.01 0.67 0.03 0.34 -1.35 -0.47
259: 0.09 0.71 0.62 -0.61 0.07 1.44 0.67 1.00 -0.39 0.87
260: 0.78 1.21 0.55 -0.23 1.11 0.88 0.47 0.06 0.16 0.25
TEST F16_F32_S m=511 n=511 k=7 batch=2 split_k=4 matmul 2.276ms avg_err=4.61357
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -0.94 -0.88 -1.02 -0.85 0.64 0.13 -0.31 -0.11 -0.92 0.58
28: -0.92 -0.09 0.12 -0.40 0.71 -0.73 0.12 -0.45 0.65 0.18
29: -1.30 0.20 -0.48 0.78 2.19 -0.05 0.31 -0.63 -0.09 -0.12
30: -0.56 0.15 -1.06 0.02 1.01 -0.23 0.69 -0.11 -1.01 -0.06
31: -1.11 -1.07 -0.60 -0.91 -0.14 -2.50 -0.47 -1.77 -0.83 -0.45
32: -6.26 1.56 5.77 1.57 1.46 3.00 -0.00 2.31 1.81 -2.84
33: -1.22 -1.11 -3.83 -5.88 -2.42 3.60 1.74 8.97 -1.83 0.51
34: 4.24 -4.09 8.54 -1.29 -0.15 2.33 1.43 -2.22 -4.22 -0.96
35: 1.65 -5.34 -5.26 0.70 -0.44 0.91 3.43 5.57 -6.93 3.91
36: -2.65 3.43 -5.59 5.33 1.58 -0.67 2.75 -1.28 2.66 -4.38
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -0.94 -0.88 -1.02 -0.85 0.64 0.13 -0.31 -0.11 -0.92 0.58
28: -0.92 -0.09 0.12 -0.40 0.71 -0.73 0.12 -0.45 0.65 0.18
29: -1.30 0.20 -0.48 0.78 2.19 -0.05 0.31 -0.63 -0.09 -0.12
30: -0.56 0.15 -1.06 0.02 1.01 -0.23 0.69 -0.11 -1.01 -0.06
31: -1.11 -1.07 -0.60 -0.91 -0.14 -2.50 -0.47 -1.77 -0.83 -0.45
32: 0.75 0.22 1.02 0.31 -1.51 -0.50 -0.07 0.37 0.73 -0.49
33: 0.23 -0.47 1.53 -1.06 -1.48 -1.03 -1.28 -0.03 2.09 0.45
34: 0.25 -0.19 -1.40 -0.26 -0.34 0.04 0.36 0.64 -1.90 -0.03
35: -0.75 0.13 1.47 1.14 0.84 0.50 -0.50 -0.06 1.92 -0.10
36: 1.06 0.31 1.60 -0.35 -1.46 -0.93 -0.85 -0.15 1.95 -0.01
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: -0.71 -0.25 -0.42 0.41 0.71 -0.05 -0.03 -0.14 -0.52 -0.17
28: 0.20 0.13 0.63 -0.67 -0.37 -0.61 -0.35 -0.05 1.18 0.35
29: -0.92 -0.30 -0.34 0.31 0.86 -0.32 -0.18 -0.23 -0.26 -0.10
30: 0.13 0.04 0.00 0.00 -0.11 0.10 0.06 0.04 -0.05 -0.01
31: -0.13 0.03 0.60 -0.65 -0.09 -0.84 -0.49 -0.15 1.27 0.37
32: 0.45 2.57 2.92 -0.04 -0.41 3.15 -0.14 -0.27 1.43 -1.70
33: -0.39 -2.07 -1.39 -1.15 -0.08 1.32 -0.50 2.47 -0.62 3.68
34: 0.48 -1.21 2.98 0.85 -1.70 2.83 0.56 -0.01 1.07 1.79
35: -1.56 0.69 0.54 0.00 1.29 0.67 0.04 -2.48 -1.17 1.11
36: 0.46 0.07 0.99 1.33 0.60 -0.03 -1.91 -0.26 0.27 -1.50
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -0.31 -0.31 -0.29 -0.31 -0.16 0.01 0.02 0.43 -0.29 0.18
28: -0.19 -0.15 0.06 0.03 -0.01 -0.00 -0.03 -0.07 0.00 -0.04
29: 0.62 0.58 0.29 0.35 0.21 -0.01 0.01 -0.44 0.36 -0.17
30: 0.32 0.21 -0.32 -0.26 -0.08 0.01 0.08 0.45 -0.17 0.22
31: -1.00 -0.89 -0.22 -0.33 -0.24 0.01 -0.06 0.36 -0.39 0.12
32: 1.18 1.81 1.89 1.40 1.20 -0.36 -0.38 -0.20 0.96 -0.92
33: -0.98 -0.10 0.29 -0.10 -1.38 2.44 2.11 0.44 -0.05 -1.86
34: 2.53 -1.78 4.22 -3.62 -1.57 -0.02 -0.94 -3.27 -1.90 -3.40
35: 0.23 -3.47 -2.58 0.48 -0.18 -0.85 1.21 4.10 -1.47 1.80
36: -1.88 4.44 -0.48 1.43 -0.38 -1.57 3.48 -0.84 -0.76 -1.42
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: -0.18 -0.26 -0.66 -0.45 0.38 -0.53 -0.02 -1.01 -0.80 -0.05
28: -0.71 -0.11 -0.26 -0.21 0.82 0.51 0.25 0.22 0.10 0.42
29: -0.84 -0.10 -0.22 -0.20 0.95 0.70 0.31 0.41 0.23 0.52
30: -0.73 -0.15 -0.35 -0.28 0.87 0.43 0.24 0.07 -0.03 0.41
31: 0.26 -0.25 -0.65 -0.41 -0.09 -1.00 -0.20 -1.40 -1.05 -0.35
32: -4.49 -0.67 2.44 -1.38 0.15 -1.01 0.42 0.34 -1.31 0.33
33: -0.73 0.60 -1.37 -2.57 -0.44 0.27 2.34 3.93 -1.20 -2.53
34: 0.26 -0.99 2.67 0.04 2.05 -1.46 0.08 1.74 -1.54 -1.05
35: 0.31 -2.37 -2.23 -1.28 -0.62 -2.74 1.40 4.52 -2.57 -0.45
36: 4.48 -2.48 -3.10 2.17 2.40 2.71 -0.58 -1.61 1.80 0.64
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: 0.26 -0.05 0.35 -0.51 -0.29 0.69 -0.28 0.61 0.69 0.61
28: -0.23 0.04 -0.31 0.46 0.26 -0.62 0.25 -0.54 -0.62 -0.55
29: -0.16 0.03 -0.21 0.31 0.18 -0.43 0.17 -0.37 -0.42 -0.38
30: -0.28 0.05 -0.38 0.56 0.32 -0.77 0.31 -0.67 -0.76 -0.68
31: -0.24 0.05 -0.33 0.48 0.28 -0.66 0.27 -0.58 -0.66 -0.59
32: -3.41 -2.16 -1.49 1.59 0.52 1.22 0.10 2.44 0.72 -0.55
33: 0.87 0.46 -1.37 -2.07 -0.52 -0.43 -2.20 2.14 0.03 1.21
34: 0.97 -0.10 -1.32 1.44 1.07 0.98 1.73 -0.68 -1.84 1.70
35: 2.68 -0.18 -1.00 1.49 -0.92 3.83 0.79 -0.58 -1.71 1.45
36: -5.71 1.40 -3.00 0.40 -1.04 -1.78 1.77 1.43 1.34 -2.10
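The split_k=4 runs print four partial buffers (d_buf0 to d_buf3) alongside the result. The sketch below is a minimal illustration of split-k, not the ggml Vulkan shader code. It assumes split-k divides the shared k dimension into contiguous chunks, computes one partial product per chunk, and sums the partials elementwise. The helper names `matmul` and `split_k_matmul` are hypothetical. As a consistency check, it sums row 27 of the four d_buf dumps from the k=7, split_k=4 F16_F32_S run. That sum matches the printed row 27 to within the two-decimal rounding of the log.

```python
# Minimal split-k matmul sketch (illustration only, NOT llama.cpp's implementation).
# Assumption: split_k splits the shared dimension k into contiguous chunks; each chunk
# yields one partial output buffer, and the final result is their elementwise sum.


def matmul(a, b):
    """Plain C = A @ B for lists of lists (A is m x k, B is k x n)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def split_k_matmul(a, b, split_k):
    """Compute A @ B by splitting k into split_k chunks and summing the partials."""
    k, n, m = len(b), len(b[0]), len(a)
    chunk = -(-k // split_k)  # ceil(k / split_k)
    partials = []
    for s in range(split_k):
        lo, hi = s * chunk, min((s + 1) * chunk, k)
        if lo >= hi:
            # An empty slice (possible when split_k does not divide k evenly) adds zeros.
            partials.append([[0.0] * n for _ in range(m)])
        else:
            partials.append(matmul([row[lo:hi] for row in a], b[lo:hi]))
    # Reduction step: elementwise sum of the partial buffers.
    result = [[sum(p[i][j] for p in partials) for j in range(n)] for i in range(m)]
    return result, partials


# Row 27 of d_buf0..d_buf3 and of the expected result, copied from the k=7 split_k=4 log.
d_buf_row_27 = [
    [-0.71, -0.25, -0.42, 0.41, 0.71, -0.05, -0.03, -0.14, -0.52, -0.17],
    [-0.31, -0.31, -0.29, -0.31, -0.16, 0.01, 0.02, 0.43, -0.29, 0.18],
    [-0.18, -0.26, -0.66, -0.45, 0.38, -0.53, -0.02, -1.01, -0.80, -0.05],
    [0.26, -0.05, 0.35, -0.51, -0.29, 0.69, -0.28, 0.61, 0.69, 0.61],
]
expected_row_27 = [-0.94, -0.88, -1.02, -0.85, 0.64, 0.13, -0.31, -0.11, -0.92, 0.58]

# Sum the four partial rows column by column (the split-k reduction).
reduced_row_27 = [sum(col) for col in zip(*d_buf_row_27)]

# The printout rounds to two decimals, so allow a small tolerance.
assert all(abs(r - e) <= 0.03 for r, e in zip(reduced_row_27, expected_row_27))
print([round(x, 2) for x in reduced_row_27])
```

Summing plain Python lists keeps the sketch dependency-free. In a real GPU kernel each partial buffer would be written by a separate dispatch or workgroup slice, with a separate reduction pass afterwards.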
TEST F16_F32_M m=511 n=511 k=7 batch=2 split_k=4 matmul 2.009ms avg_err=2.60747e-08
TEST F16_F32_L m=511 n=511 k=7 batch=2 split_k=4 matmul 0.856ms avg_err=1.64159
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: 0.20 1.10 -0.44 0.28 0.09 0.27 -0.38 0.34 1.61 1.83
1: -0.90 0.20 0.71 0.73 -0.21 -0.37 0.27 0.01 1.04 -0.16
2: 0.24 -1.14 0.55 -0.34 -0.34 -0.97 -0.45 0.36 1.09 -1.08
3: 0.91 0.25 -0.38 -1.02 0.37 -1.31 -1.15 -0.18 1.50 1.25
4: 0.14 -0.60 1.23 -0.17 -0.47 -0.95 -0.35 0.38 0.87 -0.84
5: -1.79 0.12 1.33 1.84 -1.45 1.58 2.04 0.93 0.15 -1.35
6: 0.99 0.72 -0.46 -0.27 -0.01 0.85 -0.32 0.43 -0.11 1.34
7: 0.64 0.01 0.68 -1.03 -0.30 -0.66 -0.13 -0.05 -0.77 -0.22
8: -0.99 0.38 -1.13 0.63 0.56 0.46 0.64 -0.82 -0.28 0.46
9: -1.10 -0.61 -0.14 1.10 -0.04 0.55 0.73 -0.09 0.04 -1.01
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 0.59 0.08 0.09 -0.70 0.64 0.77 -0.63 1.99 0.30 0.05
1: -0.33 0.21 -0.05 0.16 -0.42 -1.38 -0.21 -1.78 -0.07 0.67
2: 0.72 0.21 1.07 -0.49 0.74 0.77 0.06 0.30 0.47 -1.11
3: 1.22 1.40 -0.06 -1.07 0.92 -0.63 -1.22 -0.37 1.43 1.43
4: -0.01 -1.37 0.72 -0.35 -1.05 -0.92 -0.15 0.16 -0.78 -0.39
5: -0.68 0.05 1.20 0.65 0.09 0.67 1.77 -1.17 0.55 -2.04
6: 0.79 -1.15 -1.35 -0.19 -1.45 -0.50 -0.32 0.63 -0.54 1.39
7: 0.59 -1.10 -0.34 0.08 -1.39 -1.11 1.36 -0.52 0.93 0.25
8: -0.74 0.02 -1.55 1.14 -0.42 0.44 1.06 -0.09 -0.05 0.52
9: -0.21 -0.65 -0.79 1.17 -0.49 0.95 1.45 -0.34 -0.59 -0.78
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -0.01 0.48 -0.29 -0.09 0.29 -0.15 -0.34 0.03 0.28 0.58
1: 0.45 -0.06 0.24 -0.39 -0.00 -0.46 -0.33 -0.10 0.17 0.19
2: 0.06 -0.60 0.38 0.08 -0.36 0.15 0.39 -0.05 -0.33 -0.70
3: 0.30 0.72 -0.29 -0.42 0.46 -0.56 -0.76 -0.02 0.56 1.05
4: 0.81 -0.63 0.74 -0.58 -0.32 -0.64 -0.21 -0.22 -0.00 -0.29
5: -0.50 -0.70 0.18 0.59 -0.46 0.77 0.92 0.07 -0.64 -1.14
6: -0.05 0.04 -0.05 0.03 0.02 0.03 0.01 0.01 0.00 0.02
7: 0.52 0.34 0.04 -0.53 0.24 -0.67 -0.67 -0.10 0.44 0.71
8: -0.80 0.72 -0.80 0.56 0.38 0.60 0.14 0.22 0.06 0.41
9: -0.57 -0.58 0.08 0.63 -0.39 0.80 0.89 0.09 -0.61 -1.04
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 0.84 0.09 -0.24 -0.51 0.03 0.00 -0.53 0.29 0.26 0.59
1: -0.81 -0.09 0.23 0.49 -0.03 -0.00 0.51 -0.28 -0.24 -0.57
2: 0.45 -0.00 -0.06 -0.30 -0.06 -0.05 -0.22 0.12 0.20 0.28
3: 0.71 -0.06 -0.03 -0.50 -0.19 -0.15 -0.27 0.16 0.40 0.40
4: -0.32 -0.06 0.13 0.18 -0.05 -0.03 0.24 -0.13 -0.06 -0.25
5: -0.72 -0.42 0.64 0.25 -0.56 -0.38 0.91 -0.45 0.24 -0.74
6: 1.08 0.17 -0.37 -0.63 0.11 0.05 -0.75 0.40 0.26 0.79
7: -0.50 -0.43 0.61 0.10 -0.60 -0.41 0.82 -0.40 0.35 -0.61
8: -0.31 0.00 0.04 0.21 0.05 0.04 0.15 -0.08 -0.15 -0.20
9: -0.20 0.20 -0.23 0.24 0.34 0.25 -0.17 0.07 -0.36 0.02
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -0.57 0.80 0.19 0.91 -0.30 0.78 0.47 0.39 0.75 0.69
1: -0.49 0.57 0.32 0.66 -0.24 0.39 0.06 0.70 0.84 0.24
2: -0.16 -0.06 0.41 -0.05 -0.04 -0.44 -0.65 0.92 0.64 -0.62
3: -0.01 -0.05 0.08 -0.05 0.01 -0.12 -0.15 0.17 0.10 -0.16
4: -0.32 0.22 0.41 0.25 -0.13 -0.11 -0.39 0.90 0.78 -0.29
5: -0.66 0.83 0.36 0.94 -0.33 0.65 0.24 0.77 1.04 0.49
6: -0.09 0.26 -0.14 0.29 -0.07 0.43 0.44 -0.32 -0.08 0.49
7: 0.47 -0.61 -0.23 -0.69 0.24 -0.51 -0.22 -0.50 -0.71 -0.40
8: 0.18 -0.09 -0.28 -0.10 0.07 0.15 0.33 -0.62 -0.50 0.27
9: -0.23 0.25 0.18 0.29 -0.11 0.14 -0.02 0.39 0.43 0.06
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: -0.06 -0.27 -0.10 -0.04 0.07 -0.36 0.02 -0.36 0.33 -0.03
1: -0.05 -0.23 -0.09 -0.03 0.06 -0.30 0.02 -0.30 0.27 -0.03
2: -0.10 -0.48 -0.18 -0.06 0.12 -0.63 0.04 -0.64 0.58 -0.05
3: -0.08 -0.37 -0.14 -0.05 0.09 -0.48 0.03 -0.49 0.44 -0.04
4: -0.03 -0.13 -0.05 -0.02 0.03 -0.17 0.01 -0.17 0.16 -0.01
5: 0.09 0.41 0.15 0.05 -0.10 0.54 -0.03 0.54 -0.49 0.05
6: 0.05 0.25 0.09 0.03 -0.06 0.33 -0.02 0.33 -0.30 0.03
7: 0.15 0.71 0.26 0.09 -0.18 0.93 -0.06 0.94 -0.85 0.08
8: -0.05 -0.25 -0.09 -0.03 0.06 -0.33 0.02 -0.33 0.30 -0.03
9: -0.10 -0.48 -0.18 -0.06 0.12 -0.63 0.04 -0.64 0.58 -0.05
TEST F16_F32_S m=511 n=511 k=17 batch=2 split_k=1 matmul 1.225ms avg_err=1.63055
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 2.63 -0.04 0.37 -0.72 1.68 -1.83 -0.60 -1.43 0.03 0.98
28: 0.08 -1.64 -2.14 -0.28 0.09 1.13 2.90 1.36 0.25 0.01
29: 1.59 2.98 -1.00 -0.80 1.46 -3.32 0.50 -1.17 -1.16 1.37
30: 0.88 -2.55 -0.10 0.17 -0.22 -0.61 -0.95 1.56 1.49 0.02
31: 1.37 0.60 1.53 1.39 0.40 0.15 -0.40 1.44 -0.16 1.04
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 2.63 -0.04 0.37 -0.72 1.68 -1.83 -0.60 -1.43 0.03 0.98
28: 0.08 -1.64 -2.14 -0.28 0.09 1.13 2.90 1.36 0.25 0.01
29: 1.59 2.98 -1.00 -0.80 1.46 -3.32 0.50 -1.17 -1.16 1.37
30: 0.88 -2.55 -0.10 0.17 -0.22 -0.61 -0.95 1.56 1.49 0.02
31: 1.37 0.60 1.53 1.39 0.40 0.15 -0.40 1.44 -0.16 1.04
32: 2.19 -0.77 0.75 0.60 0.51 0.98 0.02 3.38 2.63 0.12
33: -0.98 -0.17 -0.74 2.41 0.50 -1.06 0.39 0.08 -0.98 1.37
34: -0.18 -0.65 0.21 -0.66 1.08 -0.62 0.67 -2.21 0.02 -0.89
35: 1.81 -1.26 0.07 0.89 0.45 -0.24 -1.37 1.96 2.09 0.72
36: 1.93 -0.01 0.08 -0.98 -0.81 -0.33 1.82 -0.12 0.10 -0.56
TEST F16_F32_M m=511 n=511 k=17 batch=2 split_k=1 matmul 0.943ms avg_err=1.0642e-07
TEST F16_F32_L m=511 n=511 k=17 batch=2 split_k=1 matmul 0.465ms avg_err=1.63845
m = 256 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
251: -1.56 -0.42 -1.71 -2.52 0.61 1.27 -0.18 3.28 2.87 -1.02
252: 0.90 2.01 -1.49 0.80 0.22 0.02 2.56 3.15 0.85 -2.51
253: -0.59 -1.93 0.10 1.21 -1.60 -0.93 1.76 2.01 -1.18 -0.33
254: -0.21 0.08 1.58 0.34 0.99 -1.72 2.08 0.04 -1.00 -0.28
255: -1.92 -1.90 2.44 -0.42 0.72 0.23 0.72 -0.37 2.24 0.88
256: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
257: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
258: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
259: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
260: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
251: -1.56 -0.42 -1.71 -2.52 0.61 1.27 -0.18 3.28 2.87 -1.02
252: 0.90 2.01 -1.49 0.80 0.22 0.02 2.56 3.15 0.85 -2.51
253: -0.59 -1.93 0.10 1.21 -1.60 -0.93 1.76 2.01 -1.18 -0.33
254: -0.21 0.08 1.58 0.34 0.99 -1.72 2.08 0.04 -1.00 -0.28
255: -1.92 -1.90 2.44 -0.42 0.72 0.23 0.72 -0.37 2.24 0.88
256: -0.86 -2.43 2.16 0.98 -1.31 1.05 -3.06 -1.10 0.47 2.26
257: -1.97 1.25 -0.79 -1.55 -0.40 0.75 -2.41 0.45 4.45 1.18
258: -1.66 -3.15 -0.03 0.91 0.44 -0.64 -0.61 1.65 0.18 -1.54
259: -2.38 2.26 0.51 -0.09 -0.21 -1.98 0.78 2.47 0.32 0.69
260: -1.86 0.41 -0.00 -0.78 1.33 0.92 -3.78 -1.16 0.42 1.15
TEST F16_F32_S m=511 n=511 k=17 batch=2 split_k=4 matmul 2.33ms avg_err=1.95235
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -0.66 0.10 -3.11 -1.65 2.18 0.91 -1.25 0.71 0.53 0.35
28: -0.96 0.95 0.76 1.46 0.62 -0.71 1.39 1.57 1.48 1.83
29: 1.10 0.21 -0.37 -0.18 -2.72 -1.17 0.40 0.29 -0.62 -2.14
30: -0.80 0.15 2.24 0.93 -1.70 -0.62 -0.19 1.30 0.25 -1.32
31: 1.27 -1.76 3.04 2.22 -1.75 -0.55 1.17 -0.19 0.61 -0.53
32: 0.45 0.08 -0.76 -0.71 1.49 -1.14 -2.02 0.10 -0.20 0.14
33: -0.27 -0.68 0.55 0.45 -1.01 0.26 0.79 0.46 1.19 -0.68
34: 0.95 2.22 -1.01 0.03 0.50 1.58 -0.67 0.72 -0.25 2.85
35: -0.19 0.69 -0.74 0.17 1.49 0.59 -1.45 0.82 -2.02 -0.37
36: -0.08 -1.10 -0.19 0.09 0.23 0.12 -0.30 0.37 -0.67 -1.52
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -0.66 0.10 -3.11 -1.65 2.18 0.91 -1.25 0.71 0.53 0.35
28: -0.96 0.95 0.76 1.46 0.62 -0.71 1.39 1.57 1.48 1.83
29: 1.10 0.21 -0.37 -0.18 -2.72 -1.17 0.40 0.29 -0.62 -2.14
30: -0.80 0.15 2.24 0.93 -1.70 -0.62 -0.19 1.30 0.25 -1.32
31: 1.27 -1.76 3.04 2.22 -1.75 -0.55 1.17 -0.19 0.61 -0.53
32: 0.66 -1.33 -0.49 -2.90 -1.94 -1.21 0.52 -0.95 -0.28 -2.14
33: -1.46 -0.51 -0.29 0.05 0.42 0.96 -0.50 0.14 2.24 0.59
34: 0.11 1.87 -1.84 -3.64 2.60 -0.74 -0.68 0.57 1.34 -1.61
35: -0.90 0.42 -0.19 0.80 1.07 -0.87 0.16 0.82 2.18 1.30
36: -0.72 0.88 -1.05 0.36 0.04 0.58 0.94 0.63 0.45 0.43
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: -1.32 -0.53 -1.38 -0.86 1.88 1.70 -0.11 0.06 -0.08 1.62
28: -0.49 -0.04 -0.33 0.89 0.81 0.58 -1.57 1.20 -0.55 0.86
29: 1.36 -0.14 0.91 0.38 -1.39 -1.06 1.65 -0.78 0.05 -1.45
30: 0.30 -0.04 0.92 0.08 -0.76 -0.93 -0.28 0.47 -0.60 -0.68
31: 0.90 -0.48 0.61 1.06 -0.44 -0.38 0.55 0.41 -0.45 -0.51
32: 0.43 0.40 -0.04 -0.47 0.27 -0.59 -0.64 -0.07 0.44 0.74
33: -0.50 -0.61 0.13 0.58 -0.41 0.74 0.86 0.08 -0.59 -1.04
34: 0.03 0.75 -0.43 -0.19 0.45 -0.29 -0.56 0.04 0.46 0.92
35: 0.10 0.13 -0.03 -0.11 0.09 -0.15 -0.17 -0.01 0.12 0.21
36: -0.45 -0.78 0.25 0.56 -0.50 0.74 0.93 0.05 -0.67 -1.20
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -0.11 0.95 -1.42 -0.35 1.31 -0.42 -1.01 0.10 0.23 -0.81
28: 0.36 0.67 0.34 -1.13 -0.58 -0.74 1.91 0.32 -0.33 -0.36
29: -0.78 0.06 0.09 -0.04 -0.64 0.03 -0.09 -0.62 0.42 0.44
30: -0.59 -0.40 0.92 0.02 -1.32 0.33 0.78 -0.47 -0.05 0.99
31: 0.06 -1.23 1.33 1.24 -1.70 0.50 0.38 -0.39 0.27 0.71
32: -0.20 0.59 -0.71 0.46 0.95 0.67 -0.69 0.30 -0.89 0.28
33: 0.72 -0.24 0.19 -0.61 -0.47 -0.35 -0.03 0.05 0.65 0.29
34: 1.03 0.33 -0.56 -0.51 0.36 0.23 -0.94 0.48 0.03 0.87
35: -0.72 0.77 -0.86 0.90 1.30 0.93 -0.68 0.27 -1.36 0.08
36: 0.26 0.37 -0.51 0.03 0.54 0.38 -0.63 0.30 -0.38 0.42
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 0.93 -0.36 -0.59 -0.81 -1.42 -0.42 -0.35 0.86 -0.08 -0.70
28: -0.50 0.20 0.21 0.93 -0.45 -0.67 0.65 0.70 1.42 0.81
29: 0.18 0.50 -0.79 0.36 0.29 0.06 -0.81 0.93 -0.00 -0.48
30: -0.45 0.93 0.33 0.99 0.57 0.29 -1.03 1.14 1.14 -1.32
31: 0.68 -0.11 0.48 -0.90 -0.51 -0.73 -0.28 0.49 -0.20 -1.25
32: 0.30 -0.59 0.12 -0.66 0.19 -0.80 -0.72 0.29 -0.13 -0.85
33: -0.43 0.46 0.33 0.52 -0.20 0.24 -0.06 0.72 0.79 0.10
34: -0.24 0.58 -0.23 0.65 -0.17 0.89 0.88 -0.55 -0.06 0.99
35: 0.35 -0.60 0.01 -0.67 0.20 -0.71 -0.57 0.05 -0.31 -0.71
36: 0.17 -0.42 0.16 -0.47 0.12 -0.64 -0.62 0.39 0.04 -0.71
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -0.17 0.05 0.27 0.38 0.41 0.04 0.21 -0.32 0.46 0.25
28: -0.33 0.12 0.54 0.77 0.84 0.11 0.40 -0.65 0.93 0.52
29: 0.35 -0.21 -0.58 -0.88 -0.97 -0.19 -0.35 0.76 -1.08 -0.65
30: -0.06 -0.34 0.07 -0.16 -0.19 -0.30 0.34 0.15 -0.23 -0.31
31: -0.38 0.06 0.62 0.83 0.90 0.06 0.51 -0.70 0.99 0.51
32: -0.07 -0.32 -0.12 -0.04 0.08 -0.42 0.03 -0.42 0.38 -0.04
33: -0.06 -0.29 -0.11 -0.04 0.07 -0.38 0.02 -0.38 0.35 -0.03
34: 0.12 0.57 0.21 0.07 -0.15 0.74 -0.05 0.75 -0.68 0.06
35: 0.08 0.39 0.15 0.05 -0.10 0.51 -0.03 0.52 -0.47 0.04
36: -0.06 -0.28 -0.10 -0.04 0.07 -0.37 0.02 -0.37 0.34 -0.03
TEST F16_F32_M m=511 n=511 k=17 batch=2 split_k=4 matmul 1.837ms avg_err=8.46656e-08
TEST F16_F32_L m=511 n=511 k=17 batch=2 split_k=4 matmul 0.88ms avg_err=2.53634
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -0.29 2.66 -0.60 -2.92 -1.08 1.91 -0.34 0.04 0.64 -0.65
1: -0.13 -1.67 -1.32 0.56 0.48 0.88 0.34 -1.86 -0.25 0.16
2: 1.33 1.72 -3.21 0.48 2.13 -0.28 -2.67 -0.60 2.28 0.56
3: 1.81 -0.02 0.22 -0.53 -0.19 3.94 1.68 -2.05 -2.60 1.11
4: 0.42 -0.15 -0.62 1.04 -0.58 1.80 0.80 -1.65 1.86 2.29
5: -0.90 0.27 -0.76 0.59 -0.74 -0.40 0.17 -1.50 -0.74 -0.25
6: -0.48 -1.42 -1.05 -2.04 -1.62 1.68 0.93 -0.78 2.14 3.39
7: 1.11 0.96 0.39 -0.22 -0.69 0.78 0.72 -0.70 0.77 2.17
8: -1.37 -2.48 -1.94 -0.42 1.46 1.55 -1.68 0.64 -0.11 2.01
9: -1.63 -2.46 -2.22 1.21 1.22 -1.57 -4.45 -1.41 0.90 0.38
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 0.79 2.33 -0.12 -4.24 -0.36 3.30 0.29 -1.56 0.04 -1.70
1: 1.44 -1.73 0.47 1.34 1.44 1.04 -0.20 -3.13 1.04 -0.76
2: 2.26 1.15 0.27 1.58 -0.33 -1.38 -0.15 -0.53 2.42 -0.11
3: -0.01 -1.11 -1.34 -0.78 0.07 4.27 0.09 -1.07 0.28 -0.13
4: -0.09 0.39 0.22 2.00 -0.01 0.04 0.85 -0.52 2.24 0.72
5: 0.10 2.50 -0.63 0.88 1.62 -0.95 0.53 -1.98 -0.52 -1.16
6: 0.32 0.52 -0.49 -0.92 -0.08 2.69 1.39 1.33 3.02 3.04
7: 1.69 -0.06 2.26 1.74 -1.91 -0.52 1.18 -0.48 2.38 0.18
8: -0.81 -2.94 -0.73 0.93 0.16 1.49 -0.59 0.64 -0.70 2.36
9: -1.12 -1.76 -0.90 -0.10 2.25 -0.48 -3.93 -1.33 1.18 1.23
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -0.20 0.63 1.21 -1.28 -1.27 0.74 1.91 -0.02 -0.21 -0.19
1: 0.26 -0.72 -0.35 0.15 0.74 0.41 -0.98 0.11 -0.12 0.29
2: 0.72 0.19 -1.25 1.31 0.49 0.21 -0.86 0.07 1.36 0.13
3: 0.60 -0.15 0.25 -0.66 -0.12 2.11 0.27 0.38 0.59 0.70
4: 0.02 0.23 0.08 -0.33 0.08 1.52 0.13 0.26 0.41 0.32
5: -1.13 0.24 -0.18 0.38 0.65 -1.02 -0.68 -0.13 -0.74 -0.41
6: 0.62 0.13 0.53 -0.66 -0.58 1.93 0.25 1.01 1.25 2.43
7: 1.39 -0.19 0.51 -0.65 -1.16 1.11 1.08 0.35 1.03 1.08
8: 0.02 -1.30 -0.74 0.68 1.45 -0.84 -2.10 0.02 -0.69 0.30
9: 0.18 -1.06 -1.17 0.98 1.59 -0.43 -1.93 -0.29 -0.44 -0.68
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 0.76 1.02 -0.59 -1.61 -1.09 1.24 -0.90 0.26 0.02 -0.64
1: 0.38 -0.35 -0.46 1.19 0.11 0.56 1.62 -1.51 -0.23 -0.08
2: 1.01 1.04 -0.28 -0.87 0.49 -0.82 -0.05 -0.27 0.30 -0.61
3: -0.02 -0.39 -0.34 0.88 -0.19 0.63 0.79 -1.04 0.09 0.15
4: -0.59 -0.56 -0.38 1.36 -0.31 0.27 0.01 -1.63 1.21 0.72
5: 0.60 0.49 -0.39 0.48 -0.26 0.31 0.48 -1.37 -0.24 -0.44
6: -0.36 -0.77 -0.46 -0.15 -0.03 0.64 0.19 0.17 1.26 0.71
7: -0.36 0.15 0.48 1.11 0.32 -1.21 -0.37 -0.83 -0.09 0.08
8: -0.68 -1.57 -0.30 0.18 -0.03 1.35 1.06 0.79 0.49 0.88
9: -1.20 -0.51 0.11 -0.05 -0.27 -0.35 -1.71 0.37 1.82 1.08
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -0.52 1.03 -1.22 0.03 1.36 -0.20 -1.20 -0.35 0.72 0.27
1: -0.77 -0.61 -0.59 -0.65 -0.42 -0.29 -0.23 -0.63 0.13 -0.04
2: -0.32 0.51 -1.65 0.03 1.18 0.36 -1.73 -0.40 0.58 1.07
3: 0.34 0.45 0.32 -0.98 -0.05 1.66 0.17 -0.88 -2.98 0.01
4: 0.19 0.13 -0.31 -0.20 -0.51 0.43 0.27 0.19 0.51 1.03
5: 0.14 -0.37 -0.02 -0.40 -0.92 0.45 0.50 0.03 0.00 0.74
6: -1.40 -0.79 -0.97 -1.61 -1.07 -0.22 0.07 -1.31 -0.19 0.05
7: -0.02 0.96 -0.67 -0.61 0.09 0.77 0.01 -0.29 -0.12 0.99
8: -0.60 0.33 -1.16 -0.89 -0.07 0.40 -0.42 -0.70 0.13 0.88
9: -0.47 -0.75 -0.70 -0.34 0.19 0.18 -1.05 -0.73 -0.67 -0.02
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: -0.32 -0.03 -0.01 -0.06 -0.07 0.13 -0.15 0.16 0.11 -0.09
1: -0.00 0.03 0.09 -0.13 0.05 0.21 -0.06 0.17 -0.03 -0.01
2: -0.08 -0.01 -0.03 0.02 -0.03 -0.03 -0.02 -0.01 0.04 -0.02
3: 0.88 0.07 -0.00 0.23 0.18 -0.46 0.44 -0.50 -0.30 0.24
4: 0.80 0.06 -0.01 0.21 0.16 -0.42 0.40 -0.46 -0.27 0.22
5: -0.51 -0.09 -0.18 0.12 -0.21 -0.14 -0.13 -0.04 0.23 -0.13
6: 0.67 0.01 -0.15 0.38 0.05 -0.68 0.43 -0.65 -0.18 0.19
7: 0.10 0.03 0.07 -0.08 0.06 0.11 0.00 0.07 -0.06 0.02
8: -0.12 0.06 0.26 -0.39 0.12 0.64 -0.23 0.53 -0.04 -0.05
9: -0.14 -0.14 -0.46 0.61 -0.29 -0.97 0.24 -0.76 0.19 -0.01
TEST F16_F32_ALIGNED_S m=49 n=49 k=128 batch=2 split_k=1 matmul 0.48ms avg_err=3.48914
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 6.75 -7.22 -1.71 -6.20 1.22 -0.70 -0.60 4.62 3.70 -5.69
28: -2.03 -2.18 2.91 4.02 0.34 7.93 -6.52 -3.28 0.98 -0.53
29: 0.32 5.84 5.10 5.13 -7.29 -6.88 9.87 -6.14 4.97 5.10
30: 4.57 -6.28 3.72 0.35 6.92 -1.88 0.49 -2.16 -4.95 4.35
31: -2.16 8.23 3.34 3.55 5.66 -6.76 -4.73 7.73 -0.54 4.72
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 6.75 -7.22 -1.72 -6.20 1.22 -0.70 -0.60 4.62 3.70 -5.69
28: -2.03 -2.18 2.91 4.02 0.34 7.93 -6.52 -3.28 0.98 -0.53
29: 0.32 5.84 5.10 5.12 -7.29 -6.88 9.87 -6.14 4.97 5.10
30: 4.57 -6.28 3.72 0.35 6.92 -1.87 0.49 -2.16 -4.95 4.35
31: -2.16 8.23 3.34 3.55 5.66 -6.76 -4.73 7.73 -0.54 4.72
32: 5.96 -5.91 1.14 2.77 8.25 -3.89 -7.04 -1.75 0.82 -4.75
33: 0.38 -0.89 -0.36 0.13 -3.01 -0.49 -1.48 -0.39 2.60 -0.76
34: -3.90 4.71 -4.97 2.08 -2.53 12.14 4.91 -3.84 0.16 -1.45
35: -1.08 2.42 6.81 -2.22 -7.37 -1.44 1.41 0.72 2.69 -3.99
36: -2.84 -2.22 2.02 -2.67 0.37 -0.33 -0.32 -0.92 1.22 -5.29
TEST F16_F32_ALIGNED_M m=49 n=49 k=128 batch=2 split_k=1 matmul 0.445ms avg_err=0.00113637
TEST F16_F32_ALIGNED_L m=49 n=49 k=128 batch=2 split_k=1 matmul 0.458ms avg_err=0.00112262
TEST F16_F32_ALIGNED_S m=49 n=49 k=128 batch=2 split_k=4 matmul 0.34ms avg_err=3.65279
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 3.93 0.87 2.71 -2.50 6.72 3.07 -5.04 3.49 -0.20 0.34
28: 2.16 -3.54 -5.22 1.79 -2.77 -0.22 3.67 -1.50 5.13 -6.66
29: 0.05 -4.69 2.54 3.29 6.57 6.04 2.72 -1.53 -0.00 3.65
30: -4.71 -0.16 0.51 2.75 -8.53 -3.59 3.59 9.67 -11.47 -6.18
31: 5.45 -1.63 -11.55 -6.36 -1.03 -2.67 1.42 5.79 -0.30 -4.46
32: 1.73 -0.89 -0.63 1.38 -1.44 1.32 0.13 0.06 -2.13 -3.48
33: -2.14 -2.99 0.88 -1.26 0.15 2.39 0.89 -2.46 -0.29 -1.37
34: 1.32 1.80 -1.51 0.17 2.23 -0.51 -1.94 -1.34 -1.73 -0.32
35: -1.90 0.91 -2.63 2.09 -1.45 0.28 0.47 -1.37 -1.57 0.70
36: -0.23 1.41 0.84 -3.49 -2.04 -1.80 0.07 -1.39 -0.34 -3.88
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 3.93 0.87 2.71 -2.51 6.73 3.07 -5.04 3.49 -0.20 0.34
28: 2.16 -3.54 -5.22 1.78 -2.77 -0.22 3.67 -1.50 5.13 -6.66
29: 0.05 -4.70 2.54 3.29 6.57 6.04 2.72 -1.53 -0.00 3.65
30: -4.71 -0.16 0.51 2.75 -8.53 -3.60 3.59 9.67 -11.47 -6.17
31: 5.44 -1.63 -11.55 -6.36 -1.03 -2.67 1.42 5.79 -0.30 -4.46
32: 0.07 9.12 1.36 -0.45 -2.58 -6.44 -1.76 -0.97 3.85 2.89
33: 2.56 -6.56 6.75 -1.63 8.00 4.97 1.33 1.25 -3.95 -0.90
34: 3.35 -3.17 4.12 0.81 1.06 -0.60 0.12 -3.02 -0.98 -0.94
35: 4.43 2.50 -5.37 1.36 -5.03 0.19 2.10 -5.76 -2.71 1.91
36: -1.44 1.39 -1.68 3.50 0.49 2.52 4.13 1.51 -6.19 0.72
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 2.90 -3.60 3.39 -1.54 -0.58 1.83 -2.39 1.32 2.04 1.27
28: 2.16 -0.71 0.63 -1.48 -0.76 -2.73 -0.42 -1.99 1.10 -1.24
29: 3.31 0.00 2.41 0.56 3.39 1.17 -0.08 0.96 -0.15 3.76
30: 1.34 0.14 0.27 0.93 1.29 0.49 -1.90 2.07 -3.58 -3.58
31: 4.25 -3.11 -3.85 -2.26 -0.41 -3.67 1.30 1.90 0.08 -1.70
32: 0.25 -0.06 -1.12 0.61 -0.88 1.01 0.92 0.19 0.21 -0.54
33: -0.44 -1.11 0.13 -0.85 0.49 0.50 -0.13 0.09 -0.87 -0.92
34: 0.11 0.55 -0.13 0.79 1.35 0.15 -0.46 -0.28 -0.45 -0.51
35: -0.13 -0.88 -0.90 -0.33 -0.99 0.50 0.02 -0.77 0.34 0.02
36: 1.10 -0.20 -0.22 -0.77 -0.02 -0.40 0.32 -0.01 0.74 -0.52
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: 1.45 -1.15 -0.89 -1.40 2.80 0.92 -0.22 0.73 -1.28 3.08
28: 0.58 -0.28 -5.27 -0.23 -1.69 -0.52 0.95 -1.07 1.26 -1.14
29: -1.80 -2.57 0.14 -0.61 -2.28 0.99 -3.62 -3.24 1.62 1.04
30: -1.69 -0.07 0.74 0.03 -3.99 1.12 2.76 0.79 0.56 -1.98
31: 0.68 0.30 -0.40 2.46 -0.73 -0.12 -1.63 0.11 -2.83 -0.82
32: 1.19 1.00 0.03 0.88 -1.51 -0.81 -0.83 0.17 -0.27 -1.78
33: -1.36 -0.83 0.14 1.25 -0.49 -0.17 0.87 -1.00 -0.25 -0.90
34: -1.03 1.28 -1.82 -0.48 -0.37 0.06 0.37 -0.39 0.24 -1.93
35: -0.98 2.07 -1.61 1.85 -0.36 -0.55 0.41 -0.08 -0.44 0.63
36: -1.32 0.40 0.17 -1.20 -1.04 -2.01 0.08 -1.27 -0.99 -1.38
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: -0.81 4.43 0.68 -0.27 1.85 -0.18 -2.96 -0.34 0.43 -0.79
28: -0.60 -0.88 -1.18 1.42 -3.07 2.13 2.75 -0.23 1.02 -0.80
29: -2.10 -2.73 -1.85 3.07 4.28 3.41 3.60 1.90 1.04 -2.36
30: -4.13 -0.30 -1.11 -2.22 -0.01 -0.83 3.00 3.72 -4.68 -1.20
31: -0.29 -3.60 -3.21 -3.83 0.35 -0.42 1.90 3.25 0.56 -0.03
32: 0.54 -0.85 0.84 0.13 0.74 0.17 -0.44 -0.44 -2.20 0.23
33: -0.92 -0.84 0.74 -0.95 -0.02 0.85 -0.45 -2.24 -0.06 0.20
34: 1.95 0.53 -0.31 -0.15 0.85 0.17 -0.75 -0.30 -0.43 0.95
35: -0.84 0.11 -0.05 0.48 -0.45 0.73 -1.59 -1.18 -1.30 0.37
36: 1.19 1.10 0.29 -0.17 -1.15 1.52 -0.40 -0.10 -0.90 -1.66
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: 0.39 1.19 -0.48 0.70 2.65 0.50 0.52 1.78 -1.40 -3.22
28: 0.02 -1.67 0.59 2.07 2.76 0.91 0.39 1.79 1.76 -3.48
29: 0.64 0.60 1.84 0.27 1.17 0.47 2.82 -1.16 -2.51 1.21
30: -0.22 0.07 0.62 4.01 -5.82 -4.38 -0.27 3.09 -3.76 0.59
31: 0.80 4.78 -4.09 -2.72 -0.24 1.54 -0.16 0.53 1.90 -1.92
32: -0.25 -0.98 -0.38 -0.25 0.21 0.95 0.48 0.14 0.13 -1.39
33: 0.58 -0.21 -0.13 -0.71 0.18 1.21 0.60 0.69 0.89 0.25
34: 0.29 -0.56 0.75 0.00 0.41 -0.89 -1.10 -0.38 -1.08 1.18
35: 0.05 -0.39 -0.07 0.09 0.35 -0.40 1.63 0.66 -0.15 -0.31
36: -1.20 0.10 0.59 -1.35 0.17 -0.92 0.07 -0.01 0.82 -0.33
TEST F16_F32_ALIGNED_M m=49 n=49 k=128 batch=2 split_k=4 matmul 0.326ms avg_err=0.00110642
TEST F16_F32_ALIGNED_L m=49 n=49 k=128 batch=2 split_k=4 matmul 0.288ms avg_err=0.00108185
TEST F16_F32_S m=128 n=49 k=49 batch=2 split_k=1 matmul 0.493ms avg_err=2.52817
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -0.09 -0.95 -0.62 -1.67 2.91 -1.68 -2.44 0.09 -3.37 1.03
28: 1.34 -0.54 -1.07 -4.81 -4.77 0.20 0.74 3.33 -1.90 -1.07
29: 0.41 2.29 1.60 0.36 4.94 -0.99 0.03 -0.50 -0.77 0.47
30: 3.43 3.67 3.24 5.56 -1.17 -0.28 0.54 -1.57 2.67 1.61
31: -2.88 -2.99 -0.44 -2.28 -2.12 -4.97 -3.06 -2.60 -3.91 -1.85
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -0.09 -0.95 -0.62 -1.67 2.91 -1.68 -2.44 0.09 -3.37 1.03
28: 1.34 -0.54 -1.07 -4.81 -4.77 0.20 0.74 3.33 -1.90 -1.07
29: 0.41 2.29 1.60 0.36 4.94 -0.99 0.03 -0.50 -0.77 0.47
30: 3.43 3.67 3.24 5.56 -1.17 -0.28 0.54 -1.57 2.67 1.61
31: -2.88 -2.99 -0.44 -2.28 -2.12 -4.97 -3.06 -2.60 -3.92 -1.85
32: -2.83 2.15 3.99 4.05 0.04 2.05 0.41 1.10 4.30 2.55
33: -1.99 3.76 4.19 -0.76 3.27 2.18 6.74 3.23 0.36 -0.89
34: 4.79 -1.18 -3.70 -3.62 -5.49 -0.55 1.31 0.33 2.10 0.02
35: -3.31 -0.84 -0.17 4.17 1.24 -1.18 1.72 -0.49 1.43 1.76
36: -3.30 -0.91 -1.27 -1.91 2.19 -1.58 0.50 -0.65 -0.78 1.18
TEST F16_F32_M m=128 n=49 k=49 batch=2 split_k=1 matmul 0.343ms avg_err=0.000688992
TEST F16_F32_L m=128 n=49 k=49 batch=2 split_k=1 matmul 0.356ms avg_err=1.88832
m = 64 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
59: -6.16 1.04 3.13 2.82 -0.75 -0.12 5.52 -0.14 -1.47 -0.17
60: 0.06 -1.41 -0.62 1.62 3.21 0.70 0.49 2.93 -0.01 -0.45
61: -1.28 -1.03 -1.98 0.24 -1.73 -1.98 -0.42 -0.07 2.61 4.00
62: 2.10 -0.56 -0.19 -1.59 -0.87 -0.78 -3.28 2.53 -0.62 0.16
63: 2.20 -0.08 -0.94 2.46 1.10 0.24 -2.28 0.23 -0.04 -0.14
64: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
65: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
66: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
67: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
68: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
59: -6.16 1.04 3.13 2.82 -0.75 -0.12 5.52 -0.14 -1.47 -0.17
60: 0.06 -1.41 -0.62 1.62 3.21 0.70 0.49 2.93 -0.01 -0.45
61: -1.28 -1.03 -1.98 0.24 -1.73 -1.98 -0.42 -0.07 2.61 4.00
62: 2.10 -0.56 -0.19 -1.60 -0.87 -0.79 -3.28 2.53 -0.62 0.16
63: 2.20 -0.08 -0.94 2.46 1.10 0.24 -2.28 0.23 -0.04 -0.14
64: -0.71 0.39 -2.69 2.58 2.03 6.61 2.38 1.12 2.20 -0.19
65: 1.88 -1.28 -4.34 -1.81 -1.39 3.25 -3.62 -4.54 1.57 0.87
66: 2.46 0.76 -0.94 -1.70 3.35 0.48 0.54 2.27 -4.67 -4.39
67: -2.00 0.47 -0.65 3.38 -0.04 3.72 3.29 1.01 0.05 -4.05
68: -0.91 1.66 0.36 2.33 -1.35 2.00 1.63 3.56 -2.48 -7.88
TEST F16_F32_S m=128 n=49 k=49 batch=2 split_k=4 matmul 0.301ms avg_err=3.6878
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -2.57 -0.75 -0.49 -2.20 -1.73 1.59 -2.08 1.07 1.30 0.44
28: 3.70 -3.09 3.09 2.76 1.76 -1.57 0.29 1.55 0.00 0.94
29: -4.94 -0.36 -1.00 -1.74 3.86 -1.33 -2.19 0.66 -0.99 3.30
30: -0.01 1.23 -1.90 -0.76 -0.61 -0.60 -5.56 -0.12 -1.68 1.29
31: 1.12 0.54 -2.52 2.18 1.98 -1.13 -0.28 -2.98 2.39 1.11
32: 4.30 0.33 1.26 -2.21 -4.84 -0.93 1.08 -4.51 0.74 2.25
33: -5.13 3.89 -0.64 2.01 1.08 3.34 -2.24 -2.61 -5.76 0.93
34: 3.44 -2.78 1.58 -0.24 1.06 -1.52 -0.10 -1.27 -2.19 0.31
35: 2.82 1.06 -4.49 -4.07 1.72 5.13 1.57 1.61 3.81 -2.77
36: 1.86 -0.58 0.80 3.95 -5.75 -0.07 -1.56 1.52 -1.05 5.79
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -2.57 -0.75 -0.49 -2.20 -1.73 1.59 -2.08 1.07 1.30 0.44
28: 3.70 -3.09 3.08 2.76 1.76 -1.57 0.29 1.55 0.00 0.94
29: -4.94 -0.36 -1.00 -1.74 3.86 -1.33 -2.19 0.66 -0.99 3.30
30: -0.01 1.23 -1.90 -0.76 -0.61 -0.60 -5.56 -0.12 -1.68 1.29
31: 1.12 0.53 -2.52 2.18 1.98 -1.13 -0.28 -2.98 2.39 1.12
32: 1.78 -3.42 2.79 1.19 -1.37 1.30 1.41 -3.10 0.50 -3.14
33: -2.40 -1.54 -1.01 -0.27 -1.56 -0.35 -1.39 1.33 0.24 0.34
34: -4.25 -0.32 -3.43 1.23 -2.83 -2.75 4.40 1.09 1.40 2.41
35: -2.53 3.78 -5.16 -0.51 -1.08 -1.80 0.29 2.77 -0.24 -2.42
36: 3.30 -0.58 3.95 1.05 1.11 0.01 1.23 1.95 2.39 -0.15
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 0.03 -0.47 -2.01 -1.33 -2.35 0.36 -2.40 0.71 -0.82 0.45
28: -0.93 0.10 1.23 -0.29 -0.08 0.30 0.46 -0.84 0.57 0.90
29: 0.57 0.57 -0.14 1.39 -0.08 -1.07 0.68 -0.58 -0.75 2.52
30: 1.71 1.66 -1.38 -0.78 -0.64 -0.35 -1.69 0.45 -1.09 -0.01
31: -0.44 0.29 -0.86 0.93 -0.38 -1.11 0.88 -1.20 -1.17 1.89
32: 2.04 -0.07 -0.24 -0.98 -3.09 3.35 -0.53 -0.83 -0.11 2.04
33: -2.41 2.03 -2.05 0.31 3.66 2.40 -0.68 -3.58 -2.98 0.64
34: 1.12 -0.38 0.16 -1.25 -0.55 -2.78 -1.41 1.73 -0.27 -1.05
35: 1.90 1.23 -1.60 -4.49 3.15 2.83 -0.67 0.96 3.09 -1.25
36: 0.52 0.73 0.87 0.01 -1.99 -0.28 -0.41 -1.86 0.42 4.19
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -2.01 2.05 0.43 -0.44 0.54 1.25 0.54 -2.19 0.37 0.88
28: 0.47 -2.81 -0.72 1.12 0.21 -1.69 -0.76 2.14 -0.19 -1.10
29: -1.07 1.14 -1.63 -0.40 0.91 0.20 -2.04 1.02 0.67 -0.76
30: -0.60 0.86 1.93 -0.17 -0.51 -0.03 -0.11 -0.46 -1.55 -0.05
31: -0.06 1.75 -0.99 -0.54 0.34 -0.07 -0.59 0.09 1.56 -0.22
32: 2.88 0.26 3.13 -1.58 -0.68 -3.77 2.27 -1.18 0.77 0.04
33: -2.96 1.96 1.01 1.30 -1.29 2.16 -0.75 1.27 -0.90 0.48
34: 1.97 -1.94 -0.13 1.82 2.30 1.79 0.62 -2.53 -0.62 -1.68
35: 0.74 -0.05 -2.49 0.63 -0.18 1.39 1.62 -0.63 0.16 -0.96
36: 0.50 -1.03 -0.94 3.60 -3.43 -0.22 0.98 -0.67 -0.08 0.24
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 0.58 -1.35 1.61 -0.44 -0.57 -0.40 0.25 1.51 1.38 -1.49
28: 4.46 -1.23 1.75 0.36 2.22 1.68 -0.13 -0.38 0.15 -0.21
29: -2.06 -1.37 0.85 -0.84 0.61 -0.88 -1.29 0.27 -1.25 1.10
30: -1.60 -0.63 -1.09 -0.82 1.53 0.81 -0.97 -0.61 -0.16 1.46
31: 1.31 -0.61 1.29 -0.39 1.20 1.35 -1.14 -1.44 1.13 -0.78
32: -0.30 0.00 -0.93 0.00 -0.76 0.20 -0.67 -1.89 -0.71 -0.14
33: 0.42 0.00 0.21 0.00 -0.66 -0.36 0.02 -0.99 -0.90 -0.23
34: 0.83 0.00 0.92 0.00 -1.28 -0.62 0.78 -0.32 -1.14 1.67
35: -0.18 0.00 -0.51 0.00 -0.67 1.73 1.83 0.56 -0.41 -0.15
36: 0.98 0.00 0.15 0.00 -0.10 -0.21 -0.61 2.03 -0.88 0.53
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -1.17 -0.97 -0.52 0.01 0.66 0.37 -0.48 1.05 0.36 0.60
28: -0.30 0.85 0.83 1.57 -0.58 -1.85 0.72 0.62 -0.52 1.35
29: -2.38 -0.69 -0.08 -1.90 2.42 0.42 0.46 -0.05 0.34 0.43
30: 0.47 -0.65 -1.36 1.02 -0.99 -1.03 -2.79 0.51 1.13 -0.11
31: 0.32 -0.89 -1.95 2.18 0.81 -1.30 0.57 -0.43 0.87 0.23
32: -0.32 0.13 -0.70 0.36 -0.31 -0.72 0.02 -0.60 0.79 0.30
33: -0.17 -0.10 0.19 0.40 -0.63 -0.86 -0.83 0.69 -0.97 0.04
34: -0.48 -0.46 0.63 -0.81 0.58 0.09 -0.09 -0.16 -0.15 1.37
35: 0.36 -0.12 0.11 -0.21 -0.58 -0.82 -1.20 0.72 0.97 -0.40
36: -0.15 -0.28 0.72 0.35 -0.23 0.63 -1.51 2.01 -0.50 0.83
TEST F16_F32_M m=128 n=49 k=49 batch=2 split_k=4 matmul 0.307ms avg_err=0.000699054
TEST F16_F32_L m=128 n=49 k=49 batch=2 split_k=4 matmul 0.283ms avg_err=3.60422
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -3.06 -1.00 2.20 -2.53 0.75 -5.20 2.13 1.98 -0.56 0.24
1: -2.42 3.40 2.04 -0.02 -2.58 2.46 -3.02 -2.53 2.03 1.73
2: -2.66 1.06 -2.44 -1.27 -0.18 -2.05 -0.34 -2.95 2.63 0.65
3: -1.65 0.34 -0.19 2.24 -2.46 -0.82 0.50 0.31 3.12 -0.96
4: -1.05 -5.57 2.83 -0.98 1.49 0.42 4.92 1.61 0.73 -0.20
5: 1.75 -3.85 -2.60 0.45 0.52 -0.07 1.94 1.74 -0.43 -2.76
6: -0.38 -0.02 -3.60 -2.41 4.12 3.46 1.61 -1.36 0.04 -0.16
7: -5.32 2.67 2.13 -4.81 2.30 -3.83 2.11 1.42 1.93 1.03
8: 1.20 1.45 -0.34 1.67 -0.74 4.48 -1.70 2.04 2.47 -5.94
9: 2.20 -1.28 -1.60 -2.70 -0.30 -1.00 -1.85 -0.42 2.31 -3.94
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 0.98 -0.82 4.70 -1.62 2.53 -2.08 0.92 2.01 1.79 -2.07
1: -2.37 2.33 3.03 0.72 -0.58 3.45 3.41 -4.85 3.19 1.57
2: -3.67 2.58 -3.66 -3.06 -2.72 -1.87 0.05 -2.12 2.20 2.33
3: -1.94 1.40 4.24 1.68 -1.42 -2.92 2.54 1.75 -0.92 -1.32
4: -2.23 -4.65 4.28 -5.03 0.68 -0.48 0.54 0.65 3.73 0.38
5: -1.53 -3.90 -0.67 -0.31 -4.56 -1.24 0.41 1.00 2.51 -1.29
6: -1.40 -1.26 -2.13 -3.59 2.41 2.41 1.18 0.39 0.24 2.48
7: -1.62 1.35 3.85 -5.18 0.80 -1.93 2.85 1.81 0.47 -1.36
8: -0.38 4.77 2.74 2.65 -1.21 4.02 -0.48 0.21 3.18 -1.68
9: 1.18 -1.23 -2.52 -0.04 -0.44 0.10 -2.38 1.42 0.84 -1.17
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: 1.51 -1.18 1.24 -1.20 0.19 -0.63 0.56 0.78 0.07 -0.58
1: -0.40 2.42 1.87 -0.83 -1.33 0.81 0.67 -1.22 0.28 1.27
2: -1.09 -0.63 -0.26 -2.10 -0.03 -0.57 -0.04 -2.48 -0.83 -0.49
3: -0.44 1.66 0.84 0.52 -1.45 0.62 -0.60 -1.29 0.26 1.50
4: -0.53 -2.76 1.18 -0.80 0.08 0.03 0.59 1.69 1.24 -0.51
5: 0.24 -1.13 -0.43 -0.80 -0.69 -0.27 -0.52 0.57 2.07 -0.43
6: -1.18 -0.76 -0.15 -1.46 0.92 1.16 0.82 0.67 -1.01 -1.13
7: 0.25 1.39 1.53 -1.12 0.67 0.93 1.60 1.10 -1.82 -0.17
8: 0.01 0.94 1.81 -1.22 0.23 2.64 -0.55 1.89 1.62 0.43
9: 1.11 -0.40 0.04 -1.10 -0.36 -0.75 -0.37 -0.22 0.52 -1.63
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: -1.67 -0.38 1.30 -1.30 0.75 -0.80 0.35 0.65 0.52 0.47
1: -1.48 -1.42 0.60 0.65 0.45 0.27 0.04 -1.50 2.28 0.24
2: -0.04 0.30 -1.80 1.31 -0.57 -0.65 0.18 -0.38 1.51 -0.38
3: -0.96 -2.75 0.03 -0.31 0.05 -2.05 -0.11 1.24 0.98 -1.22
4: -1.42 -1.61 1.07 -2.28 1.52 -0.03 1.00 -0.22 0.31 0.55
5: -1.23 -2.09 -0.60 -0.48 0.78 -0.74 1.22 0.61 0.41 -0.42
6: -0.11 0.17 -0.40 -1.13 1.37 1.48 0.76 -1.24 1.56 1.85
7: -2.67 -0.15 1.67 -1.92 0.18 -1.26 1.21 0.98 0.76 -0.24
8: -0.63 1.94 1.39 1.87 -1.83 1.57 -1.48 -0.40 1.10 -2.39
9: 1.07 -0.35 -0.77 0.12 -0.88 -0.27 -1.33 -0.06 0.36 0.20
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -2.39 -0.78 0.80 -0.38 0.95 -1.84 -0.58 0.15 0.77 0.91
1: 1.47 0.99 0.19 -0.20 -1.57 1.96 -2.13 -0.14 -0.77 0.30
2: 0.70 1.03 -1.39 -0.45 1.31 -0.67 0.65 -0.54 0.87 1.17
3: 1.56 1.05 -1.45 1.58 -0.95 0.34 2.50 0.62 0.28 -1.83
4: 0.03 -0.98 1.08 1.63 0.96 -0.51 1.96 0.74 -0.60 0.35
5: 1.66 0.05 0.21 1.56 -1.03 -0.02 0.76 0.11 -1.26 0.39
6: -0.47 0.27 -1.38 -1.14 0.82 -0.52 0.29 -1.35 0.59 0.19
7: -0.90 0.87 -1.98 -0.64 1.80 -2.76 0.72 -0.92 2.23 0.08
8: -0.38 0.05 -2.01 0.87 -0.71 -1.57 -0.55 -0.58 1.30 -1.31
9: 0.99 0.68 -1.04 -0.90 -0.34 -0.60 0.23 -0.54 0.62 -1.75
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: -0.50 1.34 -1.14 0.35 -1.14 -1.93 1.79 0.39 -1.92 -0.56
1: -2.00 1.41 -0.61 0.36 -0.13 -0.59 -1.60 0.34 0.24 -0.08
2: -2.23 0.36 1.01 -0.03 -0.90 -0.17 -1.14 0.45 1.08 0.35
3: -1.81 0.39 0.39 0.44 -0.11 0.27 -1.29 -0.27 1.60 0.59
4: 0.87 -0.22 -0.49 0.47 -1.07 0.93 1.37 -0.60 -0.22 -0.59
5: 1.07 -0.67 -1.79 0.18 1.46 0.96 0.49 0.45 -1.65 -2.30
6: 1.37 0.30 -1.67 1.33 1.01 1.34 -0.26 0.56 -1.10 -1.08
7: -2.01 0.57 0.91 -1.13 -0.36 -0.74 -1.41 0.26 0.76 1.37
8: 2.20 -1.48 -1.54 0.16 1.58 1.84 0.87 1.13 -1.56 -2.67
9: -0.96 -1.22 0.17 -0.82 1.28 0.62 -0.39 0.41 0.80 -0.76
TEST F16_F32_ALIGNED_S m=4096 n=49 k=4096 batch=2 split_k=1 matmul 24.646ms avg_err=22.9484
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -15.86 -8.08 -19.98 24.29 -6.08 -4.27 12.34 -22.90 -5.15 -15.20
28: 1.49 -28.12 24.10 16.42 -23.41 2.57 -11.35 -25.49 24.47 14.60
29: -9.47 27.86 -31.85 31.43 24.26 22.52 -6.00 10.46 12.76 -14.86
30: 14.65 -29.48 -12.82 -2.42 15.14 27.62 42.05 0.46 -9.94 -24.12
31: -24.72 -17.18 26.80 2.86 -6.64 -1.65 -10.14 48.71 -16.12 8.32
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -15.86 -8.07 -19.98 24.29 -6.08 -4.26 12.34 -22.90 -5.15 -15.20
28: 1.48 -28.12 24.09 16.42 -23.41 2.58 -11.36 -25.49 24.47 14.60
29: -9.47 27.86 -31.86 31.43 24.26 22.52 -6.00 10.46 12.75 -14.86
30: 14.65 -29.48 -12.82 -2.42 15.15 27.63 42.05 0.46 -9.93 -24.12
31: -24.72 -17.18 26.81 2.86 -6.64 -1.66 -10.14 48.71 -16.12 8.32
32: 8.37 -31.08 21.96 28.08 14.05 31.34 -1.50 -13.47 -35.63 10.76
33: -28.31 -17.20 10.90 6.11 6.41 -10.32 16.29 -4.02 -37.45 5.18
34: 15.12 -5.18 -10.57 24.32 -2.16 29.43 -19.01 -0.89 19.11 0.65
35: 37.10 10.32 -5.28 -1.96 -15.80 -28.96 9.96 -29.57 9.87 0.01
36: 3.52 17.63 7.31 24.29 -11.74 20.88 16.90 5.84 -22.92 7.74
TEST F16_F32_ALIGNED_M m=4096 n=49 k=4096 batch=2 split_k=1 matmul 31.869ms avg_err=0.00627636
TEST F16_F32_ALIGNED_L m=4096 n=49 k=4096 batch=2 split_k=1 matmul 21.285ms avg_err=17.0809
m = 2048 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
2043: 24.24 7.01 -3.91 -2.49 -19.29 -5.93 -2.60 32.13 -28.43 -25.59
2044: -26.85 0.67 0.34 -14.17 2.05 -17.84 -6.85 37.56 3.21 -23.93
2045: 24.88 0.72 -1.53 -5.38 3.31 -8.56 1.00 1.79 2.49 -0.05
2046: 4.40 -10.12 30.08 8.17 -9.14 -4.80 -13.88 -31.99 -26.65 -4.39
2047: 14.59 6.12 -13.58 -10.69 -16.21 -9.24 -1.11 6.17 8.44 -26.16
2048: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2049: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2050: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2051: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2052: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
2043: 24.23 7.01 -3.90 -2.49 -19.29 -5.93 -2.59 32.12 -28.43 -25.59
2044: -26.85 0.67 0.34 -14.17 2.05 -17.84 -6.85 37.56 3.21 -23.92
2045: 24.89 0.72 -1.52 -5.40 3.31 -8.56 1.00 1.79 2.49 -0.05
2046: 4.40 -10.12 30.09 8.17 -9.14 -4.80 -13.88 -31.99 -26.65 -4.39
2047: 14.59 6.12 -13.58 -10.68 -16.21 -9.23 -1.10 6.17 8.44 -26.16
2048: -32.20 -35.34 15.10 14.70 -18.82 -13.51 14.03 -15.43 -10.58 -5.56
2049: 1.60 25.45 -5.34 0.16 -2.61 -13.94 26.69 -22.05 2.94 -8.59
2050: 37.01 54.73 21.80 -13.38 3.34 -21.85 -34.17 13.13 14.15 -3.19
2051: 3.76 -10.59 -5.91 7.33 17.38 -10.60 42.29 -25.83 -6.19 15.30
2052: 18.49 28.04 7.65 -1.15 8.85 -8.21 -7.31 -15.15 -10.87 -2.45
TEST F16_F32_ALIGNED_S m=4096 n=49 k=4096 batch=2 split_k=4 matmul 22.959ms avg_err=22.9662
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 1.06 -1.73 17.71 4.78 18.78 -12.29 -63.68 -18.43 46.52 13.61
28: -4.84 31.92 36.49 8.59 -16.75 -9.54 -2.85 -6.42 6.12 -11.53
29: 8.81 18.01 -20.94 -56.37 18.69 -20.83 -15.10 -32.61 -8.35 15.44
30: 1.15 -35.03 -16.40 -34.22 6.77 4.11 -12.62 -11.66 -13.46 -42.91
31: 53.27 16.45 40.93 -16.47 -27.49 -39.20 11.25 -12.24 -26.85 3.81
32: -0.32 -1.68 2.47 -2.21 -0.93 1.73 1.15 1.74 -0.21 0.44
33: 0.17 0.85 1.12 -0.01 -0.89 -1.01 -0.76 -0.38 0.22 2.64
34: 0.06 2.14 -1.29 -0.04 1.14 0.99 -2.39 0.09 -2.14 1.68
35: 2.45 -1.07 0.59 -3.07 1.36 -0.88 -1.74 -2.63 -2.30 1.32
36: -0.27 -1.98 -0.66 1.11 -3.04 -1.83 2.27 -3.49 -0.15 1.40
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 1.06 -1.73 17.71 4.77 18.79 -12.28 -63.68 -18.43 46.52 13.61
28: -4.84 31.92 36.50 8.60 -16.75 -9.54 -2.85 -6.42 6.12 -11.53
29: 8.81 18.01 -20.93 -56.37 18.69 -20.83 -15.10 -32.61 -8.36 15.44
30: 1.14 -35.04 -16.40 -34.23 6.77 4.10 -12.62 -11.67 -13.47 -42.91
31: 53.28 16.46 40.92 -16.46 -27.49 -39.20 11.25 -12.24 -26.85 3.82
32: 3.28 28.74 20.78 -5.77 -10.71 15.66 -20.64 17.52 -16.09 0.08
33: 9.51 -22.61 -0.79 34.60 -23.72 48.49 13.72 18.46 0.21 -7.96
34: 26.17 0.45 -15.01 -6.53 2.39 -29.96 -7.81 -15.21 -21.97 -4.30
35: -18.84 -5.38 -18.97 28.26 10.53 -42.75 -13.26 4.77 2.13 2.70
36: -9.68 -15.82 24.49 18.23 -34.29 -19.61 -28.58 -25.72 12.44 -7.57
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 15.52 -12.20 14.26 -3.18 19.07 14.73 -3.57 21.67 3.38 9.33
28: 9.05 4.50 6.63 -1.79 -7.65 -5.48 -3.63 -5.70 -9.37 -11.56
29: 3.32 0.16 -11.55 -12.31 -1.79 -13.40 -2.89 -11.62 12.47 -4.28
30: -0.07 -14.54 -3.20 -5.92 -12.47 7.97 -7.09 1.68 10.14 7.94
31: 16.62 -9.62 24.76 -11.82 -6.85 -26.90 12.51 -7.30 -5.15 10.57
32: 0.05 0.25 1.17 -0.63 0.18 0.41 1.96 0.55 0.48 -0.08
33: 1.16 0.54 2.01 0.27 0.76 -1.74 -1.59 0.41 0.61 1.62
34: 1.51 0.97 -0.42 0.49 1.16 -0.26 -1.54 0.04 -0.42 0.31
35: 0.91 -0.60 0.40 -0.69 1.61 -1.68 -0.69 0.52 -1.02 0.34
36: -0.11 -0.10 -2.84 0.83 1.17 -1.13 0.88 -2.84 -1.20 1.86
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: 1.31 -0.41 1.67 1.03 -4.38 1.60 -17.62 -0.37 13.26 15.47
28: 8.39 -2.07 14.66 10.73 -16.61 6.19 4.18 14.61 14.16 3.79
29: 3.58 6.89 -4.09 -20.83 -9.33 8.89 2.33 -2.76 -17.63 2.19
30: 10.48 -10.72 -16.29 -0.49 8.38 -2.66 -5.61 4.61 4.60 -27.82
31: 16.56 7.50 2.01 -4.15 12.84 -12.01 -10.51 -3.24 -6.23 -3.16
32: -0.36 -0.20 0.88 0.37 -0.86 1.08 -0.22 0.06 -1.43 -0.41
33: -0.22 -0.67 -0.23 0.52 -0.56 -0.24 0.02 -0.99 -1.08 0.20
34: 0.36 -0.12 -0.97 -0.47 0.82 0.83 0.03 -1.10 0.03 0.33
35: 0.31 0.44 0.41 -1.07 -1.61 1.00 -0.36 -1.52 -0.88 -0.60
36: 0.64 -0.34 0.77 0.28 -2.23 -1.01 -0.18 -0.26 -0.34 -0.47
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: -1.32 3.05 -1.84 -11.00 -6.32 -5.46 -19.58 -22.44 8.30 2.85
28: -23.41 13.67 -5.43 -3.93 -7.32 -3.93 -12.75 -2.48 -9.44 7.91
29: -2.59 6.62 3.03 -13.25 0.28 -6.18 -6.47 -7.41 -1.75 11.13
30: -12.61 -11.03 0.98 3.45 2.86 11.46 -14.20 -23.63 -4.52 -0.72
31: 25.00 8.91 4.35 9.46 -14.98 -4.83 3.45 6.06 -1.31 -6.51
32: 0.24 0.24 0.93 -1.48 0.31 0.21 -0.20 0.25 0.70 -0.00
33: -0.54 0.74 0.03 0.01 -0.12 0.92 0.34 0.42 0.20 0.47
34: -1.66 0.83 -0.44 -0.60 0.04 0.86 -1.10 1.07 -1.24 0.46
35: 1.57 -0.64 0.25 -1.63 0.64 -0.66 0.03 -0.93 -0.16 0.26
36: -0.98 -0.37 -0.15 -0.61 -1.19 0.59 0.06 0.61 -0.27 -0.17
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -14.45 7.83 3.61 17.92 10.41 -23.16 -22.91 -17.30 21.57 -14.04
28: 1.13 15.82 20.63 3.58 14.82 -6.32 9.36 -12.85 10.77 -11.66
29: 4.50 4.35 -8.33 -9.98 29.53 -10.14 -8.07 -10.83 -1.45 6.40
30: 3.34 1.25 2.12 -31.26 8.01 -12.66 14.29 5.69 -23.69 -22.31
31: -4.91 9.66 9.80 -9.95 -18.49 4.55 5.80 -7.76 -14.16 2.91
32: -0.25 -1.97 -0.52 -0.46 -0.56 0.03 -0.39 0.88 0.04 0.94
33: -0.23 0.24 -0.68 -0.82 -0.97 0.05 0.47 -0.22 0.49 0.35
34: -0.14 0.47 0.54 0.54 -0.88 -0.44 0.22 0.09 -0.50 0.58
35: -0.35 -0.28 -0.47 0.32 0.71 0.46 -0.72 -0.69 -0.24 1.31
36: 0.17 -1.18 1.56 0.61 -0.78 -0.28 1.51 -1.00 1.66 0.18
TEST F16_F32_ALIGNED_M m=4096 n=49 k=4096 batch=2 split_k=4 matmul 32.008ms avg_err=0.00627715
TEST F16_F32_ALIGNED_L m=4096 n=49 k=4096 batch=2 split_k=4 matmul 21.622ms avg_err=33.9866
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: 6.78 36.01 -19.85 -8.02 -1.57 -51.40 -18.65 -1.11 -27.53 -11.48
1: -18.80 -26.66 -2.75 -38.64 24.32 -2.47 3.16 17.55 -28.68 -24.32
2: -8.94 -10.44 -30.13 13.51 19.88 -4.16 33.97 -24.77 -18.61 11.96
3: 3.75 11.25 -5.55 15.65 -2.85 2.16 32.97 -23.98 5.09 -10.22
4: -16.97 -1.42 41.46 2.60 -8.71 21.19 11.23 24.73 8.25 -48.99
5: -5.56 -6.09 30.24 -1.11 31.33 -4.31 -18.99 -39.90 32.23 -21.34
6: -19.47 -11.66 14.58 -25.77 1.00 -5.04 -7.75 3.17 42.04 24.88
7: 0.97 60.23 -16.21 -31.95 14.06 -3.54 27.96 3.90 -13.60 12.24
8: 21.21 -29.11 49.82 -0.22 -6.67 28.84 -7.05 7.82 -12.56 -26.00
9: 39.59 26.23 18.69 4.54 1.77 9.12 13.96 11.33 8.91 1.30
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 2.41 31.79 5.94 19.89 -20.43 -54.32 -1.39 4.16 19.62 -28.96
1: -2.54 -25.49 -9.87 -20.48 1.56 7.42 8.87 8.60 5.97 -12.89
2: 49.87 -21.01 7.39 -7.91 39.02 21.53 13.90 -42.20 19.01 -6.02
3: 13.85 24.92 -11.34 -14.72 -14.69 14.39 23.38 -40.22 23.17 -0.19
4: -7.96 -23.14 41.33 16.80 16.81 3.78 14.13 49.02 4.79 -17.96
5: -21.57 -25.02 6.21 2.58 35.08 -7.41 3.83 -27.10 27.42 -16.88
6: 2.34 -7.77 18.71 -30.59 -44.47 0.95 29.39 13.14 45.51 -1.05
7: 9.99 19.79 -6.17 33.92 13.16 4.19 8.93 17.63 -15.72 19.16
8: -3.48 -40.38 33.56 -4.41 28.24 2.22 -39.76 2.80 4.64 -10.68
9: 15.89 13.28 15.49 -4.95 -24.05 23.40 13.98 -1.57 3.05 29.25
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: 8.47 14.52 -14.65 -4.88 5.42 -24.33 -1.76 -12.42 2.91 -23.12
1: -10.22 -10.09 4.90 -7.01 0.94 0.01 14.89 10.63 -11.34 -6.47
2: -3.17 -10.14 -10.70 0.73 23.04 10.90 14.30 -8.11 -0.71 -8.13
3: 3.77 -0.02 1.01 -0.04 -12.52 -0.41 18.93 -12.27 -3.16 -3.40
4: -10.20 1.41 11.04 0.85 -2.06 -3.34 8.19 25.36 3.03 -5.77
5: 8.02 -7.64 6.37 16.45 11.33 3.95 2.08 -18.09 14.87 -18.30
6: 6.41 -8.96 -1.81 -3.12 -0.87 -3.71 -4.67 -2.84 12.09 -2.91
7: 15.48 25.20 -5.79 9.42 3.87 5.41 5.55 7.30 -13.79 5.99
8: 9.25 -3.58 17.99 -17.56 1.83 -3.29 -15.08 5.50 -10.26 -3.73
9: 1.12 4.49 7.17 -3.99 2.70 7.12 9.48 3.64 -5.13 1.35
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 11.24 15.28 -5.07 -1.45 -7.15 -9.52 3.96 0.90 8.98 -0.09
1: 15.75 -25.81 -2.68 -21.34 3.05 -6.68 -9.16 -0.91 -8.66 -3.05
2: 17.85 5.80 1.64 -0.19 12.00 3.85 10.12 -20.44 -2.65 1.60
3: 19.46 13.71 1.12 5.94 13.16 4.06 14.77 -3.31 21.72 -1.64
4: 12.56 -13.77 22.11 13.47 10.61 16.28 1.76 -4.22 3.76 -18.09
5: -18.67 8.35 17.93 -6.33 8.01 -12.72 -8.11 2.16 -2.42 0.89
6: -21.07 1.55 2.98 3.49 -5.66 5.30 4.74 2.83 24.44 1.06
7: -7.72 9.93 -8.98 -1.87 13.98 -5.34 11.95 -1.93 4.75 2.10
8: -11.04 -11.35 20.22 3.29 8.06 11.56 -17.95 -0.97 6.89 -2.25
9: 14.69 18.29 0.37 15.56 -8.83 -8.61 -3.33 6.88 5.03 11.88
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -1.03 2.07 -3.76 10.11 3.23 -7.44 -12.52 16.29 -40.07 -6.57
1: 1.01 -2.68 0.47 -6.66 11.54 -2.31 -4.90 7.98 3.88 -4.57
2: -11.45 -3.35 -11.36 15.37 -22.72 -4.94 -3.48 12.67 -2.74 13.15
3: -2.34 9.27 -2.24 5.07 -5.20 -11.59 11.18 17.41 -16.34 -15.33
4: -12.64 9.41 -12.86 -13.55 -5.82 6.95 2.37 1.88 1.66 -4.68
5: -1.40 -16.27 10.03 -2.53 16.86 -10.08 0.86 -13.69 18.81 3.27
6: 7.06 -17.58 13.57 -6.53 -8.22 7.55 -4.36 11.15 -3.97 16.05
7: 5.91 13.97 -1.33 -13.95 7.48 4.90 -3.57 1.14 -8.07 -3.70
8: -10.13 -9.79 -1.33 6.48 -2.43 7.33 19.13 5.25 -10.20 0.43
9: 12.04 11.40 -4.15 4.94 15.59 6.68 -8.95 -2.32 13.61 -15.32
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: -11.90 4.14 3.62 -11.80 -3.08 -10.10 -8.32 -5.87 0.65 18.30
1: -25.34 11.92 -5.44 -3.64 8.80 6.50 2.34 -0.15 -12.56 -10.23
2: -12.16 -2.75 -9.70 -2.39 7.55 -13.97 13.02 -8.88 -12.51 5.35
3: -17.14 -11.71 -5.44 4.68 1.72 10.10 -11.92 -25.80 2.87 10.14
4: -6.68 1.53 21.18 1.83 -11.43 1.30 -1.10 1.71 -0.19 -20.44
5: 6.50 9.46 -4.09 -8.71 -4.87 14.54 -13.82 -10.28 0.96 -7.20
6: -11.87 13.33 -0.16 -19.61 15.75 -14.17 -3.46 -7.97 9.48 10.67
7: -12.71 11.14 -0.12 -25.55 -11.27 -8.52 14.03 -2.62 3.51 7.85
8: 33.14 -4.38 12.94 7.57 -14.14 13.24 6.84 -1.95 1.01 -20.44
9: 11.74 -7.95 15.31 -11.97 -7.69 3.92 16.76 3.13 -4.60 3.39
TEST F16_F32_ALIGNED_S m=11008 n=49 k=4096 batch=2 split_k=1 matmul 48.523ms avg_err=22.9065
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -6.97 23.57 34.17 -5.70 -59.94 31.64 -15.80 -21.78 -9.87 -11.23
28: 8.77 12.27 33.33 48.85 -26.12 -33.75 -15.84 -12.06 0.74 3.10
29: -5.87 25.60 8.36 7.34 -24.66 -0.60 15.87 21.94 -23.15 42.72
30: -10.86 -27.74 34.28 -36.84 7.11 -41.78 -20.74 42.54 -12.20 -15.92
31: 24.87 28.71 37.28 19.58 15.88 -18.33 5.16 -14.05 14.91 30.86
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -6.97 23.57 34.18 -5.69 -59.94 31.63 -15.80 -21.77 -9.87 -11.22
28: 8.77 12.27 33.33 48.85 -26.12 -33.75 -15.84 -12.06 0.74 3.10
29: -5.87 25.60 8.36 7.35 -24.65 -0.60 15.87 21.95 -23.15 42.71
30: -10.86 -27.73 34.29 -36.84 7.10 -41.79 -20.74 42.55 -12.20 -15.93
31: 24.87 28.71 37.27 19.57 15.88 -18.33 5.16 -14.04 14.91 30.86
32: -48.53 3.20 17.27 9.95 -20.65 10.91 18.12 -1.50 39.54 9.81
33: -6.94 -28.06 4.38 -28.11 27.88 14.67 50.72 -15.81 -21.60 25.74
34: -11.73 -6.83 -14.10 -21.80 16.33 18.10 25.07 -10.21 -34.27 33.69
35: 4.95 -14.74 17.21 -1.15 -24.04 -56.99 -13.44 -6.16 16.51 -51.11
36: -19.60 13.22 -31.29 14.89 29.57 10.79 4.69 -11.88 5.55 -0.98
TEST F16_F32_ALIGNED_M m=11008 n=49 k=4096 batch=2 split_k=1 matmul 67.658ms avg_err=0.00628326
TEST F16_F32_ALIGNED_L m=11008 n=49 k=4096 batch=2 split_k=1 matmul 38.778ms avg_err=17.0047
m = 5504 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
5499: -28.74 13.48 25.69 -38.89 4.24 -13.63 -2.66 31.46 24.72 54.90
5500: 22.48 -14.57 10.21 -13.54 -4.48 -23.89 3.35 2.36 -23.64 -7.17
5501: -1.48 33.10 -3.13 -1.07 9.43 21.45 -12.29 -3.58 5.77 -7.23
5502: 26.58 -4.34 19.49 -18.76 -10.54 -45.96 -1.08 11.66 5.45 8.79
5503: -0.54 7.64 26.00 5.04 -26.67 3.28 18.59 34.76 -17.51 -18.37
5504: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5505: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5506: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5507: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5508: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
5499: -28.74 13.48 25.69 -38.89 4.24 -13.64 -2.66 31.47 24.73 54.89
5500: 22.48 -14.57 10.21 -13.54 -4.48 -23.89 3.36 2.36 -23.64 -7.18
5501: -1.48 33.10 -3.13 -1.07 9.43 21.45 -12.29 -3.58 5.77 -7.23
5502: 26.58 -4.34 19.49 -18.77 -10.54 -45.96 -1.08 11.66 5.45 8.79
5503: -0.54 7.64 26.00 5.04 -26.67 3.28 18.58 34.77 -17.52 -18.37
5504: -35.02 -8.64 -26.63 17.01 17.60 -4.00 -7.22 -8.82 -25.98 13.06
5505: 30.74 39.37 -12.88 -12.77 -18.55 61.15 1.99 35.68 -1.87 2.36
5506: -9.53 59.69 11.19 10.93 9.63 -29.83 -38.12 -17.67 -27.01 24.87
5507: 5.43 -34.52 -3.65 -2.35 23.92 -26.04 -18.76 9.18 -3.92 20.78
5508: -12.46 -15.07 16.10 -7.35 1.10 -21.78 9.31 15.11 -57.40 41.26
TEST F16_F32_ALIGNED_S m=11008 n=49 k=4096 batch=2 split_k=4 matmul 44.344ms avg_err=22.9046
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 13.40 30.13 -25.46 40.55 15.03 19.12 -4.09 -50.50 -12.07 -17.17
28: -4.32 -0.29 -11.12 -1.76 -5.64 -16.13 3.62 13.58 12.14 24.20
29: 22.10 -17.65 -6.46 21.09 12.41 5.17 33.50 1.37 4.34 -30.50
30: 19.60 7.65 -35.53 -7.02 -15.61 9.67 -29.02 -12.44 38.92 2.65
31: -30.22 10.93 6.14 19.76 8.92 4.57 0.48 39.42 -4.69 3.92
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 13.39 30.13 -25.46 40.55 15.03 19.13 -4.09 -50.49 -12.07 -17.17
28: -4.32 -0.29 -11.12 -1.77 -5.65 -16.13 3.62 13.58 12.14 24.19
29: 22.10 -17.65 -6.46 21.09 12.41 5.17 33.50 1.37 4.33 -30.50
30: 19.60 7.65 -35.53 -7.02 -15.61 9.67 -29.02 -12.44 38.93 2.65
31: -30.23 10.93 6.15 19.76 8.92 4.58 0.48 39.42 -4.68 3.93
32: -8.98 22.77 -7.91 3.64 0.06 -5.98 31.22 -4.12 4.75 2.64
33: 9.27 12.35 -31.86 48.53 34.15 2.71 22.69 -27.39 32.11 29.21
34: -18.83 -19.89 -4.45 23.02 6.42 -0.70 -0.11 -11.66 36.15 -14.52
35: -27.49 -10.01 27.31 31.52 -9.55 -12.40 4.16 14.29 -24.13 -0.56
36: -20.35 3.62 -37.65 21.02 -28.35 -10.90 19.35 17.21 -2.81 3.61
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 7.54 3.01 0.99 15.90 -2.43 17.78 0.07 -15.89 10.25 -1.11
28: -7.13 18.83 9.95 16.75 10.24 -5.41 1.53 -6.98 3.40 16.87
29: 2.13 -18.43 -7.89 17.19 -5.03 7.40 9.32 3.06 5.97 -18.02
30: -7.36 2.52 -20.51 6.88 -5.93 11.57 -0.60 -6.99 15.50 -2.81
31: -8.36 8.47 2.29 6.02 10.85 -3.10 6.56 -1.81 8.53 9.92
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: 7.42 12.78 -2.60 3.31 -2.18 -1.61 -9.20 -10.91 -11.80 -0.45
28: 17.00 -21.26 -12.39 -16.02 -9.10 -7.35 7.78 15.39 3.58 -11.67
29: 16.86 -0.61 -3.95 -1.66 16.02 -4.51 20.36 5.10 6.42 -19.72
30: -4.11 -6.18 -18.85 -9.16 -7.91 8.95 -10.99 -11.33 -3.96 17.04
31: -9.98 -0.52 -0.03 5.91 -10.12 7.70 9.43 -0.22 -0.12 -23.82
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 11.30 11.01 -16.56 22.42 -8.07 3.84 10.36 -1.36 -4.89 -16.86
28: -12.05 1.08 -2.03 1.88 -4.87 0.63 -1.82 5.28 -0.94 15.49
29: -6.97 16.52 2.19 6.52 3.83 -3.48 6.18 -5.61 5.62 16.60
30: 11.59 7.75 18.50 -9.38 -9.38 -2.39 -2.99 12.69 11.23 -6.47
31: -20.28 -9.56 0.26 6.26 -3.83 1.38 2.94 8.01 -11.16 7.06
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -12.86 3.32 -7.29 -1.07 27.71 -0.90 -5.31 -22.32 -5.64 1.24
28: -2.14 1.05 -6.66 -4.37 -1.91 -4.01 -3.87 -0.10 6.11 3.50
29: 10.08 -15.13 3.20 -0.96 -2.41 5.76 -2.36 -1.18 -13.67 -9.36
30: 19.48 3.56 -14.67 4.64 7.61 -8.46 -14.44 -6.81 16.15 -5.11
31: 8.40 12.54 3.62 1.57 12.02 -1.40 -18.44 33.44 -1.94 10.77
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TEST F16_F32_ALIGNED_M m=11008 n=49 k=4096 batch=2 split_k=4 matmul 64.606ms avg_err=0.00627888
TEST F16_F32_ALIGNED_L m=11008 n=49 k=4096 batch=2 split_k=4 matmul 37.37ms avg_err=34.0174
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -36.46 -5.61 -18.44 -32.86 -2.16 -9.88 -5.16 15.77 49.02 9.90
1: 1.25 63.48 -42.63 32.20 8.97 8.31 -4.68 11.48 15.25 0.93
2: 1.60 -2.95 -20.44 -11.62 -35.24 -8.61 16.44 12.18 -38.31 34.02
3: -18.54 31.70 11.21 -22.40 -39.66 -16.84 -15.09 11.17 5.99 -7.30
4: -7.60 -25.28 15.61 20.82 -6.79 25.78 22.68 20.47 21.84 1.14
5: 3.61 -14.72 33.21 19.48 -8.11 -14.19 -4.16 15.78 -23.40 17.31
6: 12.10 -12.55 23.77 2.82 -12.80 18.40 -37.12 -31.91 -8.71 -1.82
7: -48.15 1.25 16.49 -2.55 -7.99 21.64 -15.57 -2.19 -28.11 23.98
8: 38.39 -25.77 -10.93 -20.21 36.49 25.12 -2.47 -3.74 16.05 8.00
9: -21.62 -7.51 28.61 10.60 10.50 -1.72 -41.72 -22.51 12.25 -1.43
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 17.44 10.79 6.96 -39.15 42.65 -10.25 40.92 16.93 14.19 -4.14
1: -3.18 40.94 -21.05 15.89 -9.02 -3.11 -37.40 -7.58 -7.12 13.28
2: 33.95 -16.16 -11.58 -7.64 -43.23 17.57 10.02 7.38 -11.49 21.18
3: -25.73 49.99 44.82 -12.96 -35.64 10.64 -20.52 5.17 -5.61 -3.00
4: -25.17 -39.82 18.87 -10.38 -2.84 29.02 29.94 30.10 15.70 14.21
5: 35.71 -10.14 19.89 -5.71 18.20 -6.53 14.07 32.60 -27.41 15.46
6: -3.39 -3.05 -3.82 14.22 -18.50 -11.32 -9.12 -47.52 -12.75 -4.58
7: -6.01 42.62 69.74 -11.13 -24.74 0.58 -12.01 -36.04 -23.47 3.71
8: 21.96 -11.62 -7.67 23.90 27.82 4.16 10.26 9.60 7.27 -3.34
9: 20.06 -40.15 17.31 -5.66 -13.15 30.36 -49.50 -15.56 2.51 -3.86
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -2.80 0.51 -11.29 -10.48 0.28 10.31 2.04 0.76 21.02 7.74
1: 13.59 22.18 -2.09 17.90 -2.26 -10.68 -18.25 4.06 6.78 2.16
2: 18.12 -2.21 7.10 -0.53 -16.28 -4.32 -2.37 12.42 -7.97 6.13
3: -3.95 14.93 13.66 -0.58 -33.60 -1.26 18.06 15.38 -5.74 -12.14
4: 0.68 0.57 0.14 10.57 -16.03 7.04 5.45 1.62 11.75 7.31
5: 14.33 11.33 3.01 -1.95 7.48 -1.88 3.87 12.74 -13.32 0.37
6: -4.68 -9.55 7.56 -10.68 -8.44 10.94 -13.62 -36.35 1.90 14.17
7: -11.60 1.81 11.77 -16.38 0.89 11.85 -6.53 -29.32 3.29 -5.67
8: 14.52 -15.86 -9.77 -15.69 2.81 13.34 -12.33 3.48 5.83 -3.38
9: -4.84 -10.63 5.92 7.72 12.68 3.59 -16.17 0.02 -4.85 7.01
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 3.64 6.83 -0.78 -9.04 2.44 -9.43 2.01 5.38 9.75 4.90
1: -10.33 5.67 -10.04 -7.31 5.59 13.20 3.40 10.54 4.63 1.99
2: 3.71 -19.39 -17.22 -1.50 -9.49 1.42 20.67 -5.08 -13.38 3.77
3: 0.63 19.35 6.97 -11.45 0.67 -28.60 -24.32 4.83 -2.55 -4.08
4: -11.46 -24.28 5.74 -11.35 6.46 14.10 11.57 19.58 -1.38 11.26
5: -0.50 -11.03 16.86 5.20 9.11 -10.24 -0.98 12.98 -5.54 21.37
6: 1.40 -4.37 6.94 0.68 -10.14 -13.18 -13.48 14.40 -16.58 -1.76
7: -4.92 23.58 14.55 1.93 -0.73 -3.82 -0.12 7.27 -9.42 13.82
8: 6.78 -1.09 1.27 20.27 4.23 -5.26 23.12 11.71 -4.69 17.82
9: 0.68 3.25 5.41 -2.45 2.12 3.97 -6.97 -5.87 1.33 2.42
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -24.09 -7.28 0.33 -17.30 -1.40 -16.02 -1.14 5.71 -4.13 -2.58
1: 2.28 17.69 -5.15 10.26 0.75 5.18 4.80 2.09 1.32 -1.41
2: -15.05 6.07 8.70 -11.22 2.75 -3.48 1.77 1.47 -8.27 14.84
3: -4.59 -6.19 9.49 -7.95 -2.09 -1.52 -2.02 -2.43 -11.27 20.72
4: 9.26 -0.79 6.61 12.74 8.07 -1.21 15.83 -1.03 -10.79 2.26
5: -1.29 -10.03 -1.95 12.12 -18.75 3.42 -7.62 -8.86 -4.18 0.09
6: 14.95 -0.17 -2.52 -5.12 -2.98 1.66 4.94 -10.63 25.35 -4.39
7: 2.55 -17.05 -11.14 -2.55 -16.35 6.03 1.92 12.09 -13.09 17.41
8: -2.18 10.41 15.18 3.69 18.30 16.77 -10.56 -21.82 13.61 1.18
9: -8.38 16.57 1.84 3.61 3.09 -5.41 -16.20 -6.62 13.32 2.52
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: -13.22 -5.67 -6.71 3.96 -3.47 5.26 -8.07 3.92 22.38 -0.16
1: -4.29 17.94 -25.35 11.34 4.88 0.61 5.37 -5.21 2.53 -1.82
2: -5.18 12.58 -19.02 1.62 -12.22 -2.24 -3.63 3.37 -8.70 9.28
3: -10.64 3.60 -18.91 -2.42 -4.64 14.54 -6.80 -6.61 25.55 -11.80
4: -6.07 -0.78 3.12 8.86 -5.28 5.85 -10.17 0.29 22.27 -19.70
5: -8.92 -5.00 15.29 4.11 -5.95 -5.49 0.56 -1.08 -0.36 -4.51
6: 0.42 1.55 11.79 17.93 8.76 18.99 -14.96 0.67 -19.38 -9.83
7: -34.18 -7.08 1.31 14.46 8.21 7.57 -10.84 7.76 -8.89 -1.58
8: 19.26 -19.22 -17.61 -28.47 11.15 0.27 -2.71 2.89 1.29 -7.62
9: -9.07 -16.70 15.44 1.73 -7.39 -3.87 -2.38 -10.04 2.46 -13.37
TEST F16_F32_ALIGNED_S m=4096 n=49 k=11008 batch=2 split_k=1 matmul 68.686ms avg_err=37.5542
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 59.64 13.43 -109.58 -4.61 -37.52 -4.71 21.07 66.29 27.67 1.50
28: 54.14 38.66 -69.00 3.80 4.21 -19.36 42.58 -20.69 13.25 -26.71
29: 50.83 -27.87 37.31 24.15 -18.82 20.41 -28.69 -8.85 70.57 7.80
30: -6.81 15.09 27.78 70.53 -12.48 36.99 -31.17 66.18 49.39 50.64
31: -41.14 -25.51 -12.10 23.69 12.23 67.65 -1.75 36.14 -48.39 9.91
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 59.64 13.44 -109.58 -4.60 -37.51 -4.71 21.08 66.30 27.68 1.49
28: 54.14 38.66 -68.99 3.81 4.21 -19.36 42.57 -20.69 13.25 -26.71
29: 50.83 -27.87 37.32 24.14 -18.81 20.41 -28.70 -8.84 70.58 7.80
30: -6.80 15.08 27.78 70.53 -12.48 36.99 -31.16 66.17 49.39 50.64
31: -41.14 -25.50 -12.11 23.69 12.23 67.65 -1.75 36.14 -48.40 9.91
32: -25.19 -2.75 44.06 -1.59 -44.88 20.81 -34.38 2.38 6.61 -26.04
33: -59.27 10.91 41.51 -0.27 -31.80 -34.70 24.74 9.99 44.65 -4.98
34: -12.37 59.14 -12.08 2.94 9.64 -36.62 18.46 3.82 -70.13 -27.05
35: 37.43 17.63 -34.43 -27.03 9.20 8.67 -17.37 -7.51 -11.67 28.83
36: 6.24 -26.15 -8.96 -0.86 -53.50 4.83 46.05 18.87 -4.23 -31.94
TEST F16_F32_ALIGNED_M m=4096 n=49 k=11008 batch=2 split_k=1 matmul 70.88ms avg_err=0.0103226
TEST F16_F32_ALIGNED_L m=4096 n=49 k=11008 batch=2 split_k=1 matmul 40.819ms avg_err=27.914
m = 2048 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
2043: 42.52 -33.29 0.60 44.05 75.00 -38.28 -56.93 8.60 28.41 -2.51
2044: 79.76 -46.38 0.26 -23.88 -26.99 53.38 -2.38 -78.00 2.56 -53.12
2045: 42.04 104.44 -55.47 -65.08 29.67 -26.42 -27.54 -42.05 -14.68 -17.17
2046: 13.29 21.28 -23.77 0.10 -20.88 52.69 13.92 -55.04 8.03 -39.57
2047: -22.03 -35.15 18.50 9.28 -66.42 -25.43 -29.23 37.01 -34.54 -28.65
2048: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2049: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2050: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2051: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2052: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
2043: 42.53 -33.28 0.59 44.04 74.99 -38.29 -56.93 8.60 28.41 -2.51
2044: 79.75 -46.38 0.26 -23.89 -26.98 53.37 -2.38 -78.00 2.56 -53.12
2045: 42.04 104.43 -55.47 -65.07 29.67 -26.42 -27.54 -42.05 -14.68 -17.17
2046: 13.29 21.28 -23.77 0.09 -20.87 52.70 13.91 -55.04 8.03 -39.58
2047: -22.03 -35.15 18.51 9.29 -66.41 -25.43 -29.23 37.01 -34.53 -28.65
2048: -81.94 46.43 -14.47 11.70 -4.62 63.19 -80.34 5.14 -1.38 -37.03
2049: -36.96 59.00 42.98 29.38 -38.09 -18.54 36.63 18.01 24.57 9.87
2050: -35.20 -21.96 7.64 26.66 4.25 -9.62 0.95 58.98 -2.18 13.56
2051: 5.52 17.63 31.05 -30.66 13.87 -49.55 32.14 32.66 -22.95 -57.60
2052: -25.37 -31.86 -28.68 -23.00 81.37 4.95 47.50 -56.24 -49.99 48.75
TEST F16_F32_ALIGNED_S m=4096 n=49 k=11008 batch=2 split_k=4 matmul 60.87ms avg_err=43.9891
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -62.95 -8.03 -23.60 -42.53 -40.75 28.47 27.48 0.28 -33.46 9.53
28: 66.94 42.09 25.90 5.39 -22.93 -33.27 82.19 -9.79 21.37 -16.96
29: 55.66 -14.57 60.25 -60.01 26.83 -10.06 8.21 -28.61 -55.75 -10.54
30: -32.25 13.67 51.63 36.44 18.27 25.28 18.88 38.20 -9.88 -76.98
31: 15.44 2.10 -24.33 -17.57 -24.87 -2.54 36.07 -5.64 25.90 -11.54
32: -10.78 -2.67 -2.11 -10.15 -14.37 -1.64 -18.03 -15.07 12.45 -4.44
33: -13.60 -23.07 25.70 8.12 8.31 15.95 -12.14 -9.11 3.35 15.75
34: 9.70 -12.87 -2.57 -18.90 25.21 -32.34 -13.16 -62.69 -12.96 -15.84
35: -17.51 -28.26 18.63 21.56 6.13 7.89 -22.93 -24.83 -26.83 -13.47
36: 32.10 -5.31 11.16 -27.94 -10.41 -6.46 29.32 8.39 8.30 22.43
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -62.95 -8.02 -23.61 -42.53 -40.75 28.49 27.48 0.29 -33.47 9.53
28: 66.94 42.09 25.89 5.39 -22.92 -33.27 82.20 -9.78 21.36 -16.96
29: 55.66 -14.57 60.26 -60.01 26.84 -10.06 8.21 -28.62 -55.75 -10.53
30: -32.24 13.67 51.63 36.45 18.28 25.28 18.88 38.20 -9.87 -77.00
31: 15.45 2.10 -24.33 -17.58 -24.86 -2.55 36.07 -5.63 25.91 -11.53
32: 9.76 3.62 33.67 61.95 -56.01 -15.18 98.85 -13.30 -45.92 -107.14
33: -36.71 -63.51 45.71 12.95 23.52 7.75 -59.59 16.92 -21.45 62.20
34: -35.43 25.62 -0.00 -48.71 -14.13 -11.36 -26.03 12.05 24.68 11.04
35: 3.57 70.37 47.21 6.35 -25.28 -10.81 34.73 -37.68 -42.37 52.12
36: 5.31 -52.54 5.40 -23.06 -18.78 -7.47 30.33 19.51 27.83 -96.97
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: -39.46 19.84 -9.29 4.71 -18.77 4.64 9.39 -4.45 0.53 -3.23
28: 3.61 5.51 2.87 -10.99 -11.47 -7.60 1.66 3.99 -16.33 0.52
29: -9.68 -9.63 12.66 -35.94 10.05 1.44 -5.36 -7.60 -34.36 -14.71
30: -19.30 15.23 -0.77 33.20 7.13 24.75 -6.73 8.11 9.54 -6.18
31: 0.68 28.89 -35.10 -6.30 -9.41 20.01 -1.97 6.93 -8.48 -2.30
32: -0.89 4.03 3.96 3.53 -13.60 -12.88 -6.04 -3.31 6.48 4.00
33: 17.43 -18.67 16.10 -9.05 22.05 5.83 -12.76 -9.60 17.28 3.33
34: -1.31 -1.94 -7.52 -0.68 26.41 -11.96 -7.82 -27.74 -18.52 -6.13
35: -2.12 9.61 18.93 18.36 -6.06 -7.45 -1.18 -7.21 -16.70 10.06
36: 11.96 -11.84 -0.85 -10.10 -3.94 1.49 27.16 16.19 -0.80 -5.24
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -36.58 -1.21 -11.41 -10.41 -3.67 -14.18 4.18 -1.79 -6.95 -7.46
28: 35.95 22.84 18.19 -4.38 -0.94 -8.49 5.46 -2.44 40.94 -20.50
29: 15.67 -6.41 21.83 8.30 -9.41 -12.76 24.08 -18.73 -4.78 2.69
30: -2.69 1.30 27.01 22.33 -10.67 -19.26 20.07 5.10 -11.10 -27.54
31: 22.77 -20.13 2.57 3.48 10.59 -26.38 10.72 -19.45 32.71 -8.28
32: 9.10 -6.49 -4.78 6.23 -7.29 -19.79 3.03 11.16 6.66 -1.43
33: -17.30 -8.35 0.69 -0.89 -23.97 0.05 -19.12 -16.92 -1.68 -5.28
34: -5.37 0.16 9.23 -2.84 -11.14 -11.76 4.06 -0.72 11.32 -5.46
35: -10.39 -18.43 -7.02 6.24 3.36 6.27 1.36 -12.13 -9.17 -8.02
36: 5.50 1.10 11.30 -16.66 8.58 11.62 -15.03 -4.36 10.17 -9.59
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 3.17 -8.03 1.58 -15.85 -26.45 37.04 -6.46 -9.39 -19.57 3.26
28: 13.08 37.88 -22.44 -0.70 -8.85 -10.11 42.91 -4.84 8.99 19.33
29: 37.02 3.19 26.88 -15.87 16.83 18.62 -22.39 10.62 -26.12 -13.79
30: -7.30 -9.50 5.42 1.36 15.82 1.09 -10.02 12.65 -0.06 -39.99
31: -1.21 6.69 4.53 -2.82 -16.67 3.59 28.97 10.39 -8.47 -2.81
32: -16.79 -4.52 1.48 -4.92 6.68 0.96 7.55 -19.86 11.13 4.39
33: -13.56 7.06 -0.49 5.46 -0.19 -0.55 4.30 1.74 -15.03 10.37
34: 1.21 0.71 9.20 -3.65 -4.08 4.62 -5.94 -14.11 -2.98 5.20
35: 1.62 -16.77 7.58 -12.64 1.51 -12.26 -14.36 1.45 1.84 -7.45
36: 1.27 -0.41 5.93 7.17 -13.59 -9.55 14.96 -8.72 3.37 10.16
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: 9.91 -18.63 -4.49 -20.97 8.15 0.98 20.37 15.91 -7.46 16.97
28: 14.29 -24.14 27.28 21.46 -1.66 -7.08 32.16 -6.51 -12.23 -16.31
29: 12.65 -1.71 -1.11 -16.49 9.36 -17.36 11.89 -12.91 9.51 15.27
30: -2.95 6.65 19.97 -20.44 5.99 18.70 15.57 12.35 -8.27 -3.28
31: -6.80 -13.35 3.66 -11.93 -9.38 0.24 -1.66 -3.52 10.14 1.85
32: -2.20 4.31 -2.76 -14.98 -0.16 30.08 -22.57 -3.05 -11.82 -11.40
33: -0.16 -3.11 9.40 12.60 10.43 10.63 15.43 15.68 2.79 7.34
34: 15.17 -11.80 -13.48 -11.73 14.03 -13.24 -3.46 -20.12 -2.78 -9.45
35: -6.63 -2.68 -0.86 9.60 7.31 21.32 -8.76 -6.93 -2.81 -8.05
36: 13.37 5.84 -5.23 -8.34 -1.45 -10.01 2.23 5.28 -4.44 27.11
TEST F16_F32_ALIGNED_M m=4096 n=49 k=11008 batch=2 split_k=4 matmul 68.89ms avg_err=0.0103066
TEST F16_F32_ALIGNED_L m=4096 n=49 k=11008 batch=2 split_k=4 matmul 35.26ms avg_err=55.8425
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: 46.40 81.56 2.18 28.30 75.46 22.92 -4.86 -26.52 100.04 56.92
1: 67.54 -6.56 31.25 8.76 -56.16 40.37 12.78 -37.85 42.33 16.67
2: 46.95 -31.13 -5.91 51.15 23.24 -4.58 48.04 -33.27 -17.30 30.60
3: -9.87 -10.09 13.50 9.84 -63.22 -24.90 37.37 -9.68 -11.48 27.61
4: -31.84 -59.48 -0.10 -16.22 45.93 -28.45 40.42 30.10 17.29 -4.63
5: 5.26 -2.62 65.65 -3.00 2.57 1.28 -66.92 19.19 78.77 -25.74
6: 22.64 3.70 -7.62 -36.14 27.23 27.69 -7.13 47.94 -68.22 -24.35
7: -44.68 -27.32 -54.94 26.37 26.57 -7.83 5.59 11.07 9.02 -18.88
8: 6.01 12.90 35.79 -58.93 4.68 15.22 31.79 -9.34 -10.54 14.03
9: 36.90 -22.86 -75.08 10.21 -38.27 31.72 27.74 8.90 -30.69 25.16
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 23.98 10.59 42.62 4.31 88.62 -41.26 -22.87 -34.64 45.13 7.84
1: 39.83 7.75 -22.03 40.69 -45.56 9.92 19.51 -30.35 20.20 42.58
2: 61.59 -49.32 -22.53 53.77 11.86 59.75 10.11 22.47 -38.17 27.02
3: 21.61 20.50 20.57 -16.95 -57.20 -57.97 33.46 13.01 -33.27 51.31
4: -31.68 -108.70 -16.91 44.28 25.28 -52.40 108.31 -18.16 -46.43 -25.26
5: 12.64 -0.01 28.76 -40.85 -15.70 47.62 -11.94 38.04 -8.04 -9.37
6: -24.15 -31.94 2.82 -2.50 16.94 10.53 4.20 -3.90 -50.26 21.91
7: 9.49 -43.74 -62.87 35.82 17.50 51.88 24.93 -24.72 36.04 -20.63
8: 36.72 -21.76 51.18 -63.85 18.06 25.05 41.13 14.85 -0.02 30.29
9: -2.73 -60.36 -18.78 -20.44 -6.55 10.59 -31.41 24.36 37.99 -8.06
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: 11.78 10.73 11.03 24.11 0.36 -18.11 -4.92 14.54 27.00 -13.66
1: 10.23 -11.75 -7.74 25.77 -42.66 -3.48 17.21 10.16 17.70 -4.17
2: 27.10 -25.40 -15.81 19.58 6.69 15.32 12.80 14.13 -8.42 4.29
3: -2.68 1.12 29.66 -13.73 -23.67 10.79 2.57 45.88 -7.77 4.51
4: -16.16 -37.77 -10.29 10.75 -2.60 1.26 28.82 13.13 4.46 -44.17
5: -2.36 6.80 28.65 -11.48 0.89 -20.38 4.29 27.54 12.59 -0.39
6: 6.68 -5.35 -19.29 -26.79 0.73 -23.66 -26.69 15.40 -6.39 8.31
7: -9.01 10.78 -24.72 17.75 11.89 22.80 2.87 6.33 -9.34 -13.81
8: 18.56 -3.50 23.96 -27.08 30.79 32.81 5.56 0.72 -8.98 40.57
9: -5.37 -18.62 -8.00 15.10 15.00 6.79 19.34 -1.95 6.87 5.82
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 5.44 35.63 22.30 -10.43 40.93 16.84 -6.05 -30.82 28.54 35.23
1: 23.91 9.09 23.83 -15.31 -12.73 32.98 -6.11 -12.39 29.92 5.16
2: -1.15 -27.21 -0.64 22.16 -1.84 -0.32 4.09 0.23 -7.58 -5.93
3: 20.34 -18.85 3.81 6.35 -8.35 -12.70 2.39 -13.10 1.38 11.60
4: -2.31 -47.54 -1.74 8.62 40.46 -12.06 36.08 -8.91 10.33 17.79
5: 14.31 -5.39 16.68 -8.37 22.28 11.91 -26.84 6.41 7.75 -1.92
6: -19.54 -0.17 -9.95 3.77 11.28 28.99 35.82 3.26 -25.98 -14.57
7: -13.69 -24.95 -35.27 18.80 -12.37 5.79 8.48 -14.67 19.08 8.80
8: -7.78 -25.24 -1.49 -14.30 -25.45 -19.22 9.97 -15.95 10.61 -22.13
9: -9.61 -21.82 -14.56 15.88 -6.01 9.03 -9.66 -2.24 -9.37 -18.15
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: 5.48 25.63 -19.53 25.23 6.60 40.35 10.16 -15.70 26.42 24.39
1: 13.69 7.54 -3.72 15.52 -1.84 0.38 16.17 -20.11 7.39 40.47
2: -6.57 13.72 22.32 12.84 9.09 -23.33 27.52 2.76 4.58 11.39
3: -7.80 -14.55 -1.16 9.95 -5.52 -18.12 17.27 -33.99 -10.06 15.87
4: 12.11 11.26 -7.18 -38.26 10.96 -14.40 -16.10 22.60 -5.90 18.50
5: 5.89 -6.95 6.93 -9.38 -2.20 9.09 -19.67 -22.52 11.92 -24.33
6: 18.95 16.47 19.04 -4.50 -2.49 13.61 6.10 22.09 -26.11 -3.75
7: -20.84 2.07 17.27 -3.69 31.53 -5.71 -9.09 14.25 2.51 4.51
8: 0.51 0.88 2.50 -6.26 -13.90 4.79 34.10 -3.85 0.96 10.07
9: 48.71 10.15 -15.28 -34.31 -27.55 -13.09 32.10 -8.32 -13.89 19.11
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 23.70 9.57 -11.62 -10.61 27.57 -16.16 -4.06 5.46 18.08 10.95
1: 19.70 -11.43 18.88 -17.22 1.06 10.49 -14.49 -15.51 -12.69 -24.79
2: 27.56 7.76 -11.79 -3.43 9.31 3.75 3.61 -50.40 -5.88 20.84
3: -19.72 22.18 -18.82 7.27 -25.68 -4.87 15.14 -8.47 4.97 -4.37
4: -25.48 14.57 19.11 2.68 -2.88 -3.25 -8.38 3.27 8.40 3.25
5: -12.58 2.92 13.40 26.23 -18.40 0.67 -24.71 7.76 46.52 0.90
6: 16.55 -7.24 2.57 -8.62 17.71 8.75 -22.37 7.19 -9.74 -14.35
7: -1.14 -15.21 -12.23 -6.49 -4.48 -30.72 3.33 5.16 -3.23 -18.39
8: -5.28 40.77 10.82 -11.28 13.25 -3.16 -17.84 9.74 -13.12 -14.48
9: 3.17 7.43 -37.25 13.54 -19.71 28.99 -14.04 21.40 -14.31 18.38
TEST F16_F32_ALIGNED_S m=32000 n=49 k=4096 batch=2 split_k=1 matmul 124.37ms avg_err=22.9462
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 7.50 5.75 0.50 10.62 15.74 21.37 -9.29 -1.36 17.92 -39.14
28: -18.79 -7.43 23.06 -25.28 -9.69 16.78 1.06 -19.48 -29.12 -51.12
29: -12.17 7.62 21.27 -13.44 -34.66 -7.09 36.02 -16.88 -54.87 30.55
30: 5.69 7.33 7.71 11.42 -3.89 -5.02 21.19 22.92 2.89 -12.60
31: 33.99 8.85 6.75 10.92 -14.75 1.89 6.89 -20.34 51.96 -10.42
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 7.50 5.75 0.49 10.62 15.74 21.37 -9.29 -1.36 17.92 -39.15
28: -18.79 -7.43 23.06 -25.28 -9.70 16.77 1.06 -19.47 -29.12 -51.12
29: -12.17 7.61 21.27 -13.43 -34.66 -7.09 36.02 -16.88 -54.88 30.54
30: 5.69 7.32 7.70 11.41 -3.89 -5.02 21.18 22.92 2.89 -12.60
31: 33.99 8.85 6.75 10.93 -14.75 1.88 6.89 -20.33 51.96 -10.42
32: -34.60 22.52 6.82 -22.04 7.96 9.69 -8.64 -26.89 10.61 -9.53
33: -24.26 -16.78 8.88 -9.17 15.05 27.02 16.98 23.89 -38.14 34.95
34: 7.66 -33.11 -22.00 3.38 -8.40 7.08 19.07 -36.09 -23.53 -3.54
35: 5.87 5.08 -26.88 14.45 1.02 6.95 11.02 44.11 36.46 -2.31
36: 31.49 8.33 8.57 3.67 21.66 -12.76 -13.47 10.65 13.69 9.18
TEST F16_F32_ALIGNED_M m=32000 n=49 k=4096 batch=2 split_k=1 matmul 180.283ms avg_err=0.00628755
TEST F16_F32_ALIGNED_L m=32000 n=49 k=4096 batch=2 split_k=1 matmul 95.854ms avg_err=17.0307
m = 16000 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
15995: -19.26 12.04 27.72 -3.01 30.06 3.75 -26.36 -35.21 -22.45 15.16
15996: -18.44 -27.50 -5.41 4.78 20.48 -9.30 -6.24 -11.77 10.99 18.98
15997: 14.80 5.15 29.11 -9.24 -44.45 31.52 -4.32 -3.23 -5.65 15.58
15998: -21.41 -38.12 53.88 -4.09 -52.06 41.75 -25.32 -30.09 -23.44 45.00
15999: 25.18 7.23 -27.57 -15.42 -0.27 -35.96 13.33 27.06 19.00 11.54
16000: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
16001: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
16002: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
16003: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
16004: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
15995: -19.25 12.04 27.72 -3.01 30.06 3.75 -26.36 -35.21 -22.45 15.16
15996: -18.44 -27.50 -5.42 4.78 20.49 -9.30 -6.25 -11.77 11.00 18.98
15997: 14.80 5.14 29.11 -9.24 -44.45 31.52 -4.32 -3.23 -5.65 15.58
15998: -21.41 -38.12 53.88 -4.08 -52.05 41.75 -25.32 -30.09 -23.44 45.00
15999: 25.18 7.23 -27.57 -15.42 -0.28 -35.97 13.34 27.06 18.99 11.54
16000: 4.98 16.71 -20.38 -1.33 -8.56 -16.00 -1.36 22.95 -29.27 -9.47
16001: 5.12 -9.50 -24.43 -23.87 28.41 14.57 13.97 -8.87 4.04 -21.89
16002: 3.21 1.02 40.48 -2.94 -1.13 -37.09 -61.32 16.59 4.23 24.09
16003: -10.63 -14.14 10.19 -12.87 -13.53 13.32 -10.13 -17.93 -26.99 -40.33
16004: -5.27 -18.23 -11.96 19.34 13.87 -1.28 -3.57 27.16 -4.42 4.37
TEST F16_F32_ALIGNED_S m=32000 n=49 k=4096 batch=2 split_k=4 matmul 109.426ms avg_err=22.9076
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -9.01 3.03 1.07 2.41 10.27 -14.26 17.11 18.05 17.35 2.10
28: -38.11 4.38 -5.03 31.09 16.30 -8.91 -23.28 -21.56 31.49 -19.94
29: 10.58 27.45 63.68 -8.69 -2.04 -10.41 -8.40 -7.54 35.34 -26.56
30: 0.25 -24.68 19.17 28.61 15.47 -17.32 64.01 -6.82 33.26 -6.19
31: -21.86 -4.00 -18.73 -23.62 -45.16 -20.76 0.94 9.99 -19.62 43.44
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -9.00 3.02 1.08 2.41 10.27 -14.27 17.12 18.06 17.35 2.10
28: -38.10 4.38 -5.03 31.09 16.30 -8.91 -23.28 -21.57 31.49 -19.94
29: 10.58 27.45 63.67 -8.69 -2.04 -10.41 -8.39 -7.53 35.34 -26.56
30: 0.26 -24.68 19.17 28.61 15.48 -17.31 64.01 -6.82 33.26 -6.19
31: -21.86 -4.00 -18.73 -23.63 -45.17 -20.76 0.93 9.98 -19.62 43.44
32: -9.76 -15.99 -9.26 -16.29 3.97 -13.79 -17.12 25.51 -21.05 -28.95
33: -6.42 6.58 16.47 -6.13 3.46 -27.60 -48.82 -31.53 13.77 -1.70
34: -19.90 -16.23 24.45 17.16 -31.89 -24.40 -1.84 -17.63 -17.21 -13.88
35: -4.70 -14.44 32.72 8.32 -26.14 2.21 -3.72 2.90 17.19 3.65
36: 18.64 5.71 -23.52 43.31 -37.75 -42.53 -19.99 -24.49 7.62 -1.94
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 13.34 13.91 9.71 1.98 3.17 -13.72 14.20 9.00 13.92 1.13
28: 4.01 -11.51 -1.26 -0.34 -3.09 -1.88 -10.41 -17.07 -5.48 11.17
29: -1.26 -14.59 25.52 -8.57 11.93 7.43 -3.86 -15.81 10.91 7.07
30: -17.02 2.91 -6.39 14.53 2.56 -7.88 13.58 -5.05 -5.26 -0.19
31: -1.91 -10.38 -17.59 -11.82 -19.90 -22.29 -18.26 -3.64 -1.33 15.48
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -10.26 -7.74 11.07 6.59 16.57 -3.10 21.05 8.26 5.68 -16.50
28: -3.76 -7.82 3.55 3.25 9.68 8.08 10.39 -9.59 20.96 -3.63
29: 3.88 9.61 23.43 -6.59 -5.63 -9.07 -3.87 3.60 2.08 -18.73
30: 14.60 5.50 19.93 9.88 12.56 -12.50 29.09 6.16 12.85 -1.11
31: -11.39 -7.03 6.04 -0.84 -9.07 -0.57 11.69 1.54 -21.84 16.73
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: -11.11 7.92 -14.28 1.19 -10.68 0.61 -13.38 -0.20 0.65 0.42
28: -20.17 7.63 -1.75 11.43 9.96 -11.97 -21.04 0.58 -1.44 -2.30
29: 2.66 36.68 -9.00 11.53 -17.13 -23.00 9.45 -10.97 14.52 -19.35
30: 12.98 -12.05 8.38 3.41 -5.82 -0.39 13.14 -9.62 29.00 -12.30
31: -10.50 6.92 3.43 6.13 -15.31 -1.76 13.99 7.93 -0.41 12.29
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -0.98 -11.07 -5.42 -7.34 1.22 1.95 -4.76 1.00 -2.90 17.05
28: -18.19 16.08 -5.58 16.75 -0.25 -3.14 -2.21 4.51 17.45 -25.18
29: 5.30 -4.25 23.72 -5.05 8.79 14.23 -10.11 15.65 7.84 4.45
30: -10.31 -21.03 -2.75 0.78 6.17 3.46 8.20 1.69 -3.33 7.41
31: 1.94 6.49 -10.62 -17.09 -0.89 3.86 -6.49 4.16 3.96 -1.07
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TEST F16_F32_ALIGNED_M m=32000 n=49 k=4096 batch=2 split_k=4 matmul 181.384ms avg_err=0.00627905
TEST F16_F32_ALIGNED_L m=32000 n=49 k=4096 batch=2 split_k=4 matmul 89.466ms avg_err=34.0288
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: 9.31 -14.65 23.96 5.73 7.15 3.58 12.53 -26.44 10.63 1.97
1: 5.03 -22.01 13.01 10.06 -8.79 -10.55 8.50 24.72 0.21 15.59
2: -5.54 2.32 -27.92 -19.09 44.67 -26.06 -20.97 7.38 -11.90 -25.68
3: 14.60 5.93 0.38 36.77 -11.54 -2.07 -12.58 10.10 -29.90 -36.96
4: -6.46 8.04 -13.85 -10.36 5.54 0.68 -6.43 -38.85 41.79 -17.77
5: -3.44 3.69 16.05 39.74 -2.66 37.14 0.75 10.61 3.89 -3.14
6: 35.16 -6.46 -10.73 3.65 24.83 22.69 6.67 -0.54 10.77 8.72
7: 1.08 -39.76 3.67 -22.68 -0.05 -5.83 -10.05 32.81 -22.71 -21.00
8: 8.30 -5.94 16.99 2.32 23.16 -19.44 9.10 19.52 -22.86 -17.52
9: 9.05 15.20 -11.15 -40.39 -0.68 -12.38 -8.94 -29.87 6.77 -12.81
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 0.64 3.52 5.92 2.23 17.33 5.10 8.71 12.59 -12.18 23.36
1: -16.19 40.45 31.56 -9.03 4.27 -11.58 14.52 10.67 -13.25 -14.87
2: -12.73 -1.95 -9.25 -25.50 -3.16 3.37 -11.99 6.16 -2.85 -13.93
3: -25.65 -21.09 2.66 10.91 9.70 8.47 -2.69 6.57 6.73 36.18
4: 22.01 10.10 -29.55 1.37 -25.65 0.62 -2.38 -26.77 11.03 0.64
5: -11.27 1.05 18.87 15.60 -26.21 12.74 -7.45 5.41 -17.90 6.20
6: 19.51 27.69 20.59 1.44 17.83 32.76 22.60 -14.58 18.31 -18.97
7: -54.59 -20.74 -7.09 10.58 2.33 -25.84 -3.58 2.44 -52.91 -8.21
8: -19.00 -23.93 -34.42 31.60 2.44 4.82 9.27 24.82 -5.76 -13.07
9: -0.36 5.79 3.19 -21.36 41.71 9.40 -1.39 6.70 -12.69 -24.26
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -4.16 1.42 7.38 -11.15 10.89 3.48 -1.38 0.57 -6.35 15.06
1: -7.10 -0.30 12.34 1.15 1.64 12.13 8.51 14.09 0.18 -3.93
2: -2.84 -3.42 -5.45 -16.29 4.06 -7.93 5.67 7.58 -8.80 -2.42
3: 5.25 1.27 -12.68 -5.68 -0.68 0.38 11.21 1.48 -5.74 -1.60
4: 13.20 -0.86 -3.04 1.43 -5.56 8.10 -0.84 -11.28 20.16 -3.18
5: 3.81 12.18 2.87 25.01 -8.16 7.24 11.19 4.46 -6.79 -5.92
6: 7.42 -3.34 -12.35 -9.04 2.65 19.72 -6.43 -2.15 -2.52 -10.87
7: -20.28 -19.44 -26.56 -8.63 -18.55 -10.09 -0.39 8.12 -7.66 3.56
8: 4.04 -1.81 22.96 15.26 -19.25 -8.45 0.20 0.20 -3.44 0.10
9: 6.76 9.34 12.15 -4.99 12.98 8.48 -16.52 -4.92 5.83 -5.54
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: -2.95 6.69 -8.24 2.58 2.81 -3.25 -5.13 1.72 0.92 -1.53
1: -10.90 12.39 4.51 -2.64 -1.61 -16.60 15.93 3.72 -3.06 2.66
2: -6.38 14.86 -7.59 -14.55 3.24 -21.02 -3.35 8.49 -7.00 -7.19
3: -5.62 1.65 9.32 11.00 -15.88 -1.03 -17.76 3.42 -5.22 1.72
4: -8.42 9.75 -22.55 -0.72 -9.41 9.61 -0.77 -14.86 21.33 -18.86
5: -1.69 -3.08 3.34 -14.94 -2.23 -4.25 -3.39 7.20 -1.61 0.13
6: 8.43 1.75 2.88 11.17 8.21 -2.55 9.22 -12.04 19.42 2.07
7: -6.75 -1.24 27.20 6.32 20.89 -13.07 -9.89 2.16 -25.36 -9.40
8: -8.54 -10.13 -19.79 4.20 13.92 9.10 -2.91 13.34 7.61 -21.20
9: 6.22 10.58 -19.60 -19.99 -6.89 -7.23 12.55 0.62 -0.55 -14.34
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: 3.05 4.06 5.74 -1.26 -1.88 -17.19 2.64 -8.57 5.58 3.64
1: 15.84 -13.48 -11.62 -2.56 12.74 -6.96 -1.72 -11.25 -1.38 7.13
2: 16.62 -14.78 -32.08 9.55 19.29 -7.18 -12.17 -4.25 14.35 -15.69
3: 5.51 -8.14 -3.02 27.21 5.52 9.22 -4.27 11.88 -23.44 -27.82
4: 1.36 17.86 3.07 -19.22 6.13 -3.18 7.52 -11.82 17.82 -3.39
5: -6.45 -4.28 12.73 15.00 3.93 14.35 0.32 -6.58 9.54 7.34
6: 19.40 3.45 -5.53 -0.50 18.64 8.76 3.89 5.47 7.10 8.40
7: -6.43 -3.22 8.18 -5.78 -7.76 4.95 -0.52 21.67 -14.29 -5.61
8: 3.55 -2.29 8.41 -16.79 3.41 -18.11 -2.08 -3.71 -4.63 1.81
9: 1.35 -5.90 -7.63 -15.86 1.05 -2.80 7.73 -13.20 12.01 5.93
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 13.37 -26.83 19.08 15.56 -4.67 20.54 16.41 -20.16 10.48 -15.20
1: 7.19 -20.61 7.78 14.11 -21.57 0.88 -14.22 18.16 4.48 9.73
2: -12.93 5.66 17.21 2.20 18.07 10.07 -11.12 -4.45 -10.45 -0.38
3: 9.48 11.15 6.75 4.23 -0.51 -10.63 -1.76 -6.67 4.50 -9.27
4: -12.60 -18.71 8.67 8.15 14.39 -13.85 -12.33 -0.90 -17.52 7.66
5: 0.89 -1.13 -2.88 14.67 3.80 19.80 -7.37 5.53 2.75 -4.69
6: -0.08 -8.32 4.27 2.01 -4.67 -3.24 -0.02 8.17 -13.23 9.12
7: 34.54 -15.86 -5.15 -14.58 5.36 12.38 0.76 0.86 24.60 -9.56
8: 9.24 8.30 5.41 -0.34 25.07 -1.98 13.89 9.69 -22.40 1.78
9: -5.29 1.18 3.94 0.45 -7.82 -10.84 -12.70 -12.37 -10.51 1.13
TEST F16_F32_ALIGNED_S m=512 n=512 k=128 batch=2 split_k=1 matmul 1.903ms avg_err=4.53634
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 3.53 0.34 2.14 -4.02 -2.55 -2.37 4.39 0.90 1.36 -2.25
28: -3.02 6.55 -2.71 0.46 7.12 2.91 7.15 2.60 1.33 8.53
29: -3.29 0.08 1.84 -7.39 0.80 6.13 -1.10 -3.78 -1.13 -1.19
30: -1.09 0.12 -1.87 2.04 1.30 5.75 -3.71 7.61 2.56 -6.22
31: -0.41 1.67 0.81 -4.26 1.60 -0.16 0.72 1.21 2.45 0.19
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 3.53 0.34 2.14 -4.02 -2.55 -2.37 4.39 0.90 1.36 -2.25
28: -3.02 6.55 -2.71 0.45 7.12 2.91 7.15 2.60 1.33 8.53
29: -3.29 0.08 1.84 -7.39 0.80 6.13 -1.10 -3.78 -1.13 -1.19
30: -1.09 0.12 -1.87 2.04 1.30 5.75 -3.71 7.61 2.56 -6.23
31: -0.41 1.67 0.81 -4.26 1.60 -0.16 0.72 1.21 2.45 0.19
32: 1.52 0.12 -5.20 -0.61 4.66 1.14 -5.85 -0.14 3.42 -6.25
33: -5.20 -0.42 -4.90 -3.65 -1.22 -3.76 2.63 3.74 -4.22 8.38
34: 4.72 -4.13 -0.12 -3.08 -1.49 -4.25 2.98 5.42 -3.21 1.89
35: 3.88 3.46 -1.38 1.56 2.64 3.76 1.87 -1.83 3.97 -9.10
36: -4.60 -4.87 -1.23 -0.38 -2.59 -3.14 -0.30 6.44 -3.36 0.79
TEST F16_F32_ALIGNED_M m=512 n=512 k=128 batch=2 split_k=1 matmul 2.643ms avg_err=0.00110666
TEST F16_F32_ALIGNED_L m=512 n=512 k=128 batch=2 split_k=1 matmul 0.939ms avg_err=4.50784
m = 256 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
251: 3.59 -4.03 -8.38 -2.08 6.74 -1.50 4.02 1.70 -1.16 0.23
252: 4.21 -3.07 -0.08 -6.10 -4.33 1.87 -2.36 -0.75 -6.88 -1.98
253: -3.00 0.19 -0.39 6.35 -4.09 -0.54 1.77 -0.05 0.41 -4.08
254: 1.49 -1.41 -2.15 4.26 4.85 3.16 0.19 0.44 -5.71 -2.30
255: 0.56 -2.54 -1.48 3.11 -1.07 -1.42 -8.37 3.02 -3.95 -0.62
256: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
257: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
258: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
259: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
260: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
251: 3.59 -4.03 -8.38 -2.08 6.74 -1.50 4.02 1.70 -1.16 0.23
252: 4.21 -3.07 -0.08 -6.10 -4.33 1.87 -2.36 -0.75 -6.88 -1.98
253: -3.00 0.19 -0.38 6.35 -4.09 -0.54 1.77 -0.05 0.41 -4.08
254: 1.49 -1.41 -2.15 4.26 4.85 3.16 0.19 0.44 -5.71 -2.30
255: 0.56 -2.54 -1.48 3.11 -1.07 -1.42 -8.37 3.02 -3.95 -0.62
256: 0.05 -1.97 1.53 2.14 -4.96 -2.46 0.11 0.72 -1.38 1.04
257: 1.02 -5.36 -3.15 2.34 -7.94 -4.46 -2.22 -2.16 2.79 -0.58
258: 5.45 -0.35 0.65 1.53 5.16 6.05 9.42 -2.14 2.79 0.60
259: -0.84 3.88 4.47 -0.61 3.03 1.85 -4.40 -4.98 -0.92 1.23
260: -0.30 -3.56 1.94 -0.54 -1.27 2.61 -0.68 10.22 -0.53 -4.32
TEST F16_F32_ALIGNED_S m=512 n=512 k=128 batch=2 split_k=4 matmul 2.287ms avg_err=25.9604
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 1.22 -2.38 -5.48 3.61 0.65 2.63 6.03 2.65 -0.99 -1.75
28: 4.41 -1.15 -8.21 -1.72 -0.59 -4.25 2.51 3.96 4.62 2.81
29: -2.32 1.65 1.76 -0.10 6.93 -1.49 1.25 -0.78 -6.09 -2.87
30: 0.02 6.79 -0.60 1.48 3.34 -4.56 1.65 5.58 4.06 0.77
31: -2.63 -2.80 1.20 0.67 -4.58 5.70 5.24 6.01 4.69 3.33
32: 1.13 1.51 -33.24 8.62 19.10 -12.00 -7.44 -9.97 28.53 -24.90
33: 21.63 35.40 11.46 24.98 32.80 -7.64 -14.51 -30.07 -3.23 -20.73
34: 3.86 16.63 -11.45 -23.51 13.55 9.59 19.77 47.61 27.09 39.18
35: -12.37 -4.23 -22.92 -3.42 8.37 -31.65 -5.34 20.15 3.44 9.14
36: 6.92 10.54 25.85 1.39 -39.51 6.70 36.18 51.37 16.91 4.74
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 1.22 -2.38 -5.48 3.60 0.64 2.63 6.03 2.65 -0.99 -1.75
28: 4.41 -1.15 -8.21 -1.72 -0.59 -4.25 2.51 3.96 4.62 2.81
29: -2.32 1.65 1.76 -0.10 6.93 -1.49 1.25 -0.79 -6.09 -2.87
30: 0.02 6.79 -0.61 1.48 3.34 -4.56 1.65 5.58 4.06 0.77
31: -2.63 -2.80 1.20 0.67 -4.58 5.70 5.24 6.01 4.69 3.33
32: -3.39 1.30 1.90 3.24 3.49 -6.18 3.53 -3.11 -2.10 -0.03
33: -2.29 -1.16 -1.94 -0.55 3.47 -2.47 0.19 -2.68 0.32 -1.99
34: 6.22 -4.76 5.11 -1.02 -3.20 7.03 2.22 -1.71 2.06 -6.22
35: 2.52 -3.68 0.87 -0.07 -2.63 4.96 3.04 -1.88 3.12 -2.14
36: 2.64 -1.72 -0.24 1.43 -6.17 5.37 -0.33 -1.45 2.46 -3.01
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 0.94 0.02 -1.60 0.89 4.27 0.94 1.08 -1.27 2.32 -0.08
28: -0.11 -1.44 0.75 1.04 -2.23 -0.11 1.29 1.96 0.33 -4.46
29: 4.05 1.91 0.29 -2.13 -0.03 -0.89 3.43 0.02 0.24 0.90
30: 1.34 2.14 -0.63 2.76 2.26 -4.25 -4.07 -0.58 0.64 -2.76
31: -0.32 -3.28 0.48 -1.57 -2.14 2.58 -1.46 -0.43 4.29 4.64
32: 8.27 -0.88 -9.32 9.99 -1.78 -5.51 2.49 2.95 -3.11 -16.80
33: 11.00 -7.36 3.63 -7.99 13.32 8.13 7.67 -0.08 1.76 4.46
34: 11.38 7.63 -6.00 5.88 -15.03 16.89 13.02 17.18 5.61 -1.44
35: -6.50 12.37 7.96 27.20 -12.76 0.87 -9.14 8.55 2.94 3.12
36: -5.83 -5.79 -3.57 -0.77 -6.40 0.00 14.46 24.76 4.07 11.12
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: 2.42 0.05 -2.24 0.09 -0.70 1.90 3.47 2.63 0.25 -0.09
28: 4.30 -1.21 -4.35 -1.30 0.16 0.74 2.62 2.90 -0.61 3.44
29: -4.69 -1.36 -0.80 -0.92 3.29 0.08 -0.03 -2.95 -4.87 -2.56
30: 1.02 0.43 0.12 0.11 1.69 -0.90 4.17 1.92 3.44 1.54
31: 0.81 1.00 -0.01 -0.22 0.58 2.61 2.10 1.66 2.40 -1.59
32: -12.52 1.35 -0.07 -20.43 -14.70 1.28 -3.69 8.36 -6.08 -12.14
33: 2.95 10.61 1.21 14.88 16.85 3.38 -0.67 -4.80 -0.84 -22.49
34: -0.47 -7.72 -0.57 -15.84 -4.95 -21.01 10.08 1.29 3.68 6.54
35: -2.93 -2.95 5.43 -17.48 7.58 -10.92 -5.63 10.53 4.12 2.65
36: 2.30 18.29 2.63 -9.93 -23.66 3.12 4.94 4.30 18.33 4.63
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 0.72 -1.39 -1.06 3.15 -2.63 0.53 -0.55 1.52 -1.17 0.72
28: 0.90 -0.25 -0.51 -1.26 0.92 0.18 0.83 -2.41 1.44 1.76
29: -1.34 1.13 3.21 2.65 2.82 -0.29 -2.51 -0.22 -0.22 0.58
30: -2.30 3.44 -1.50 0.68 1.62 -1.51 -1.39 1.01 0.73 1.88
31: -1.26 -1.37 2.32 -0.80 -1.69 -0.23 -0.23 2.15 -0.09 2.79
32: -1.90 4.58 -24.77 8.57 11.69 11.13 -7.93 0.97 14.59 0.68
33: 14.61 -0.59 4.76 -2.15 -0.30 -10.25 -23.78 -9.95 -7.62 -0.61
34: -8.86 1.70 18.27 -0.53 24.19 -0.87 5.95 13.37 -1.09 7.47
35: 12.65 -11.46 -32.69 -6.90 2.27 -6.83 9.08 4.79 1.44 -10.64
36: 18.03 -1.27 14.35 3.32 -7.79 11.05 7.39 2.53 -0.11 -5.28
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -2.86 -1.05 -0.58 -0.52 -0.29 -0.74 2.03 -0.23 -2.39 -2.31
28: -0.68 1.75 -4.09 -0.20 0.56 -5.07 -2.24 1.51 3.45 2.06
29: -0.35 -0.04 -0.93 0.30 0.85 -0.39 0.36 2.36 -1.23 -1.80
30: -0.04 0.78 1.41 -2.06 -2.24 2.10 2.94 3.22 -0.74 0.12
31: -1.86 0.85 -1.59 3.25 -1.33 0.74 4.83 2.63 -1.90 -2.52
32: 7.28 -3.55 0.91 10.49 23.89 -18.89 1.69 -22.24 23.12 3.35
33: -6.93 32.73 1.85 20.24 2.92 -8.90 2.28 -15.24 3.47 -2.10
34: 1.80 15.02 -23.15 -13.02 9.35 14.59 -9.28 15.78 18.89 26.60
35: -15.58 -2.18 -3.62 -6.25 11.29 -14.77 0.35 -3.72 -5.06 14.01
36: -7.58 -0.69 12.44 8.77 -1.65 -7.46 9.38 19.78 -5.38 -5.73
TEST F16_F32_ALIGNED_M m=512 n=512 k=128 batch=2 split_k=4 matmul 2.894ms avg_err=0.00110848
TEST F16_F32_ALIGNED_L m=512 n=512 k=128 batch=2 split_k=4 matmul 1.149ms avg_err=7.26269
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: 2.76 -4.12 -6.18 6.06 -5.38 -8.07 2.08 -3.62 5.55 -1.18
1: 4.30 4.24 -6.19 -2.48 -0.35 -5.40 2.24 -0.27 -6.03 4.38
2: 4.44 -5.01 -1.42 -0.32 2.62 2.40 -0.16 4.56 1.25 8.85
3: -0.22 -0.72 3.47 6.68 4.74 -1.07 1.88 -0.98 3.92 -7.08
4: 7.74 -3.03 0.63 -7.47 1.24 -0.42 4.75 0.71 -5.32 -1.63
5: -2.35 -5.85 5.65 1.63 6.62 4.83 4.97 -2.57 -1.43 -2.20
6: -2.40 -5.70 5.74 13.89 -1.59 -2.04 0.64 -3.02 -1.27 2.27
7: -3.78 -3.39 0.31 2.75 -0.81 1.08 5.29 -0.55 0.56 -4.08
8: 4.29 -2.39 -3.61 -1.80 -1.71 -4.13 -2.06 1.62 -1.33 1.21
9: 0.55 7.25 2.08 1.03 -4.62 -6.35 -0.49 4.44 -1.38 5.09
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 1.98 -6.76 -6.77 6.07 -8.76 0.24 -0.55 -3.50 8.53 6.75
1: 5.98 6.22 -5.49 -0.04 -0.45 -5.84 -2.62 -1.49 -3.95 -7.39
2: 0.86 -1.23 0.56 -1.00 3.99 7.19 -6.23 -1.24 0.01 4.64
3: -2.88 0.08 -0.43 6.14 2.41 -6.48 -1.52 -1.05 3.04 -2.59
4: 3.75 -1.04 -0.02 -6.75 -0.04 -1.95 4.43 2.81 -2.17 2.28
5: -4.97 -4.43 3.51 1.65 -3.84 2.82 4.50 -3.34 -3.28 2.25
6: -7.28 -4.17 6.10 8.34 4.14 -5.44 -2.39 -0.77 0.08 -4.37
7: -5.39 -1.84 -1.42 -0.72 -2.23 2.02 0.06 4.69 2.99 3.48
8: 7.55 -2.86 -4.18 7.05 1.11 -1.80 0.28 -1.80 -0.93 1.98
9: 1.62 2.00 -0.54 -0.94 1.11 0.02 3.18 0.00 -6.63 5.43
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -0.28 -2.04 -1.10 1.97 -3.24 -1.66 -1.00 -3.75 0.07 -2.46
1: -0.43 -0.59 -2.34 -1.34 1.61 -1.06 1.40 -0.79 2.12 -2.54
2: 2.35 -2.20 0.56 0.76 -0.61 5.29 -0.22 5.60 -1.43 3.97
3: -4.31 1.69 1.51 2.37 2.42 -5.17 -0.07 -2.35 2.43 0.25
4: 1.38 -2.52 1.22 -2.48 -0.02 1.75 2.61 2.40 -2.35 -1.33
5: 0.99 -0.98 1.32 -3.33 -0.93 1.51 1.61 -0.48 -3.77 -1.03
6: -1.62 0.93 3.22 2.64 0.64 -3.81 -0.69 -0.73 -1.06 -2.96
7: -3.02 2.45 -0.83 2.09 0.68 -1.34 -0.45 0.51 -1.62 -1.98
8: 1.89 -0.93 -2.49 0.30 1.22 0.89 -1.62 2.25 -0.33 1.85
9: 1.30 0.50 -0.22 2.39 0.54 -5.37 -0.44 1.96 -2.26 4.36
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 1.25 -0.15 -1.43 3.52 -1.83 -0.98 -2.06 -0.54 5.92 1.50
1: 1.52 3.16 -2.18 2.04 -0.77 -1.89 -2.83 1.49 -1.20 0.26
2: -0.25 0.25 -0.18 -3.59 1.70 -0.82 -3.10 -1.68 0.38 0.22
3: 1.84 -2.62 -1.96 2.07 -0.67 2.40 1.43 2.40 0.83 -2.43
4: 3.63 -2.36 -1.22 -2.19 1.70 -3.63 1.69 -0.51 1.26 1.99
5: -1.96 -3.26 2.52 2.76 2.09 1.95 1.06 -0.20 0.43 -0.27
6: -2.23 -2.66 1.79 5.68 -0.21 1.02 -1.91 -3.19 -3.13 1.00
7: 0.21 -0.95 1.61 -2.08 0.28 1.69 1.04 2.69 2.36 0.92
8: 2.76 -0.35 -0.57 0.41 -0.80 -5.40 1.51 0.18 -1.33 2.06
9: -0.42 1.86 1.52 -1.13 -1.32 1.65 0.55 0.65 -1.90 -0.54
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: 1.31 -1.32 -2.20 0.55 -0.25 -3.45 2.35 1.18 1.11 -0.57
1: 0.11 1.73 -1.49 0.23 0.16 0.22 1.34 1.15 -4.85 3.18
2: 0.66 0.07 -1.39 4.76 1.23 0.31 2.58 0.78 3.59 2.13
3: 2.64 -0.24 2.11 2.49 -0.80 0.41 1.73 -2.93 0.36 -0.49
4: 0.18 -1.31 3.16 -2.81 -0.80 4.40 0.72 0.62 -1.01 -0.06
5: -2.25 1.14 -0.69 1.44 3.18 -0.26 -1.35 -1.60 1.00 -1.26
6: 1.50 -3.13 1.47 4.08 -1.96 2.05 3.02 3.15 1.95 2.62
7: 0.17 -0.93 -1.86 3.81 -1.56 0.26 2.49 -3.13 2.87 -2.04
8: -1.18 0.35 -0.75 0.36 -0.33 2.10 -3.44 0.16 0.53 -2.24
9: -0.08 1.33 0.37 0.98 -0.08 -0.29 -1.07 -0.99 2.46 0.28
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 0.47 -0.61 -1.45 0.01 -0.07 -1.97 2.79 -0.51 -1.55 0.35
1: 3.10 -0.05 -0.19 -3.41 -1.35 -2.67 2.32 -2.13 -2.11 3.48
2: 1.68 -3.12 -0.41 -2.25 0.30 -2.38 0.58 -0.14 -1.29 2.54
3: -0.38 0.45 1.80 -0.25 3.79 1.28 -1.21 1.89 0.31 -4.42
4: 2.55 3.16 -2.53 0.01 0.36 -2.94 -0.28 -1.80 -3.23 -2.24
5: 0.87 -2.74 2.50 0.77 2.28 1.64 3.64 -0.29 0.91 0.36
6: -0.05 -0.84 -0.74 1.49 -0.05 -1.30 0.22 -2.26 0.97 1.61
7: -1.14 -3.96 1.39 -1.06 -0.21 0.47 2.21 -0.63 -3.05 -0.99
8: 0.83 -1.46 0.19 -2.86 -1.79 -1.72 1.49 -0.98 -0.20 -0.47
9: -0.25 3.56 0.40 -1.21 -3.76 -2.35 0.47 2.81 0.31 0.99
TEST F16_F32_ALIGNED_S m=128 n=512 k=512 batch=2 split_k=1 matmul 2.403ms avg_err=9.06398
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 1.79 -5.06 4.75 -0.43 -6.88 8.89 1.99 2.93 -8.36 6.09
28: 12.64 4.22 -11.66 -8.08 -6.64 -7.01 3.01 -4.13 4.98 -0.42
29: 0.47 -0.44 12.41 -1.33 -0.19 -6.47 -8.33 -8.78 8.42 -6.57
30: -8.33 5.43 -3.28 -6.28 -5.06 4.18 14.12 -6.87 -20.20 6.78
31: -4.37 -2.10 -0.24 -0.74 -2.29 6.29 2.09 0.34 -1.54 8.93
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 1.79 -5.05 4.75 -0.43 -6.89 8.88 1.99 2.93 -8.36 6.09
28: 12.64 4.22 -11.66 -8.08 -6.64 -7.01 3.01 -4.13 4.98 -0.42
29: 0.47 -0.44 12.41 -1.33 -0.19 -6.47 -8.33 -8.78 8.43 -6.57
30: -8.33 5.43 -3.29 -6.28 -5.06 4.18 14.12 -6.88 -20.19 6.78
31: -4.37 -2.10 -0.24 -0.74 -2.29 6.29 2.09 0.34 -1.54 8.93
32: -1.04 18.77 -7.00 7.32 4.57 -15.77 -0.23 2.58 -6.12 12.84
33: -10.20 5.63 7.70 12.00 -0.99 7.75 -15.01 -9.97 3.81 -11.71
34: 0.95 -4.13 2.74 5.53 2.91 -11.92 -3.78 -7.04 -1.11 3.34
35: 8.21 0.02 15.81 -9.05 14.31 -0.40 14.64 10.49 14.60 2.07
36: 7.56 -1.32 8.15 16.80 2.83 -0.27 -10.87 -2.92 9.31 -5.56
TEST F16_F32_ALIGNED_M m=128 n=512 k=512 batch=2 split_k=1 matmul 2.827ms avg_err=0.0022141
TEST F16_F32_ALIGNED_L m=128 n=512 k=512 batch=2 split_k=1 matmul 1.46ms avg_err=9.05629
m = 64 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
59: 9.70 -3.45 2.57 -1.43 -8.00 -4.47 -4.75 -1.35 -2.77 3.00
60: -2.43 6.77 3.38 -12.36 -5.94 4.49 -5.27 15.64 2.92 13.74
61: -4.28 -6.17 -7.11 -18.87 -0.22 -4.91 1.80 -16.00 -7.29 -7.64
62: 6.68 0.13 11.62 11.20 5.08 -2.18 -1.46 4.93 -15.10 5.14
63: 10.89 3.99 -4.33 0.16 12.19 1.47 -2.65 1.12 0.79 -6.32
64: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
65: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
66: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
67: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
68: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
59: 9.70 -3.45 2.58 -1.43 -8.00 -4.47 -4.75 -1.35 -2.77 3.00
60: -2.43 6.77 3.38 -12.36 -5.94 4.49 -5.27 15.64 2.92 13.74
61: -4.28 -6.17 -7.11 -18.87 -0.22 -4.91 1.80 -16.00 -7.29 -7.64
62: 6.68 0.13 11.62 11.19 5.08 -2.18 -1.46 4.93 -15.10 5.13
63: 10.89 3.99 -4.33 0.16 12.19 1.47 -2.65 1.12 0.79 -6.32
64: 5.09 -7.78 2.29 -6.61 4.38 0.41 14.21 6.19 2.01 14.71
65: 12.05 -10.24 -6.04 11.73 6.06 7.06 5.26 11.99 -3.45 0.00
66: 11.84 6.73 -5.65 -3.01 8.54 4.55 5.09 4.89 -1.23 3.46
67: 2.40 -1.51 9.38 9.28 -6.12 -3.32 -6.79 -1.57 7.82 3.38
68: 13.83 1.82 -5.43 -2.05 -6.65 -3.73 17.39 -0.08 -7.70 9.03
TEST F16_F32_ALIGNED_S m=128 n=512 k=512 batch=2 split_k=4 matmul 1.869ms avg_err=10.093
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -6.08 -2.01 -11.35 1.13 9.02 -0.58 1.92 -1.49 -0.83 6.80
28: -1.52 -2.26 13.16 -18.45 2.54 10.42 0.22 -11.31 -6.65 -2.24
29: 9.63 4.99 0.04 -6.25 -4.92 -7.00 4.19 -2.02 -9.73 -13.23
30: 0.57 1.35 1.26 -2.61 5.25 6.72 7.79 4.53 -6.39 -2.93
31: 18.93 1.56 3.58 -16.10 -11.22 -6.31 -8.23 4.99 -7.52 16.02
32: -3.72 -1.31 -1.69 0.65 -0.18 1.64 -2.96 -3.59 6.83 3.58
33: -5.84 -8.57 -2.27 3.34 0.69 -0.37 -3.13 -4.94 -2.11 -2.31
34: -1.36 -2.98 -3.35 -1.76 6.29 -1.40 -0.38 2.20 2.16 -4.68
35: -1.69 2.45 -1.11 -3.58 4.00 2.71 2.44 5.95 -4.72 -6.81
36: -1.85 -2.84 -1.61 -5.29 -5.42 -0.62 3.32 0.23 3.76 2.39
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -6.07 -2.00 -11.35 1.13 9.02 -0.59 1.92 -1.48 -0.83 6.80
28: -1.52 -2.26 13.16 -18.45 2.54 10.42 0.22 -11.31 -6.65 -2.24
29: 9.63 4.99 0.04 -6.24 -4.92 -7.00 4.19 -2.02 -9.73 -13.23
30: 0.57 1.35 1.26 -2.61 5.25 6.72 7.79 4.53 -6.39 -2.93
31: 18.93 1.56 3.58 -16.09 -11.23 -6.31 -8.23 4.99 -7.52 16.02
32: 8.06 12.20 7.64 -17.62 -3.31 -16.09 4.58 1.36 -0.55 -17.19
33: -2.28 -3.92 2.73 -14.63 5.14 -5.66 10.61 -8.32 -11.70 -4.86
34: 4.86 13.74 3.78 3.97 5.49 -3.37 16.55 1.55 3.44 -17.80
35: -0.61 7.69 11.42 -3.81 1.09 -2.32 -6.37 1.81 -1.60 3.74
36: 5.23 -13.75 2.65 -9.22 -6.66 -1.61 5.13 3.85 11.67 0.22
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: -3.94 -4.48 -2.75 3.28 0.59 4.06 2.10 -10.81 -0.53 4.85
28: 2.72 3.89 5.87 -4.54 0.15 3.03 2.05 -8.41 -7.23 3.31
29: 6.54 1.55 1.30 -3.16 -2.86 -6.29 2.63 -2.15 0.64 -9.73
30: 0.76 4.15 -0.75 -4.52 1.45 0.45 3.82 1.81 -1.07 -3.28
31: 7.33 -0.90 1.18 -0.58 4.53 -4.89 0.38 -0.98 -0.41 7.63
32: -2.77 -3.36 0.11 0.29 -2.34 -0.39 2.73 -0.55 0.26 1.01
33: -3.30 -1.81 0.68 1.78 0.33 -0.33 -2.64 -0.75 1.38 -2.56
34: -1.22 -3.48 1.43 -1.30 1.27 -0.22 0.46 1.58 2.60 -1.92
35: -1.56 0.54 1.68 -1.57 3.61 1.72 -0.60 1.93 -0.68 -3.05
36: -0.90 0.05 0.12 -3.15 -2.11 3.36 0.81 1.54 2.44 0.73
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: 6.73 2.29 -2.59 0.75 -1.33 -0.99 1.26 3.84 -0.63 3.32
28: -2.60 1.36 2.44 -8.39 0.89 1.00 -0.91 -6.20 0.22 -5.94
29: 2.51 4.80 3.43 -8.54 1.97 -0.87 2.18 -0.46 1.96 1.06
30: 5.97 -2.07 -1.08 3.42 5.19 3.20 4.87 3.42 1.09 2.61
31: -3.42 0.02 4.64 -0.45 -1.07 -5.66 -3.79 4.89 0.54 5.50
32: -0.32 -0.42 -2.46 -1.14 -0.57 1.76 -0.13 -1.54 5.19 -0.55
33: -2.63 -2.39 -1.73 -1.09 2.86 -0.97 0.95 -0.43 -1.29 -1.26
34: -1.47 -0.47 -1.38 1.29 2.68 -2.95 -1.44 0.81 1.92 -1.69
35: 0.57 -0.43 -1.77 0.85 -0.07 2.49 0.19 0.97 -4.27 -0.12
36: -1.42 -3.25 0.15 1.88 -1.58 -3.44 4.18 -1.16 1.70 0.89
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: -7.15 -1.39 -3.33 -3.14 4.72 1.83 -7.44 1.13 4.38 0.57
28: 1.08 -4.86 5.09 -2.67 -2.21 8.46 -0.92 1.73 0.06 -1.69
29: 1.32 -1.66 -0.67 2.69 -1.54 0.80 0.53 0.41 -2.83 -2.26
30: -1.74 -2.50 4.50 -5.16 1.36 3.81 1.87 4.32 -7.48 3.09
31: 11.52 4.96 -5.71 -7.70 -7.72 3.55 -0.94 -1.50 -6.14 0.85
32: -0.40 1.02 3.05 1.87 1.87 0.43 -2.57 -0.24 1.39 0.71
33: 0.40 -1.73 -0.11 -1.58 -0.16 -0.93 -2.65 -2.28 0.29 1.86
34: 2.85 0.43 -3.28 0.04 2.12 4.15 0.22 0.14 1.32 1.93
35: 1.41 1.74 0.40 -0.01 -0.38 -2.85 1.91 4.90 1.50 -2.65
36: -0.32 1.34 -1.95 -0.02 -0.20 0.34 3.19 -0.63 -2.22 1.02
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -1.71 1.57 -2.68 0.24 5.04 -5.48 5.99 4.35 -4.05 -1.95
28: -2.73 -2.65 -0.24 -2.85 3.72 -2.08 0.01 1.57 0.30 2.09
29: -0.74 0.29 -4.02 2.77 -2.50 -0.65 -1.15 0.17 -9.49 -2.31
30: -4.42 1.78 -1.41 3.65 -2.76 -0.75 -2.77 -5.02 1.07 -5.35
31: 3.50 -2.53 3.46 -7.36 -6.96 0.69 -3.88 2.58 -1.51 2.04
32: -0.23 1.45 -2.40 -0.38 0.86 -0.16 -2.99 -1.27 -0.01 2.41
33: -0.31 -2.64 -1.10 4.23 -2.35 1.86 1.21 -1.46 -2.48 -0.36
34: -1.53 0.53 -0.11 -1.79 0.21 -2.38 0.38 -0.32 -3.68 -2.99
35: -2.10 0.61 -1.42 -2.86 0.84 1.36 0.95 -1.86 -1.26 -0.99
36: 0.79 -0.98 0.07 -3.99 -1.53 -0.88 -4.85 0.48 1.85 -0.26
TEST F16_F32_ALIGNED_M m=128 n=512 k=512 batch=2 split_k=4 matmul 2.664ms avg_err=0.00222386
TEST F16_F32_ALIGNED_L m=128 n=512 k=512 batch=2 split_k=4 matmul 2.282ms avg_err=14.5171
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -1.40 -3.54 6.84 -6.15 0.50 -14.21 2.14 -3.32 0.29 -0.00
1: -2.84 -2.12 -3.67 -5.95 -8.07 -0.89 7.93 -8.26 -7.71 6.68
2: -8.09 -6.49 6.06 -1.95 3.08 0.10 1.25 -9.08 -2.74 9.88
3: 10.24 -6.68 -0.68 8.93 6.17 0.99 4.62 15.38 -4.42 -14.89
4: -2.90 -0.70 0.18 -9.83 -1.96 16.10 -2.11 3.47 -1.21 5.44
5: 5.23 0.51 -4.22 -8.03 -9.29 -0.85 3.73 -3.11 -2.99 6.85
6: -4.46 12.50 -1.41 -10.48 -9.58 11.68 -8.68 -3.50 0.80 -0.73
7: 0.82 3.10 -2.14 5.15 -12.55 -6.98 -8.43 7.90 10.66 -1.34
8: -14.36 12.69 6.37 -4.64 2.69 7.35 11.56 -3.39 10.52 -1.41
9: -8.30 1.94 3.41 5.89 7.98 2.03 7.90 0.12 -2.65 -0.37
Expected result:
0 1 2 3 4 5 6 7 8 9
0: -4.25 -6.30 -4.41 3.42 4.24 -9.19 -4.04 6.40 -2.20 -10.11
1: 7.03 -10.54 -5.72 -1.28 -1.50 -14.45 7.51 -10.15 1.92 -4.07
2: -4.59 1.91 16.28 -12.75 0.66 8.54 -0.74 -13.03 -14.97 -0.92
3: -1.36 -12.11 -1.22 6.43 1.10 -13.58 7.15 8.29 -2.47 -9.73
4: 2.19 0.37 12.77 -10.37 -1.75 7.10 3.61 -4.12 -2.80 2.62
5: 1.47 -14.38 0.41 -8.89 -2.03 2.36 8.49 5.23 1.23 -0.35
6: 1.57 10.51 -4.30 -10.90 -8.70 -2.50 -0.96 1.70 2.94 4.66
7: 7.31 -2.38 -8.07 -1.57 -3.28 -2.05 -9.19 2.10 9.09 -7.29
8: 8.05 9.27 13.56 -13.43 2.54 0.94 3.21 0.45 16.67 -1.02
9: -7.67 5.25 -0.33 -3.99 -1.57 -10.36 -3.97 -4.52 -14.21 -0.82
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -3.48 -3.34 4.61 -5.41 -2.19 -4.38 3.02 -3.37 -2.31 -6.96
1: -4.20 -2.34 -1.13 -5.28 -3.97 -3.44 3.15 0.51 2.88 -2.77
2: -0.77 3.01 0.36 -2.74 0.00 -3.48 0.72 -1.00 -1.44 1.17
3: -3.95 -1.57 3.88 1.56 -0.12 2.33 0.66 4.23 -4.67 -0.65
4: -4.61 0.26 1.78 -1.06 2.20 6.72 -1.65 4.63 3.02 0.98
5: 3.36 2.43 -1.98 -4.02 -6.61 -2.22 3.51 -1.15 5.09 0.09
6: 2.13 3.06 0.27 -1.93 -0.49 -5.15 -7.90 -0.15 1.23 5.18
7: -1.15 3.17 -0.87 5.75 -2.16 -4.36 -3.44 6.84 6.72 -1.42
8: 0.79 4.66 3.82 0.32 0.09 0.29 -1.20 1.23 4.02 -3.14
9: -7.02 -3.86 -3.40 2.61 5.84 -2.33 3.11 -5.84 0.16 -3.15
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: -4.23 -0.95 -0.37 2.02 2.14 -5.77 -5.90 2.03 1.80 1.01
1: 4.77 1.72 1.59 1.91 -2.89 -0.97 -0.72 -3.95 -0.53 -0.45
2: -5.33 -3.02 3.63 -4.06 5.74 3.07 -1.33 -4.89 -5.01 3.18
3: 9.30 -1.16 2.58 3.60 2.76 -8.81 3.45 7.04 0.88 -5.69
4: -0.23 -1.45 0.10 -9.40 -0.49 4.03 6.69 -4.18 -2.08 0.72
5: 2.52 -5.96 2.58 -2.96 1.50 3.30 0.75 2.34 1.17 5.53
6: -2.21 3.79 -0.57 -0.58 -3.03 2.86 0.25 -3.05 -0.82 -0.01
7: 0.24 2.35 -3.04 -0.79 0.11 1.87 -3.53 -3.86 5.14 -0.11
8: -3.16 3.97 4.19 -5.65 0.09 3.70 4.98 -2.92 4.27 2.45
9: 0.57 2.68 2.00 -2.63 -4.98 0.64 0.19 2.18 -4.10 -0.18
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: 1.82 -1.61 -0.77 -4.60 -2.19 -4.00 3.48 -4.02 -1.14 -2.65
1: -3.58 1.39 1.26 0.91 -0.77 2.85 5.49 -1.28 -1.97 2.00
2: -3.66 0.05 -2.14 0.09 0.08 -0.41 -5.65 -2.02 8.50 1.82
3: 4.78 -3.13 -2.98 -0.63 1.46 1.68 1.45 -0.38 2.39 -6.01
4: 1.31 0.99 2.32 -1.04 -0.80 4.90 -8.56 0.26 -0.64 -0.62
5: -0.13 1.35 -0.93 -3.33 -4.92 0.23 -1.73 -1.22 -2.41 -5.29
6: -2.35 -0.12 -4.46 -6.88 3.23 5.93 -1.58 -1.89 2.38 -3.73
7: 2.76 1.22 3.68 -0.95 -4.24 -2.88 4.99 2.53 -2.47 -4.84
8: 2.82 -1.11 1.76 2.62 -4.59 4.64 1.65 -3.69 1.11 -3.53
9: -4.21 1.32 3.58 1.86 3.82 3.41 3.80 -1.25 3.76 -1.99
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 4.49 2.36 3.37 1.85 2.74 -0.06 1.54 2.05 1.93 8.61
1: 0.17 -2.90 -5.38 -3.48 -0.44 0.67 0.01 -3.55 -8.09 7.91
2: 1.66 -6.53 4.21 4.76 -2.74 0.91 7.50 -1.17 -4.79 3.71
3: 0.10 -0.82 -4.17 4.40 2.07 5.78 -0.94 4.49 -3.02 -2.54
4: 0.63 -0.50 -4.01 1.68 -2.87 0.45 1.42 2.76 -1.50 4.36
5: -0.52 2.68 -3.89 2.28 0.74 -2.17 1.21 -3.08 -6.85 6.52
6: -2.02 5.77 3.36 -1.09 -9.29 8.04 0.55 1.59 -1.99 -2.16
7: -1.04 -3.64 -1.91 1.14 -6.26 -1.62 -6.46 2.39 1.27 5.02
8: -14.81 5.17 -3.39 -1.93 7.10 -1.29 6.13 1.99 1.12 2.81
9: 2.35 1.79 1.22 4.06 3.29 0.31 0.79 5.03 -2.48 4.95
TEST F16_F32_ALIGNED_S m=4096 n=512 k=4096 batch=2 split_k=1 matmul 131.473ms avg_err=25.5469
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -12.37 -16.97 21.13 2.38 28.04 -23.67 40.00 -20.71 12.78 4.67
28: -26.70 -12.23 14.41 -15.08 16.63 -12.67 -26.63 -24.96 0.13 12.49
29: -13.33 2.96 -28.42 -3.56 6.16 3.64 -17.86 37.92 -13.31 -15.96
30: -6.11 13.57 10.10 2.58 17.78 -20.39 -52.32 -11.99 4.60 9.32
31: 34.67 11.57 9.90 -19.45 25.45 -8.14 -11.17 -2.35 0.98 20.43
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -12.37 -16.97 21.13 2.37 28.04 -23.68 40.01 -20.72 12.78 4.66
28: -26.69 -12.23 14.41 -15.08 16.63 -12.67 -26.63 -24.96 0.12 12.48
29: -13.33 2.96 -28.42 -3.56 6.17 3.64 -17.86 37.92 -13.31 -15.97
30: -6.10 13.57 10.10 2.58 17.78 -20.40 -52.32 -11.98 4.60 9.31
31: 34.67 11.57 9.90 -19.45 25.45 -8.14 -11.17 -2.36 0.98 20.43
32: 6.44 -28.17 -33.11 -1.67 1.80 9.57 17.16 12.23 28.11 -31.20
33: -39.83 -21.81 16.50 -19.40 -32.67 -12.56 30.15 30.26 -15.94 25.89
34: 22.53 -19.20 5.16 -42.49 21.44 -37.20 -13.06 -1.46 41.72 19.54
35: -29.92 33.65 -4.05 38.64 -39.30 -24.08 19.65 -12.53 12.71 -21.53
36: -33.09 15.29 42.01 15.02 -50.95 -6.75 -14.27 -2.37 -16.87 -8.27
TEST F16_F32_ALIGNED_M m=4096 n=512 k=4096 batch=2 split_k=1 matmul 190.875ms avg_err=0.00628579
TEST F16_F32_ALIGNED_L m=4096 n=512 k=4096 batch=2 split_k=1 matmul 56.312ms avg_err=25.5248
m = 2048 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
2043: 1.69 -14.52 -13.00 -14.82 23.59 12.49 12.46 -52.33 38.27 -3.88
2044: 3.72 -1.62 -49.16 -11.58 31.73 14.40 -3.79 -8.05 -22.17 -30.98
2045: -18.31 -33.24 -4.20 -21.87 18.34 19.19 -23.53 -20.22 3.29 -31.50
2046: -35.13 37.39 -16.64 16.54 17.54 -6.74 -23.60 29.11 9.89 -33.62
2047: 21.34 -10.29 -2.96 3.24 33.38 -36.00 0.84 -6.03 -42.90 2.15
2048: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2049: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2050: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2051: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2052: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
2043: 1.69 -14.53 -13.00 -14.81 23.58 12.49 12.45 -52.33 38.27 -3.88
2044: 3.73 -1.62 -49.17 -11.58 31.73 14.39 -3.79 -8.05 -22.17 -30.97
2045: -18.31 -33.23 -4.20 -21.87 18.35 19.19 -23.54 -20.22 3.29 -31.50
2046: -35.13 37.39 -16.64 16.54 17.54 -6.74 -23.60 29.12 9.88 -33.62
2047: 21.34 -10.29 -2.95 3.24 33.38 -35.99 0.84 -6.02 -42.90 2.15
2048: -23.01 -58.01 15.43 21.13 -2.10 -45.39 27.03 -7.38 25.16 17.95
2049: -4.20 -2.36 -6.88 -18.49 -4.89 2.11 -13.01 1.60 -44.54 -20.59
2050: -1.69 25.37 -11.00 -22.18 -21.36 2.44 13.29 25.68 33.10 -10.68
2051: 5.30 1.62 -25.42 27.09 3.28 10.74 -17.05 -3.43 -43.25 -3.40
2052: -12.16 -25.62 29.13 7.28 19.41 -39.11 -2.91 11.24 4.52 -22.10
TEST F16_F32_ALIGNED_S m=4096 n=512 k=4096 batch=2 split_k=4 matmul 115.424ms avg_err=27.0087
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -7.43 4.50 -2.95 0.16 -12.23 12.53 -12.52 10.79 34.43 25.17
1: 1.80 9.85 6.27 -9.45 2.94 4.25 45.83 -14.34 -22.35 -25.55
2: -16.15 -16.51 -21.23 -44.78 12.52 9.62 -4.92 0.74 -17.33 39.71
3: -11.52 -9.63 -5.84 6.12 25.44 -24.54 24.64 31.04 18.32 -5.26
4: 8.17 -2.69 13.01 -28.12 55.82 -18.45 -0.96 0.73 7.09 5.39
5: 28.19 2.23 -4.13 -9.22 -5.65 45.14 34.69 -26.58 6.63 42.24
6: 5.28 48.09 -49.25 17.45 -12.31 -15.27 -17.57 -5.76 -42.36 -5.07
7: 20.70 -1.06 12.34 -10.55 10.92 -32.03 4.82 -21.35 12.78 11.62
8: -44.08 4.14 -8.76 21.95 20.84 -2.08 3.35 7.18 -15.11 22.60
9: -5.35 -17.81 -0.54 10.18 6.72 2.14 -22.48 -5.44 14.77 -12.99
Expected result:
0 1 2 3 4 5 6 7 8 9
0: -7.34 2.42 -2.51 0.63 -17.58 12.15 -12.20 9.55 30.79 25.18
1: 1.80 9.85 6.27 -9.46 2.93 4.24 45.83 -14.34 -22.35 -25.55
2: -12.66 -15.18 -19.95 -40.96 13.18 12.47 -5.72 2.48 -16.96 40.65
3: -11.52 -9.64 -5.84 6.12 25.45 -24.53 24.64 31.04 18.32 -5.26
4: 5.72 -6.57 16.74 -25.68 55.32 -19.22 0.37 0.62 9.37 7.81
5: 28.19 2.22 -4.12 -9.23 -5.65 45.13 34.69 -26.58 6.63 42.25
6: 6.43 46.45 -47.50 18.17 -11.52 -11.39 -16.97 -6.63 -41.08 -3.35
7: 20.70 -1.06 12.34 -10.54 10.92 -32.03 4.83 -21.36 12.78 11.62
8: -47.19 4.66 -12.56 24.57 20.60 -2.68 3.24 9.96 -13.24 20.80
9: -5.35 -17.80 -0.55 10.17 6.72 2.14 -22.48 -5.44 14.76 -13.00
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: 7.85 0.11 -17.41 -6.55 -2.17 6.28 -0.49 -4.68 -11.43 13.49
1: 10.00 8.90 26.75 9.82 -12.51 -2.07 8.68 -7.66 5.53 -3.00
2: 12.50 -16.44 -1.05 -23.40 12.72 0.33 2.52 -1.88 -8.15 -5.99
3: 10.59 -1.52 1.50 16.81 22.37 -12.33 -2.76 -0.30 -16.63 -17.67
4: 7.51 -2.01 -4.39 -2.74 6.61 4.67 2.40 -18.57 -5.03 -12.94
5: 22.44 9.24 6.39 7.30 -18.76 7.88 -1.74 4.51 1.21 0.79
6: -4.00 5.84 -11.73 -6.11 -14.29 2.68 -20.31 10.03 -6.13 2.79
7: -14.80 7.98 0.34 -11.89 -11.13 -8.50 -8.45 -13.14 1.11 7.07
8: -15.03 -4.22 -1.92 18.47 15.64 -2.52 -1.36 3.70 -8.00 -16.67
9: 1.56 -4.18 12.79 -0.49 0.96 8.52 -6.40 -5.51 6.59 8.52
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 1.58 -17.35 -1.35 2.76 -3.79 18.37 8.57 10.09 13.31 -2.36
1: 17.87 9.42 -4.43 -28.84 19.95 2.51 18.38 -6.39 -1.40 -7.59
2: -15.40 -2.67 -1.18 -8.43 0.16 10.06 6.35 -10.81 -1.02 14.98
3: -3.92 -8.72 -1.15 -10.99 -2.76 -2.04 18.94 15.36 10.94 9.40
4: 15.63 13.05 8.34 -11.51 12.85 -4.67 23.08 -7.17 7.59 -10.66
5: -9.11 2.38 -4.64 -7.44 7.08 12.38 -0.57 -19.14 -5.39 19.50
6: -9.77 4.80 -16.55 9.56 17.30 -4.91 -9.10 2.07 -17.57 -12.42
7: -3.43 -4.86 -20.45 -7.76 6.22 -0.13 -0.37 -14.20 2.01 18.37
8: -13.67 -18.12 -0.13 12.77 -4.49 1.31 -1.94 12.94 14.32 -2.55
9: 6.47 0.30 -19.95 12.00 -12.71 -6.46 4.01 -4.99 5.51 -12.73
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -15.38 11.10 4.45 -2.94 -10.13 -7.18 -8.01 2.76 14.01 -1.16
1: -5.95 -18.07 -6.20 -5.05 1.78 9.71 8.51 -12.47 -15.38 1.04
2: 10.21 2.16 -16.20 -7.43 9.74 -6.65 -10.23 6.71 -2.69 10.46
3: -14.93 19.20 7.83 11.56 -2.32 -14.99 -4.29 -0.20 3.88 -16.48
4: -21.98 -6.74 2.38 -28.70 22.95 -16.03 -12.35 17.31 -4.89 12.82
5: 17.28 1.32 -3.19 3.70 1.64 12.91 20.87 -7.33 -3.26 4.24
6: 12.24 16.17 -17.21 14.36 -4.72 3.29 0.69 -11.56 -3.61 2.54
7: 2.44 7.11 9.04 0.24 -3.80 -19.40 10.15 10.55 6.83 2.88
8: 0.87 11.29 -1.62 1.64 -3.47 -6.49 9.65 2.04 -13.12 17.73
9: -8.44 0.09 -6.55 4.37 5.87 -13.39 -10.93 9.14 -4.85 -8.57
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: -1.48 10.64 11.35 6.89 3.85 -4.94 -12.59 2.62 18.55 15.21
1: -20.11 9.59 -9.84 14.62 -6.28 -5.90 10.26 12.19 -11.10 -15.99
2: -23.47 0.44 -2.79 -5.52 -10.09 5.88 -3.56 6.72 -5.47 20.26
3: -3.27 -18.60 -14.02 -11.26 8.15 4.83 12.75 16.17 20.14 19.48
4: 7.02 -6.99 6.68 14.82 13.41 -2.42 -14.08 9.15 9.42 16.17
5: -2.43 -10.71 -2.68 -12.78 4.38 11.96 16.12 -4.62 14.07 17.71
6: 6.81 21.28 -3.77 -0.35 -10.61 -16.33 11.15 -6.30 -15.05 2.02
7: 36.48 -11.29 23.41 8.85 19.63 -3.99 3.49 -4.56 2.83 -16.71
8: -16.25 15.19 -5.09 -10.94 13.17 5.62 -3.00 -11.50 -8.30 24.10
9: -4.94 -14.02 13.16 -5.71 12.59 13.48 -9.16 -4.08 7.50 -0.21
TEST F16_F32_ALIGNED_M m=4096 n=512 k=4096 batch=2 split_k=4 matmul 189.305ms avg_err=0.0062824
TEST F16_F32_ALIGNED_L m=4096 n=512 k=4096 batch=2 split_k=4 matmul 57.879ms avg_err=41.0685
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -37.37 1.98 34.75 -29.59 31.60 5.07 26.71 9.90 33.22 -24.34
1: 0.77 -20.44 15.50 15.68 33.20 -36.67 -1.52 -3.61 -5.24 4.35
2: -8.16 -36.63 26.84 -26.12 0.25 -4.44 29.50 -28.61 3.68 -25.52
3: 13.06 7.88 39.58 -3.59 11.42 -3.44 -9.31 -31.57 -42.64 54.17
4: -0.23 -31.68 1.22 11.16 -10.59 -1.46 5.33 -2.83 27.15 19.19
5: -22.01 -45.90 -6.63 21.17 -39.48 -9.94 -6.27 -8.54 18.25 16.61
6: -0.52 -10.46 15.21 -17.78 18.73 -4.57 -8.27 29.39 -2.81 -9.16
7: -4.59 -1.34 -5.66 -39.55 9.08 14.12 31.96 -22.67 -8.11 14.98
8: -19.93 8.20 8.19 9.72 8.09 -8.53 -24.27 -0.31 -34.87 18.49
9: -41.47 47.38 -7.77 25.87 -6.54 23.69 -0.73 22.61 -33.28 -4.29
Expected result:
0 1 2 3 4 5 6 7 8 9
0: -49.34 12.46 -9.41 3.90 6.86 16.88 13.91 -12.27 32.16 -60.09
1: -3.26 -30.26 -0.44 9.92 12.67 -25.42 -24.54 -34.72 7.89 29.12
2: -7.88 -18.63 -8.01 -13.35 5.99 3.74 28.01 4.88 -8.18 8.15
3: -11.88 -5.67 12.41 -10.78 20.84 -48.15 -18.63 -16.78 -42.24 -11.04
4: -26.99 -12.10 -4.58 2.52 20.89 -11.56 -19.58 7.79 18.08 -28.93
5: -7.60 -17.37 12.76 6.45 -24.71 -8.68 -22.40 11.58 53.71 14.82
6: 12.52 -46.53 -19.95 5.68 13.37 -18.55 -20.93 4.32 13.19 -10.18
7: 23.51 -17.38 -14.00 -5.57 -25.07 31.74 50.64 6.13 -19.76 -26.42
8: -46.36 36.31 -27.85 18.79 2.82 -20.28 -8.05 30.48 -19.36 16.85
9: -30.65 16.69 -4.96 -9.65 20.03 17.44 26.44 35.88 -3.26 8.43
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -7.26 5.22 0.96 2.51 12.57 3.03 16.20 13.41 7.08 -5.28
1: -1.49 -1.97 5.73 -3.48 8.13 -2.08 -4.93 -4.52 0.42 8.70
2: -8.39 -15.95 22.58 -0.40 3.78 15.24 -0.55 -8.08 2.33 2.57
3: 9.49 5.94 -3.80 6.18 14.19 -5.79 3.10 2.18 -11.41 23.70
4: -1.55 -15.90 6.43 20.64 -6.16 11.13 -7.10 12.54 -1.78 7.69
5: -1.21 -16.55 11.06 -7.19 -1.27 -7.08 -7.35 18.30 -0.28 7.20
6: 15.21 -0.52 10.09 6.34 7.19 1.57 -5.29 0.47 5.50 -3.71
7: 1.84 6.55 -13.97 -16.34 0.26 -5.22 9.95 -4.33 0.81 -4.00
8: -0.11 19.18 -6.14 -1.48 12.87 -13.47 -1.64 -0.90 -23.42 -1.71
9: -7.31 6.43 4.03 11.38 2.76 17.50 13.84 19.14 -10.44 -5.56
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: -27.70 -4.79 13.39 2.57 2.00 0.44 0.41 -14.86 3.70 -15.47
1: 2.82 -7.85 9.43 8.81 9.18 -13.76 1.24 -3.43 -6.71 13.09
2: 8.84 -18.86 -4.83 -7.25 8.91 -10.91 9.65 -6.49 -1.50 5.26
3: -11.74 -16.68 22.21 -8.41 4.97 -16.37 0.62 -10.91 0.02 2.23
4: 4.86 1.94 -6.44 -1.48 1.96 -6.43 -1.27 -8.63 14.42 -9.72
5: 0.34 -6.28 -6.87 8.14 -5.11 -3.22 7.94 -16.23 3.07 5.78
6: -0.32 -3.00 -5.15 -20.40 -7.88 -8.01 -7.72 -4.06 -3.69 -7.11
7: -1.92 -12.89 -7.09 -3.57 -7.74 19.60 19.93 1.82 -10.48 -1.04
8: -33.35 2.74 7.18 3.50 -14.69 -2.69 -2.84 3.23 -11.56 27.21
9: -19.20 15.16 -14.13 0.33 15.23 -8.20 2.84 -4.95 -9.65 14.67
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -11.76 2.72 -7.60 -18.29 8.59 5.20 8.54 7.74 11.16 12.57
1: -8.46 11.24 -9.61 4.93 12.37 -17.81 -9.80 12.50 -14.74 -12.52
2: -5.88 -7.00 2.50 -6.00 -5.86 -19.78 14.73 -12.65 -1.64 -14.19
3: -10.36 15.80 11.84 -13.32 -4.47 14.85 2.16 -5.59 -11.10 12.13
4: 4.00 -6.72 7.22 0.02 0.29 4.30 -7.79 2.03 2.74 21.62
5: -14.47 -21.37 -1.20 9.30 -5.90 -16.43 3.93 1.40 3.81 -5.10
6: 2.75 -21.00 15.00 -2.51 10.80 -1.08 -6.54 27.44 9.43 -3.53
7: -1.82 15.13 8.46 -14.83 9.95 -11.54 -11.37 -3.17 1.21 5.58
8: 7.23 -6.66 10.88 -0.04 -8.72 -6.08 -5.89 5.04 -2.20 -19.85
9: 1.32 12.86 0.86 14.37 -22.07 10.65 -8.09 -3.10 -6.97 -12.54
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 9.35 -1.17 28.00 -16.37 8.44 -3.60 1.56 3.61 11.28 -16.16
1: 7.91 -21.86 9.95 5.42 3.52 -3.02 11.97 -8.16 15.79 -4.91
2: -2.72 5.18 6.58 -12.47 -6.58 11.01 5.68 -1.40 4.49 -19.17
3: 25.67 2.82 9.32 11.96 -3.27 3.87 -15.19 -17.24 -20.15 16.12
4: -7.55 -11.00 -6.00 -8.02 -6.69 -10.44 21.49 -8.76 11.77 -0.39
5: -6.66 -1.70 -9.61 10.92 -27.20 16.80 -10.79 -12.00 11.64 8.72
6: -18.16 14.06 -4.74 -1.21 8.62 2.94 11.28 5.54 -14.05 5.20
7: -2.70 -10.13 6.94 -4.81 6.61 11.28 13.45 -16.99 0.35 14.44
8: 6.29 -7.06 -3.72 7.74 18.62 13.72 -13.90 -7.68 2.31 12.84
9: -16.28 12.93 1.47 -0.21 -2.46 3.74 -9.32 11.52 -6.23 -0.85
TEST F16_F32_ALIGNED_S m=11008 n=512 k=4096 batch=2 split_k=1 matmul 336.024ms avg_err=25.5207
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -17.58 -15.54 -26.73 38.75 -15.11 -5.07 -27.50 -22.64 -1.33 -3.66
28: 17.43 -14.85 1.21 17.35 18.71 -14.17 -33.17 -13.56 35.10 -17.25
29: -21.51 16.79 10.30 21.88 -28.49 -0.14 -22.86 16.57 13.73 -5.64
30: 27.64 -4.82 -5.55 -21.72 38.65 -31.58 3.07 -5.34 31.17 30.86
31: -3.30 20.28 6.26 -69.53 32.58 -36.94 -6.23 -8.53 -2.92 -13.75
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -17.58 -15.54 -26.73 38.75 -15.11 -5.07 -27.50 -22.64 -1.33 -3.66
28: 17.44 -14.85 1.20 17.35 18.72 -14.17 -33.17 -13.56 35.10 -17.25
29: -21.51 16.79 10.30 21.88 -28.49 -0.14 -22.86 16.57 13.74 -5.65
30: 27.65 -4.82 -5.54 -21.72 38.65 -31.58 3.06 -5.34 31.17 30.86
31: -3.30 20.28 6.26 -69.53 32.58 -36.94 -6.23 -8.53 -2.92 -13.75
32: -51.23 8.63 0.89 -4.87 7.17 26.58 -1.03 14.26 -3.68 3.18
33: 8.98 13.94 32.58 47.51 -6.88 -18.67 26.66 -10.42 42.98 20.92
34: -41.67 2.11 12.47 -5.37 -16.39 -0.08 -17.39 -18.23 23.04 -4.64
35: -8.44 2.96 -3.15 27.86 26.11 -29.58 -18.11 -6.36 5.10 26.93
36: -0.18 -9.90 8.23 -9.53 -7.16 -6.86 4.93 -4.44 -15.57 -10.54
TEST F16_F32_ALIGNED_M m=11008 n=512 k=4096 batch=2 split_k=1 matmul 481.223ms avg_err=0.00628346
TEST F16_F32_ALIGNED_L m=11008 n=512 k=4096 batch=2 split_k=1 matmul 134.011ms avg_err=25.534
m = 5504 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
5499: 29.61 22.16 -15.52 26.95 -0.77 1.06 -10.34 39.88 -17.11 1.58
5500: 14.83 1.31 -0.00 3.78 -7.08 -19.75 -1.60 18.33 35.56 17.82
5501: -9.95 -18.25 10.95 -35.72 2.40 -35.04 -51.48 1.63 1.82 -18.30
5502: 54.16 8.30 -24.20 -5.98 17.42 -30.52 0.01 -1.80 -11.39 -23.73
5503: -24.68 -10.00 13.95 1.34 -15.94 42.72 -11.81 22.54 7.41 26.28
5504: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5505: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5506: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5507: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5508: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
5499: 29.60 22.16 -15.52 26.95 -0.78 1.06 -10.34 39.87 -17.10 1.59
5500: 14.83 1.31 -0.00 3.78 -7.08 -19.75 -1.59 18.32 35.56 17.82
5501: -9.95 -18.26 10.95 -35.72 2.40 -35.04 -51.48 1.63 1.82 -18.30
5502: 54.16 8.30 -24.21 -5.99 17.42 -30.51 0.02 -1.80 -11.39 -23.73
5503: -24.69 -10.00 13.94 1.35 -15.94 42.73 -11.81 22.54 7.41 26.28
5504: 18.33 29.67 -31.35 30.36 -10.29 11.59 -11.94 -44.59 -7.08 1.05
5505: -22.16 16.09 -17.69 -19.47 -26.16 -24.22 -7.03 -1.61 -15.63 -31.07
5506: 24.26 -5.60 -18.87 -28.16 30.87 12.76 -10.44 -15.23 52.65 21.29
5507: 22.45 13.02 20.58 52.33 19.78 -22.41 4.22 -7.14 -8.99 -12.50
5508: 32.97 19.62 -6.10 -20.36 -15.97 4.74 20.97 3.77 -14.86 29.28
TEST F16_F32_ALIGNED_S m=11008 n=512 k=4096 batch=2 split_k=4 matmul 301.634ms avg_err=26.97
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 33.97 3.56 1.50 -10.61 -4.70 -19.71 -15.30 -0.02 -52.12 16.34
28: -13.68 -17.05 22.37 20.97 -5.76 18.94 -7.84 31.32 -1.64 24.00
29: -3.01 7.68 -1.98 -16.23 10.48 -21.06 7.92 16.35 3.00 2.39
30: -12.57 -6.51 -15.89 10.31 23.48 7.67 -29.15 44.06 -43.50 23.04
31: 7.10 -21.08 -24.84 38.13 7.10 1.42 -37.37 -10.90 0.16 53.25
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 33.97 3.56 1.50 -10.61 -4.71 -19.71 -15.30 -0.02 -52.12 16.34
28: -13.68 -17.06 22.37 20.97 -5.76 18.95 -7.84 31.32 -1.64 24.00
29: -3.01 7.69 -1.99 -16.22 10.48 -21.06 7.92 16.35 2.99 2.39
30: -12.56 -6.52 -15.89 10.31 23.48 7.67 -29.16 44.07 -43.50 23.05
31: 7.10 -21.08 -24.84 38.13 7.11 1.43 -37.38 -10.89 0.16 53.25
32: -3.57 -1.84 -26.30 3.49 -6.06 -15.97 -1.68 6.16 0.22 25.56
33: 20.18 -23.59 -3.12 -28.86 46.89 2.09 41.53 29.79 -17.07 -22.04
34: -23.42 -9.30 -3.57 22.71 1.90 -9.68 -7.35 -23.74 41.31 -2.84
35: 5.34 21.65 -17.75 -11.38 8.57 13.51 1.64 -35.00 -30.95 -9.91
36: 4.43 29.71 -6.79 14.12 13.79 2.49 30.36 -0.46 -24.35 2.49
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 5.65 -7.02 -1.60 -2.90 10.88 -8.33 -17.10 -4.86 -11.06 -1.39
28: -24.63 -9.84 15.74 22.47 0.83 7.32 -5.42 -2.28 11.60 8.19
29: -1.94 7.85 -8.49 -17.00 5.65 -16.13 11.40 4.48 -13.24 7.77
30: -1.22 4.34 -4.57 -5.61 -13.88 -8.48 5.94 15.45 -6.07 5.95
31: 1.04 5.02 3.00 16.30 7.26 -12.55 -19.16 7.71 -3.88 27.41
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: 9.92 2.28 5.06 6.19 2.76 -4.19 6.37 8.47 -17.62 4.72
28: -5.19 -8.63 -2.86 4.53 -6.79 2.21 -2.51 15.82 -2.91 -4.10
29: 0.54 -11.56 -3.38 -14.96 -8.88 -13.52 0.21 -15.60 5.80 -7.93
30: -3.33 -3.10 3.65 -2.99 4.18 27.18 -2.73 5.13 -22.35 -0.25
31: 5.73 -12.72 -19.59 20.87 -21.61 -9.90 8.85 -26.80 1.70 4.64
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: -5.61 -0.54 -0.02 -6.81 -5.62 1.63 11.03 -15.24 -15.50 1.18
28: 18.54 5.11 7.68 -4.09 9.75 4.67 0.58 13.78 -8.22 8.98
29: 13.59 1.40 -6.31 2.37 15.65 -2.94 -2.98 17.81 14.99 10.16
30: -11.90 -6.65 -6.83 3.83 11.56 -5.49 -14.79 17.62 -7.59 9.23
31: 6.39 -10.63 -12.55 11.66 -3.46 2.69 -19.31 3.46 0.05 11.67
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: 24.00 8.84 -1.95 -7.09 -12.73 -8.82 -15.61 11.61 -7.95 11.83
28: -2.40 -3.69 1.82 -1.94 -9.55 4.74 -0.49 4.00 -2.10 10.92
29: -15.21 10.01 16.20 13.37 -1.94 11.53 -0.71 9.66 -4.55 -7.60
30: 3.89 -1.10 -8.14 15.09 21.63 -5.54 -17.57 5.86 -7.48 8.10
31: -6.07 -2.75 4.30 -10.70 24.91 21.18 -7.75 4.73 2.30 9.54
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TEST F16_F32_ALIGNED_M m=11008 n=512 k=4096 batch=2 split_k=4 matmul 483.874ms avg_err=0.00627963
TEST F16_F32_ALIGNED_L m=11008 n=512 k=4096 batch=2 split_k=4 matmul 134.981ms avg_err=41.0963
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: 17.37 11.66 -33.98 -3.71 24.55 5.90 -16.16 11.99 -75.69 9.59
1: -12.21 -15.12 -27.55 14.83 19.96 2.49 12.98 0.27 15.27 19.89
2: -2.40 30.51 -16.21 -18.59 13.27 7.59 -19.80 3.36 35.51 -0.01
3: 9.46 23.16 -26.74 54.17 -36.36 -16.69 18.40 11.34 -0.82 28.35
4: 14.83 5.74 -7.42 13.64 39.29 -9.74 5.71 19.73 -1.13 1.99
5: 13.95 5.11 2.60 -57.61 -7.40 0.55 -16.69 12.78 30.02 26.36
6: -15.93 -10.48 -25.32 -29.80 27.42 39.16 0.68 15.06 17.45 23.38
7: 11.75 -29.70 -14.73 28.60 -5.90 -13.82 -33.64 1.65 2.21 17.20
8: -18.44 -26.54 -11.60 15.51 -9.97 -57.06 14.96 -12.69 -8.87 -6.18
9: -56.60 13.60 26.71 -0.29 6.11 34.06 8.18 -21.42 -14.08 37.11
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 29.30 -3.75 -15.32 -47.75 9.19 -5.02 -35.95 -25.19 11.20 3.47
1: -16.41 -37.21 -31.86 29.93 15.69 30.43 22.03 -6.29 34.22 49.09
2: -15.30 48.65 -11.79 -17.75 3.24 -2.07 -20.83 -0.35 17.00 -16.51
3: 1.90 13.27 -6.17 15.28 -22.50 -20.12 9.46 27.11 24.15 35.89
4: -1.48 -29.64 -40.31 0.44 11.10 -22.75 -9.62 13.49 19.31 7.42
5: -14.51 31.25 14.02 -69.62 1.20 -12.08 25.28 -5.05 -1.95 57.52
6: -35.95 -14.42 -9.70 -0.07 41.36 7.26 8.48 19.99 7.64 31.98
7: -5.25 -0.47 -12.65 21.79 4.41 -6.08 -40.24 -18.54 -2.78 -9.74
8: -3.28 -11.16 18.37 -5.37 -5.79 0.95 -0.06 -9.21 -29.05 8.20
9: -13.15 15.37 -1.82 19.20 -13.92 30.08 -12.33 -12.88 -3.78 12.65
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: 1.92 -19.72 7.59 -14.27 -6.64 20.83 -6.89 7.66 0.30 9.70
1: -0.69 -18.98 -5.68 11.67 -0.38 3.08 16.45 12.50 2.64 11.16
2: -11.25 16.53 7.14 -3.03 7.91 4.32 10.86 -2.78 3.62 1.29
3: 2.42 2.04 -6.17 3.86 -3.06 -7.91 -4.71 -6.26 7.49 -1.67
4: 17.75 -8.96 -23.25 9.46 10.95 -11.75 10.73 -5.97 6.73 2.34
5: 1.95 4.72 -8.56 -23.10 -2.14 1.61 5.19 12.49 6.72 8.59
6: -9.40 -0.85 -5.34 -11.25 16.13 1.92 -3.83 -1.83 15.59 11.36
7: 0.79 -9.40 6.43 1.24 0.16 3.84 -9.57 -16.20 -7.55 22.53
8: -0.08 -16.98 11.68 8.12 -12.14 -28.11 1.44 -9.11 -28.12 -2.11
9: -13.27 7.69 1.13 -0.90 2.85 2.41 4.25 -19.17 -0.63 6.86
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 7.72 17.46 -20.12 -4.24 10.34 -17.18 -25.96 -9.38 -13.10 1.74
1: 7.40 12.23 -9.69 11.85 7.57 6.39 -11.29 -0.56 12.08 2.53
2: -0.16 6.38 -6.58 -10.77 -2.11 -16.63 -20.04 5.37 11.40 -18.04
3: -1.71 3.10 4.50 27.18 -26.05 -6.90 8.78 31.88 -13.97 23.12
4: -12.52 -4.41 -5.57 4.77 1.88 -4.31 -3.21 4.54 -0.67 9.95
5: -3.07 13.61 12.82 -27.86 7.68 -15.42 -2.50 -7.16 -12.53 24.78
6: -13.49 -7.89 2.19 -5.11 1.31 18.96 5.25 4.18 -5.98 7.27
7: 4.12 -2.06 -12.24 14.33 0.71 -7.67 -10.19 6.36 0.10 -8.17
8: -9.52 14.11 -10.71 -3.45 15.34 -0.58 3.65 -8.22 13.31 -1.14
9: 4.30 4.42 1.62 11.60 -1.81 23.93 -8.05 -10.46 1.96 3.42
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: 10.23 11.88 -16.97 5.58 35.26 1.17 -1.89 1.28 -25.85 4.45
1: -10.50 4.64 -9.04 -16.15 7.25 -15.19 -8.09 14.65 4.99 1.97
2: 3.42 14.74 2.80 -11.42 4.43 13.83 3.71 0.35 7.64 15.62
3: -3.81 5.48 -12.47 16.41 -9.32 6.67 5.83 -10.59 -2.10 2.90
4: 6.24 9.95 8.96 5.03 13.06 -6.34 -4.19 11.38 -4.58 -6.00
5: 9.05 -3.99 12.77 -22.70 -15.60 -1.03 -3.76 -0.77 12.79 -8.44
6: 8.98 -11.48 -22.61 -17.16 7.94 21.23 2.27 6.90 -0.49 1.34
7: -5.44 9.29 -8.97 -3.00 6.81 15.22 -8.30 -11.38 2.28 5.13
8: -12.21 -6.69 3.66 4.74 -6.87 -23.16 -0.78 -4.03 4.10 10.96
9: -37.09 4.85 10.90 5.83 6.03 -8.59 6.21 13.30 -9.27 -2.60
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: -2.50 2.03 -4.49 9.22 -14.42 1.07 18.58 12.43 -37.04 -6.30
1: -8.41 -13.01 -3.14 7.46 5.52 8.21 15.92 -26.32 -4.43 4.24
2: 5.59 -7.14 -19.57 6.63 3.03 6.09 -14.33 0.42 12.84 1.12
3: 12.56 12.54 -12.60 6.71 2.07 -8.55 8.50 -3.69 7.76 3.99
4: 3.36 9.16 12.44 -5.64 13.41 12.66 2.38 9.78 -2.62 -4.29
5: 6.01 -9.24 -14.42 16.05 2.66 15.40 -15.62 8.22 23.05 1.43
6: -2.02 9.74 0.44 3.72 2.04 -2.96 -3.00 5.79 8.33 3.41
7: 12.29 -27.52 0.05 16.03 -13.59 -25.21 -5.59 22.88 7.39 -2.29
8: 3.36 -16.98 -16.23 6.10 -6.30 -5.22 10.65 8.67 1.84 -13.89
9: -10.55 -3.37 13.06 -16.81 -0.97 16.31 5.77 -5.08 -6.13 29.43
TEST F16_F32_ALIGNED_S m=4096 n=512 k=11008 batch=2 split_k=1 matmul 1100.57ms avg_err=41.8438
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -77.02 42.00 -13.33 -92.94 32.28 9.21 19.22 -6.65 56.53 21.64
28: -44.70 -14.72 -4.53 -34.36 29.23 -99.29 -19.11 14.05 -49.23 11.40
29: -28.85 -24.85 -24.76 -18.03 -41.72 -16.77 28.04 53.03 5.65 60.96
30: 26.79 8.38 4.87 -4.47 22.68 8.73 -44.66 84.93 0.03 39.87
31: -15.00 3.50 -4.74 42.75 -26.66 33.87 -7.67 -57.65 66.00 32.53
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -77.01 42.00 -13.34 -92.94 32.29 9.20 19.22 -6.65 56.53 21.63
28: -44.71 -14.72 -4.54 -34.35 29.23 -99.29 -19.10 14.05 -49.23 11.41
29: -28.85 -24.85 -24.77 -18.04 -41.71 -16.77 28.05 53.04 5.65 60.95
30: 26.79 8.38 4.87 -4.47 22.68 8.73 -44.66 84.94 0.03 39.87
31: -15.00 3.51 -4.74 42.75 -26.67 33.87 -7.67 -57.65 66.00 32.52
32: -6.45 25.23 65.64 27.83 -17.72 45.79 61.87 -46.92 -23.63 11.42
33: 43.42 9.26 13.51 13.88 43.51 -2.76 7.01 -40.19 -13.36 1.06
34: 22.29 -10.35 12.63 4.14 -42.67 -18.31 26.67 -55.37 21.13 22.77
35: 31.87 -26.76 -29.89 52.03 23.69 -22.84 -1.13 -41.96 16.00 -49.37
36: -13.21 63.64 -8.63 -63.45 -29.85 -2.63 -34.05 10.90 -10.70 -10.50
TEST F16_F32_ALIGNED_M m=4096 n=512 k=11008 batch=2 split_k=1 matmul 484.733ms avg_err=0.0103033
TEST F16_F32_ALIGNED_L m=4096 n=512 k=11008 batch=2 split_k=1 matmul 128.739ms avg_err=41.855
m = 2048 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
2043: 23.85 3.96 55.39 -29.81 -50.28 76.14 88.73 22.13 54.23 108.86
2044: 9.10 -36.72 -35.71 -3.12 7.69 17.71 -4.06 -27.84 13.40 -10.23
2045: 14.55 -12.54 -0.61 45.35 -21.18 38.82 19.56 -5.07 12.60 -25.19
2046: -46.06 -32.53 -21.54 59.30 -14.59 11.91 27.53 -14.86 -41.52 10.81
2047: 7.94 19.34 15.13 6.69 -26.33 53.72 -19.65 -0.06 19.52 24.12
2048: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2049: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2050: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2051: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2052: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
2043: 23.86 3.95 55.37 -29.81 -50.29 76.13 88.73 22.12 54.22 108.86
2044: 9.09 -36.72 -35.71 -3.13 7.69 17.72 -4.06 -27.84 13.40 -10.23
2045: 14.55 -12.54 -0.60 45.34 -21.18 38.82 19.56 -5.06 12.60 -25.19
2046: -46.05 -32.52 -21.54 59.30 -14.60 11.90 27.53 -14.86 -41.51 10.80
2047: 7.94 19.34 15.12 6.68 -26.32 53.74 -19.66 -0.07 19.53 24.13
2048: 4.98 -0.94 31.66 -8.46 -54.22 17.02 -34.55 -2.44 -2.42 8.39
2049: 6.04 -22.28 36.28 7.74 10.50 18.84 -23.21 9.16 2.42 19.39
2050: 37.04 27.26 -21.13 10.91 21.25 -28.28 -8.95 -21.46 27.23 -3.98
2051: 20.35 67.76 11.16 -5.83 -42.70 14.15 25.65 -15.08 18.51 -24.93
2052: -26.68 -16.44 10.38 -14.86 29.60 15.80 -27.96 10.82 -50.32 -31.16
TEST F16_F32_ALIGNED_S m=4096 n=512 k=11008 batch=2 split_k=4 matmul 895.772ms avg_err=49.0524
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: 43.59 -51.28 63.90 70.42 -20.70 -0.07 3.63 -67.48 -29.99 -11.22
28: 42.55 -1.88 37.41 -45.14 37.25 34.41 5.78 62.84 24.17 27.02
29: -57.12 -14.05 20.30 30.34 -1.62 -42.98 23.59 7.67 -58.28 -68.73
30: 23.78 -37.88 89.65 8.08 -37.74 -28.54 -17.93 -31.30 -16.29 -18.52
31: -37.94 33.41 -20.85 -2.17 13.44 61.38 19.40 -18.19 86.08 -77.06
32: -9.63 31.94 6.73 2.05 25.95 14.92 -13.06 -0.23 -1.13 0.18
33: -5.91 3.97 -2.93 3.96 22.48 9.46 -19.66 1.09 1.51 8.34
34: 7.22 2.40 0.76 13.33 -6.64 22.68 -9.57 5.30 -22.85 -32.89
35: 14.48 42.12 18.10 21.22 -15.08 -43.39 -46.62 32.12 -6.47 -1.20
36: 8.01 -11.70 8.37 29.01 -35.64 -10.14 -5.57 10.86 19.74 -20.50
Expected result:
0 1 2 3 4 5 6 7 8 9
27: 43.59 -51.27 63.89 70.43 -20.70 -0.07 3.63 -67.48 -29.98 -11.21
28: 42.55 -1.88 37.41 -45.15 37.25 34.42 5.77 62.83 24.18 27.01
29: -57.12 -14.06 20.31 30.33 -1.62 -42.98 23.60 7.67 -58.28 -68.74
30: 23.77 -37.87 89.64 8.08 -37.74 -28.54 -17.92 -31.30 -16.28 -18.52
31: -37.94 33.41 -20.84 -2.18 13.44 61.38 19.40 -18.18 86.10 -77.06
32: -23.80 -38.51 15.00 0.21 -27.87 -59.11 7.77 -35.64 -24.20 -74.60
33: -20.21 -0.24 50.40 -60.81 -22.96 63.82 27.86 38.77 -54.10 -40.92
34: -17.34 -45.32 15.08 -10.65 4.41 -31.51 3.66 28.15 -68.49 -28.78
35: -61.83 -33.92 52.51 -35.40 96.83 29.28 21.54 -32.26 31.80 -43.53
36: 62.13 -19.08 -11.79 54.49 -21.41 -74.52 91.95 8.75 -78.59 -66.92
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 1.18 -16.31 12.00 17.32 29.94 -17.78 21.71 19.15 8.04 -8.47
28: -10.95 -15.19 -1.93 -12.72 36.33 6.33 -4.08 5.31 -15.70 -1.76
29: -1.65 18.03 10.64 -5.39 16.99 -5.48 39.88 38.49 -6.66 -20.53
30: -11.37 24.34 13.55 0.39 -8.29 8.54 12.27 -3.24 1.57 -12.21
31: -8.75 -4.94 2.23 -6.20 -0.04 20.97 -12.36 -4.32 29.85 -30.03
32: -1.71 9.91 -1.11 12.68 21.98 -2.11 6.96 -2.76 -6.64 0.66
33: -5.14 3.26 -20.86 -1.74 27.22 -16.78 -1.79 -5.70 1.93 -7.19
34: -7.68 9.76 -2.49 14.82 15.82 8.16 -10.13 -16.20 -8.95 2.82
35: 10.22 12.36 9.81 15.24 -6.39 -3.43 -7.97 14.19 -3.58 2.30
36: 1.67 4.90 13.33 10.27 -16.29 -1.03 8.44 -1.00 -3.67 -10.60
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -6.44 -16.60 21.13 21.20 -15.43 21.87 5.46 -35.76 -8.85 2.99
28: 22.24 36.13 8.60 -13.19 6.76 0.07 18.69 21.32 -12.32 -7.73
29: -27.82 -0.45 3.56 6.38 -1.68 1.97 6.80 -24.54 -23.84 -26.10
30: 27.55 -35.52 -0.71 -14.41 -2.15 -23.42 -16.25 -5.37 13.69 3.61
31: -23.20 -11.20 20.14 -12.32 2.52 21.18 17.19 8.04 11.78 -5.58
32: -3.54 15.47 10.94 -0.41 2.38 3.73 -6.41 2.44 -21.66 -0.28
33: 1.19 -4.31 16.58 -0.11 -2.85 3.94 2.44 -2.75 1.30 -13.66
34: -12.70 -0.13 -4.76 -8.15 -3.12 2.67 5.03 4.17 11.42 -23.02
35: -6.49 13.36 6.45 10.17 -9.95 -28.07 -14.41 7.45 6.98 4.23
36: 0.01 9.44 14.77 -10.72 -10.62 -8.54 -2.16 5.17 -3.04 -14.59
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 10.98 10.17 26.97 -2.63 -14.38 9.45 12.39 -17.38 -17.41 -0.16
28: 29.33 8.16 22.50 -7.11 -7.87 4.45 -3.14 24.88 46.36 16.39
29: -7.24 -37.29 20.76 8.09 -1.76 -20.24 -15.95 -0.20 -9.85 0.99
30: -11.52 -14.87 49.66 17.34 -36.02 -11.30 -14.50 -34.78 -6.64 4.59
31: -12.69 32.78 -42.64 -7.61 11.58 7.22 32.71 -7.48 14.93 -15.71
32: -2.41 8.75 -7.44 -5.70 11.01 1.57 -17.52 -5.48 14.16 -5.08
33: -1.21 2.82 -2.71 3.11 0.05 9.36 -9.07 9.16 2.08 13.28
34: 22.25 -6.72 17.96 -1.92 -13.15 2.80 12.34 0.50 -13.61 0.90
35: 1.40 14.59 1.19 7.06 -15.46 0.43 6.28 6.03 -3.62 -4.61
36: 4.43 -12.10 -13.80 8.18 -7.94 -3.27 10.08 9.38 24.23 -3.84
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: 37.87 -28.54 3.80 34.53 -20.84 -13.60 -35.93 -33.48 -11.76 -5.58
28: 1.92 -30.98 8.23 -12.13 2.03 23.56 -5.68 11.34 5.84 20.13
29: -20.41 5.66 -14.66 21.26 -15.17 -19.23 -7.14 -6.08 -17.94 -23.09
30: 19.12 -11.83 27.15 4.75 8.72 -2.36 0.55 12.09 -24.91 -14.52
31: 6.70 16.77 -0.58 23.95 -0.62 12.01 -18.14 -14.44 29.53 -25.74
32: -1.98 -2.19 4.33 -4.53 -9.42 11.73 3.92 5.57 13.01 4.88
33: -0.76 2.19 4.06 2.70 -1.94 12.93 -11.24 0.38 -3.80 15.91
34: 5.36 -0.51 -9.95 8.58 -6.18 9.04 -16.82 16.84 -11.72 -13.58
35: 9.35 1.82 0.64 -11.25 16.71 -12.31 -30.52 4.46 -6.25 -3.12
36: 1.90 -13.93 -5.92 21.28 -0.79 2.71 -21.93 -2.69 2.22 8.53
TEST F16_F32_ALIGNED_M m=4096 n=512 k=11008 batch=2 split_k=4 matmul 482.767ms avg_err=0.0103031
TEST F16_F32_ALIGNED_L m=4096 n=512 k=11008 batch=2 split_k=4 matmul 132.151ms avg_err=67.3337
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: -37.33 -4.56 -23.85 -40.69 2.51 11.77 46.17 -9.07 -33.60 -3.46
1: -76.01 9.55 50.91 -12.24 30.33 37.95 82.09 -45.11 -5.55 -55.85
2: -0.95 -27.87 43.86 2.29 11.40 -43.04 -22.92 -25.42 -6.70 -87.74
3: 30.52 -44.69 71.50 -29.66 -52.97 -19.43 34.83 12.04 29.63 -63.69
4: 22.66 42.66 2.86 -122.05 -52.22 40.17 -20.83 43.05 44.20 38.40
5: 55.23 -5.28 -6.19 23.87 -8.85 2.08 -10.20 -5.39 -112.67 -47.60
6: 33.65 -49.76 -40.03 14.30 -6.96 47.49 -27.47 39.43 -25.35 31.73
7: 4.27 -7.41 -7.04 33.86 -0.60 39.42 17.27 -33.25 74.01 -43.47
8: -34.94 34.02 -38.56 19.78 -39.09 -23.97 -36.48 4.10 -74.96 -12.62
9: 10.68 -0.66 1.68 8.01 -51.24 6.95 -34.28 -39.65 -62.39 -18.40
Expected result:
0 1 2 3 4 5 6 7 8 9
0: -14.85 12.85 7.38 -80.52 -11.71 -2.42 6.35 -24.69 48.77 -16.26
1: -28.24 -25.04 -23.17 38.40 47.37 -15.94 51.41 12.13 -2.56 -74.07
2: -67.82 -75.52 23.40 -13.97 -5.28 -0.11 17.33 -18.12 20.47 -61.91
3: 1.51 -12.73 60.12 13.85 -20.62 10.66 84.59 25.60 16.12 -81.70
4: 5.12 42.12 -12.04 -111.24 -50.47 35.78 26.28 21.13 44.64 24.98
5: 40.94 0.35 -0.45 -13.95 -34.39 -0.69 -41.52 -3.01 -41.97 29.40
6: 2.00 -48.38 -15.50 -8.41 9.53 33.50 -9.95 -29.34 -18.80 40.49
7: 6.04 22.58 -5.53 -2.63 22.64 44.55 2.47 -42.56 4.97 -43.63
8: -37.48 -2.19 -11.04 1.49 -43.10 -0.75 16.72 -15.93 -62.44 21.04
9: 37.07 -36.59 8.40 -4.08 -28.01 -67.59 4.93 -6.01 -101.89 -24.47
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: -9.40 36.67 10.29 -19.01 -8.59 -4.79 17.59 3.03 12.65 -17.47
1: -14.51 11.91 10.05 -4.55 23.20 8.62 5.64 4.79 -17.55 -26.36
2: -7.70 -25.56 1.95 -0.47 4.71 0.28 -3.96 -2.01 10.51 -28.07
3: 7.00 5.97 23.10 -0.33 -17.06 12.10 -6.51 -6.27 7.94 -44.65
4: -5.20 19.37 3.87 -33.27 -0.19 16.32 7.38 12.09 15.58 2.33
5: 12.14 -13.96 -11.83 -2.13 -6.32 3.82 5.75 -7.35 -45.48 -19.93
6: 10.00 -14.24 -31.38 16.93 -11.79 -6.11 13.33 -6.94 -12.17 10.13
7: -20.78 7.76 13.04 34.33 18.06 -1.73 -5.08 -12.03 31.65 -10.69
8: 9.53 -2.48 -15.17 -14.49 -21.08 -25.60 7.65 -11.20 -16.34 -26.94
9: 11.23 -15.99 -1.18 -3.09 2.51 4.59 -22.58 -14.64 -63.56 -1.92
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: -39.01 -22.14 -7.22 -27.36 22.30 16.49 17.62 -37.20 18.14 -0.21
1: 3.37 -15.38 -8.18 30.34 -7.25 22.23 40.37 -34.05 5.14 -21.98
2: -22.33 0.33 17.06 0.74 -12.98 -35.89 8.13 -22.23 -10.97 -25.65
3: 7.18 -13.55 50.14 14.60 -13.32 -26.57 29.64 12.04 -2.51 -9.06
4: -9.88 36.89 -7.19 -41.08 -28.33 12.83 3.25 -2.60 20.90 36.17
5: 31.55 -4.99 13.27 14.01 -7.78 -2.92 -6.43 12.28 -25.30 8.16
6: -0.94 -18.83 9.32 19.88 -11.20 2.77 -19.66 -4.06 -12.87 11.82
7: 26.19 -5.23 -19.97 -3.15 6.43 26.86 21.09 -18.21 -2.47 -33.49
8: -28.63 0.58 7.98 22.71 6.02 13.93 -9.97 25.47 -44.11 18.14
9: 4.24 -13.87 -6.27 -7.44 -15.70 -15.49 10.94 -25.16 -6.68 8.81
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: 7.12 -32.46 -29.51 -2.82 -21.15 -21.34 5.42 3.09 -27.98 3.86
1: -47.77 16.78 26.31 -15.33 -1.58 4.76 20.41 -15.73 -0.82 3.82
2: 1.14 -3.56 -3.14 -1.53 3.56 3.76 -1.15 -5.78 8.72 -2.60
3: -12.01 -19.08 -8.34 -10.81 -11.92 -4.30 -3.98 4.55 33.87 4.60
4: 14.20 -12.60 0.05 -27.67 8.67 21.15 -6.52 6.48 31.14 13.51
5: 21.57 -7.31 -1.30 3.64 3.26 -16.37 -5.07 -16.28 4.25 -30.36
6: 6.12 -18.97 -18.81 0.67 -11.75 24.70 -3.01 31.84 14.32 -0.26
7: 7.93 21.46 -7.68 18.84 5.94 14.80 -1.82 -24.56 43.49 -8.04
8: -25.55 20.42 -15.85 12.63 -14.28 -19.70 0.69 -0.94 1.55 -16.98
9: 5.36 -8.69 17.31 3.18 -26.62 41.04 11.44 -3.89 -5.61 -13.31
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 3.97 13.36 2.59 8.50 9.96 21.41 5.53 22.01 -36.42 10.35
1: -17.10 -3.76 22.73 -22.71 15.97 2.34 15.66 -0.12 7.67 -11.33
2: 27.93 0.93 27.99 3.54 16.11 -11.19 -25.95 4.60 -14.95 -31.42
3: 28.35 -18.03 6.60 -33.12 -10.68 -0.65 15.68 1.72 -9.67 -14.58
4: 23.54 -0.99 6.12 -20.02 -32.38 -10.14 -24.94 27.08 -23.42 -13.60
5: -10.03 20.99 -6.33 8.35 1.99 17.54 -4.45 5.96 -46.14 -5.48
6: 18.47 2.28 0.84 -23.17 27.77 26.13 -18.13 18.59 -14.62 10.04
7: -9.07 -31.42 7.58 -16.16 -31.03 -0.51 3.07 21.56 1.33 8.76
8: 9.71 15.51 -15.53 -1.07 -9.74 7.41 -34.85 -9.23 -16.05 13.16
9: -10.15 37.90 -8.18 15.37 -11.42 -23.19 -34.09 4.03 13.47 -11.99
TEST F16_F32_ALIGNED_S m=32000 n=512 k=4096 batch=2 split_k=1 matmul 792.89ms avg_err=25.5427
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -5.82 -24.72 -27.55 6.62 30.88 -0.04 52.86 -29.59 26.51 -9.44
28: 4.65 -3.75 19.09 -44.60 3.78 23.82 -48.69 -8.50 1.06 3.46
29: 20.13 -27.33 11.52 3.89 13.52 22.38 8.32 20.29 26.08 14.03
30: -3.23 -8.93 12.35 -17.18 -12.37 22.23 21.83 -1.78 -37.50 12.50
31: 0.32 -31.96 3.70 -2.75 -2.04 -12.13 -29.06 41.43 -2.07 43.77
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -5.82 -24.72 -27.55 6.63 30.89 -0.04 52.86 -29.59 26.51 -9.45
28: 4.64 -3.75 19.08 -44.60 3.78 23.82 -48.69 -8.50 1.06 3.46
29: 20.14 -27.35 11.52 3.90 13.51 22.38 8.32 20.29 26.08 14.03
30: -3.23 -8.93 12.35 -17.17 -12.36 22.23 21.83 -1.78 -37.50 12.50
31: 0.31 -31.96 3.69 -2.75 -2.03 -12.13 -29.06 41.43 -2.07 43.77
32: -22.13 2.57 5.09 -19.23 -29.35 -6.20 52.40 3.18 -20.78 15.24
33: -12.24 9.41 9.91 -1.61 -15.55 9.53 1.19 14.35 -23.53 -13.00
34: -16.65 5.52 -48.19 -16.77 -12.61 -2.33 8.86 10.55 -8.41 1.10
35: 1.52 -9.80 24.84 15.17 23.55 -14.64 -9.16 -21.79 -29.66 25.49
36: -12.18 -16.87 -8.13 2.01 -15.82 -14.04 12.22 -5.64 -15.55 -12.45
TEST F16_F32_ALIGNED_M m=32000 n=512 k=4096 batch=2 split_k=1 matmul 1381.07ms avg_err=0.00628381
TEST F16_F32_ALIGNED_L m=32000 n=512 k=4096 batch=2 split_k=1 matmul 362.922ms avg_err=25.5378
m = 16000 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
15995: -21.19 20.98 5.83 19.77 19.21 -0.13 0.19 -11.02 0.18 -33.39
15996: -13.75 35.95 -7.50 -5.34 -15.30 17.48 -6.33 19.63 14.87 5.02
15997: 10.80 21.22 3.60 9.28 -27.29 -0.73 26.35 21.94 6.27 9.46
15998: 8.58 22.52 3.27 -2.35 -30.37 -50.40 -9.36 -16.20 -28.54 11.54
15999: 0.32 -37.74 -37.18 17.44 2.51 -7.22 16.34 19.78 -36.01 17.05
16000: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
16001: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
16002: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
16003: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
16004: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
15995: -21.19 20.98 5.82 19.76 19.21 -0.13 0.19 -11.01 0.18 -33.39
15996: -13.74 35.95 -7.51 -5.35 -15.30 17.47 -6.34 19.62 14.87 5.02
15997: 10.80 21.23 3.59 9.28 -27.29 -0.73 26.36 21.94 6.27 9.46
15998: 8.58 22.53 3.27 -2.35 -30.37 -50.40 -9.35 -16.20 -28.54 11.54
15999: 0.32 -37.74 -37.18 17.44 2.51 -7.22 16.35 19.79 -36.01 17.05
16000: 24.46 7.46 27.27 28.29 18.64 -46.54 2.38 -31.87 13.27 0.84
16001: 9.62 -9.39 -5.16 -6.71 -6.53 -13.25 -2.96 19.89 -44.64 50.02
16002: 15.99 6.48 27.51 -4.86 -6.68 -2.30 9.33 49.94 12.45 25.51
16003: -9.47 4.99 -14.42 -38.69 39.53 -11.88 4.51 -9.78 -40.04 -19.67
16004: -6.65 -2.14 -24.96 24.60 -31.07 4.26 5.17 0.16 -4.69 -10.87
TEST F16_F32_ALIGNED_S m=32000 n=512 k=4096 batch=2 split_k=4 matmul 884.912ms avg_err=25.5345
m = 32 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
27: -4.41 -13.77 25.22 10.98 12.12 -36.02 19.43 -65.67 7.33 12.04
28: -14.06 31.43 39.70 -1.88 -9.99 1.32 14.70 -18.10 5.90 -15.59
29: 11.11 -29.33 7.09 36.17 -9.99 7.55 -24.64 5.71 5.86 24.60
30: -0.19 5.06 30.58 28.25 12.83 9.04 26.33 -32.63 -19.08 -4.45
31: -16.86 -7.54 -23.00 -5.15 -0.53 -8.50 20.83 24.84 4.52 33.64
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Expected result:
0 1 2 3 4 5 6 7 8 9
27: -4.41 -13.77 25.23 10.98 12.13 -36.02 19.43 -65.67 7.33 12.05
28: -14.06 31.42 39.70 -1.88 -9.99 1.32 14.70 -18.09 5.90 -15.60
29: 11.11 -29.33 7.09 36.17 -10.00 7.56 -24.65 5.71 5.87 24.60
30: -0.18 5.06 30.57 28.25 12.83 9.04 26.34 -32.63 -19.09 -4.45
31: -16.85 -7.54 -23.00 -5.16 -0.53 -8.50 20.83 24.84 4.52 33.65
32: -24.53 2.72 6.55 -0.65 7.27 -36.05 -29.07 6.16 3.12 10.23
33: -5.54 40.14 -9.63 -5.19 -13.44 46.19 19.65 21.43 3.54 4.60
34: 17.59 7.18 -20.04 26.70 -21.89 -2.01 20.07 -57.93 -41.81 25.77
35: -12.88 6.39 -33.00 -9.24 -44.70 -21.17 15.96 -7.71 23.65 1.21
36: -14.89 -5.50 40.18 -6.59 17.33 -15.20 29.76 1.35 -23.39 -18.16
d_buf0:
0 1 2 3 4 5 6 7 8 9
27: 3.35 4.66 0.11 15.50 6.85 0.45 17.32 -20.67 -0.94 17.90
28: -26.16 16.78 13.96 -10.88 -12.59 -7.81 -0.23 2.26 7.13 -7.19
29: -15.19 -7.82 -21.35 2.04 -4.82 1.72 -2.94 -1.64 -9.10 2.38
30: 5.31 -6.98 12.49 3.95 1.85 -1.52 4.04 -7.85 -2.94 -2.10
31: -0.60 -7.47 -10.82 -9.95 1.00 8.20 11.63 -1.69 -5.26 -6.80
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf1:
0 1 2 3 4 5 6 7 8 9
27: -9.75 -16.07 13.69 -13.05 0.91 -14.23 -3.71 -21.45 -1.62 -0.29
28: -3.33 3.87 1.48 -7.50 1.00 8.54 3.29 -5.65 1.57 -8.79
29: 18.97 -8.21 3.92 23.52 0.80 4.99 8.83 -14.16 7.38 16.31
30: 10.36 13.38 -5.55 11.33 -0.01 -2.39 7.70 -16.55 4.99 -12.11
31: -10.50 10.77 -20.01 10.84 -10.44 0.07 5.27 4.75 -8.73 20.80
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf2:
0 1 2 3 4 5 6 7 8 9
27: 12.66 8.46 14.56 0.94 -1.71 -11.36 5.85 -16.47 7.91 -0.01
28: 9.20 -6.06 2.44 7.46 10.01 2.01 -1.66 -8.99 -14.18 -0.52
29: 6.68 -4.66 17.41 -7.24 -7.37 2.01 -35.30 10.77 10.60 -7.32
30: -13.20 0.58 11.19 -11.06 -5.10 3.47 26.02 -1.31 -1.76 4.92
31: -4.21 -21.60 -4.93 2.36 8.52 -14.80 5.52 15.28 9.21 11.93
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
d_buf3:
0 1 2 3 4 5 6 7 8 9
27: -10.66 -10.82 -3.14 7.58 6.07 -10.88 -0.04 -7.08 1.98 -5.55
28: 6.23 16.83 21.82 9.04 -8.41 -1.42 13.31 -5.72 11.38 0.91
29: 0.65 -8.65 7.11 17.85 1.40 -1.16 4.77 10.73 -3.02 13.24
30: -2.65 -1.92 12.44 24.03 16.10 9.48 -11.43 -6.92 -19.37 4.83
31: -1.55 10.76 12.75 -8.41 0.39 -1.97 -1.60 6.50 9.29 7.71
32: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
33: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
34: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
35: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
36: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TEST F16_F32_ALIGNED_M m=32000 n=512 k=4096 batch=2 split_k=4 matmul 1402.56ms avg_err=0.00628493
TEST F16_F32_ALIGNED_L m=32000 n=512 k=4096 batch=2 split_k=4 matmul 371.085ms avg_err=41.0884
m = 0 n = 0 b = 0
Actual result:
0 1 2 3 4 5 6 7 8 9
0: 18.21 16.31 -12.65 -9.62 -2.56 -6.61 1.87 -27.90 5.76 -12.93
1: 17.53 17.14 18.57 -6.92 -16.06 6.74 20.81 18.53 5.62 31.22
2: 22.18 -29.57 30.17 -38.59 10.55 12.79 14.71 13.67 -18.75 -3.84
3: -10.66 -8.07 -29.54 12.46 -8.04 4.12 10.10 23.26 16.21 18.10
4: -41.73 -5.14 1.40 -21.78 11.35 -1.13 -5.92 1.52 -40.91 -19.01
5: -28.03 0.27 -37.24 10.69 -10.71 -4.02 9.45 -9.42 -14.00 -19.42
6: -27.22 21.38 -14.65 30.71 16.22 16.19 4.39 23.86 -20.07 2.77
7: -29.53 1.47 -7.76 -18.30 -0.80 21.16 -21.97 14.07 -18.95 -50.60
8: -1.18 18.93 1.25 -17.98 -0.23 5.03 -31.51 28.86 8.75 5.91
9: 14.86 -28.74 -17.45 -17.19 -9.43 5.34 18.21 4.57 7.37 1.61
Expected result:
0 1 2 3 4 5 6 7 8 9
0: 37.78 13.95 -14.51 -2.78 -12.88 -19.19 -0.95 17.69 22.58 -4.17
1: 14.31 0.68 -22.27 4.91 -4.95 11.23 31.16 -10.44 -32.28 2.81
2: -2.70 -0.55 39.27 -24.13 -18.87 15.08 7.03 3.83 -17.15 -34.10
3: 3.34 -5.79 18.32 -31.72 -6.51 -14.71 5.71 21.92 -4.22 55.32
4: -27.85 12.38 -17.81 -15.91 12.59 13.81 -12.93 -20.52 -20.26 -3.57
5: -14.35 -8.68 -3.33 -6.81 1.38 4.46 -10.42 -16.35 -16.81 -14.77
6: 14.39 15.33 -30.88 12.89 -9.55 -25.93 -9.34 7.78 -7.82 -6.35
7: -42.15 -29.19 44.51 10.13 -28.10 19.92 8.17 15.43 -10.52 12.95
8: 0.64 29.34 -2.49 24.20 23.87 -29.99 -10.29 12.85 8.04 5.17
9: 18.29 -38.15 -11.87 -21.51 4.81 31.30 -1.82 4.22 -22.68 -27.56
d_buf0:
0 1 2 3 4 5 6 7 8 9
0: 19.17 0.93 -7.03 -15.13 -3.90 -15.51 -9.27 6.16 0.25 -8.97
1: -6.87 -4.55 -2.17 0.93 -5.31 2.32 1.22 -3.77 2.58 18.74
2: -17.42 -11.65 18.35 -20.04 -12.48 1.43 12.21 -26.48 -19.60 -11.97
3: 1.62 -10.34 8.78 1.98 0.09 -3.13 0.49 1.65 -12.66 16.02
4: -12.95 -7.01 12.05 -12.50 15.34 19.41 -13.50 -6.07 -13.60 -3.11
5: -1.45 -7.13 -10.99 6.70 14.70 7.49 -9.00 -4.89 3.90 -1.75
6: 2.40 2.18 -20.65 11.71 15.35 -6.43 -1.46 14.19 -3.41 -5.02
7: -12.63 14.76 4.52 -3.91 -14.37 6.42 -18.92 10.44 -7.49 -11.86
8: -5.39 11.50 0.19 9.83 25.05 7.61 -22.88 11.63 4.40 21.59
9: 6.71 -11.83 -0.96 -0.85 16.62 -3.32 11.24 14.25 -7.03 -2.56
d_buf1:
0 1 2 3 4 5 6 7 8 9
0: 2.35 0.92 6.00 2.95 2.60 -13.93 15.27 -10.21 15.98 -9.80
1: 15.42 -2.25 5.52 -6.22 -11.72 12.55 7.28 -1.48 -0.82 -1.56
2: 3.66 -12.61 -0.15 -5.77 -0.59 9.83 -2.69 7.97 -9.22 12.07
3: -3.64 9.61 -6.78 -5.50 -6.08 -2.44 11.06 15.00 2.38 13.36
4: -9.07 10.15 -12.69 -9.92 2.22 -15.29 -1.04 3.21 -1.87 4.99
5: -12.84 -1.99 0.43 -3.90 1.87 -1.26 9.25 1.57 -3.26 -11.44
6: -7.19 -13.67 -3.15 13.41 -10.92 9.03 3.32 -8.39 -8.70 6.20
7: -23.21 -18.53 9.77 -0.83 10.71 19.33 -4.81 7.03 -10.00 -6.23
8: -2.16 6.73 1.62 -10.96 -4.35 -18.88 -6.98 5.15 -0.17 -1.60
9: -0.51 -3.70 -5.08 -2.67 -6.61 8.97 0.95 2.05 -6.72 -0.84
d_buf2:
0 1 2 3 4 5 6 7 8 9
0: -21.53 16.21 -14.46 -1.05 7.05 -1.12 -8.27 -10.29 3.12 -12.50
1: 10.31 9.50 6.17 0.60 -11.69 7.60 0.38 8.97 12.76 0.09
2: 28.96 -5.62 0.64 0.10 13.19 10.55 -2.23 16.41 7.63 3.14
3: -12.76 0.43 -8.87 8.50 17.22 5.71 5.85 -8.91 14.43 -1.57
4: -7.80 -0.32 -13.01 -4.10 -8.74 -4.07 7.48 11.19 5.27 -13.87
5: -9.76 -4.35 -31.46 4.29 -5.83 -6.76 9.40 -14.53 -6.69 -1.13
6: -11.94 2.81 9.36 -8.55 -3.30 6.50 -4.95 3.76 -9.46 2.86
7: 19.12 2.12 -18.36 -13.91 4.64 -4.28 14.65 2.59 10.42 -1.05
8: -8.60 0.99 -9.23 -3.19 -17.82 4.38 -5.98 2.83 -6.17 -10.02
9: 0.33 -17.12 -8.51 13.64 -10.49 3.42 14.44 -2.83 5.73 5.87
d_buf3:
0 1 2 3 4 5 6 7 8 9
0: 18.22 -1.74 2.83 3.61 -8.30 23.96 4.15 -13.56 -13.59 18.33
1: -1.33 14.44 9.04 -2.23 12.67 -15.73 11.93 14.81 -8.90 13.93
2: 6.98 0.31 11.32 -12.89 10.43 -9.01 7.42 15.76 2.44 -7.09
3: 4.12 -7.77 -22.67 7.47 -19.27 3.99 -7.29 15.53 12.05 -9.70
4: -11.91 -7.96 15.06 4.74 2.54 -1.18 1.14 -6.81 -30.71 -7.02
5: -3.97 13.74 4.78 3.60 -21.45 -3.50 -0.19 8.43 -7.95 -5.09
6: -10.50 30.06 -0.21 14.14 15.08 7.08 7.48 14.29 1.51 -1.27
7: -12.81 3.12 -3.69 0.34 -1.77 -0.32 -12.89 -5.98 -11.87 -31.46
8: 14.97 -0.28 8.68 -13.67 -3.12 11.92 4.33 9.25 10.70 -4.07
9: 8.34 3.92 -2.90 -27.31 -8.96 -3.74 -8.42 -8.90 15.39 -0.85
GGML_ASSERT: ggml-vulkan.cpp:3871: false