Created January 18, 2024 21:43 (AngryLoki/b94f6a1c3ee0ce757790dde47a5e2de6)
openimageio-2.5.5.0-r1.ebuild unit_simd test failure
156/168 Testing: unit_simd | |
156/168 Test: unit_simd | |
Command: "/var/tmp/portage/media-libs/openimageio-2.5.5.0-r1/work/OpenImageIO-2.5.5.0_build/bin/simd_test" | |
Directory: /var/tmp/portage/media-libs/openimageio-2.5.5.0-r1/work/OpenImageIO-2.5.5.0_build/src/libutil | |
"unit_simd" start time: Jan 18 21:38 UTC | |
Output: | |
---------------------------------------------------------- | |
OIIO SIMD support is: sse2,sse3,ssse3,sse41,sse42,avx,avx2,avx512f,avx512dq,avx512ifma,avx512cd,avx512bw,avx512vl,fma,f16c | |
Hardware SIMD support is: sse2,sse3,ssse3,sse41,sse42,avx,avx2,avx512f,avx512dq,avx512ifma,avx512cd,avx512bw,avx512vl,fma,f16c,popcnt,rdrand | |
null benchmark 4: 19120.5 Mvals/sec, (19120.5 Mcalls/sec) | |
null benchmark 8: 18975.3 Mvals/sec, (18975.3 Mcalls/sec) | |
vfloat4 | |
load/store vfloat4 | |
partial load 1 : 101 0 0 0 | |
partial store 1 : 1 0 0 0 | |
partial load 2 : 101 102 0 0 | |
partial store 2 : 1 2 0 0 | |
partial load 3 : 101 102 103 0 | |
partial store 3 : 1 2 3 0 | |
partial load 4 : 101 102 103 104 | |
partial store 4 : 1 2 3 4 | |
load scalar: 18458.7 Mvals/sec, (4614.7 Mcalls/sec) | |
load vec: 18475.8 Mvals/sec, (4618.9 Mcalls/sec) | |
store vec: 18484.3 Mvals/sec, (4621.1 Mcalls/sec) | |
load 4 comps: 18484.3 Mvals/sec, (4621.1 Mcalls/sec) | |
load 3 comps: 12330.5 Mvals/sec, (4110.2 Mcalls/sec) | |
load 2 comps: 7270.1 Mvals/sec, (3635.0 Mcalls/sec) | |
load 1 comps: 2815.3 Mvals/sec, (2815.3 Mcalls/sec) | |
store 4 comps: 12507.8 Mvals/sec, (3127.0 Mcalls/sec) | |
store 3 comps: 8136.7 Mvals/sec, (2712.2 Mcalls/sec) | |
store 2 comps: 10443.9 Mvals/sec, (5221.9 Mcalls/sec) | |
store 1 comps: 5524.9 Mvals/sec, (5524.9 Mcalls/sec) | |
load/store with conversion vfloat4 | |
load from unsigned short[]: 18082.4 Mvals/sec, (4520.6 Mcalls/sec) | |
load from short[]: 16870.5 Mvals/sec, (4217.6 Mcalls/sec) | |
load from unsigned char[]: 16757.4 Mvals/sec, (4189.4 Mcalls/sec) | |
load from char[]: 16743.4 Mvals/sec, (4185.9 Mcalls/sec) | |
load from half[]: 16827.9 Mvals/sec, (4207.0 Mcalls/sec) | |
store to half[]: 86393.1 Mvals/sec, (21598.3 Mcalls/sec) | |
masked loadstore vfloat4 | |
masked load with int mask: 16820.9 Mvals/sec, (4205.2 Mcalls/sec) | |
masked load with bool mask: 16806.7 Mvals/sec, (4201.7 Mcalls/sec) | |
masked store with int mask: 21322.0 Mvals/sec, (21322.0 Mcalls/sec) | |
masked store with bool mask: 21598.3 Mvals/sec, (21598.3 Mcalls/sec) | |
scatter & gather vfloat4 | |
gather: 1902.3 Mvals/sec, (475.6 Mcalls/sec) | |
gather_mask: 1909.3 Mvals/sec, (477.3 Mcalls/sec) | |
scatter: 3370.1 Mvals/sec, (842.5 Mcalls/sec) | |
scatter_mask: 4857.9 Mvals/sec, (1214.5 Mcalls/sec) | |
component_access vfloat4 | |
operator[i]: 22075.1 Mvals/sec, (22075.1 Mcalls/sec) | |
operator[2]: 21505.4 Mvals/sec, (21505.4 Mcalls/sec) | |
operator[0]: 21598.3 Mvals/sec, (21598.3 Mcalls/sec) | |
extract<2> : 21505.4 Mvals/sec, (21505.4 Mcalls/sec) | |
extract<0> : 21692.0 Mvals/sec, (21692.0 Mcalls/sec) | |
insert<2> : 4212.3 Mvals/sec, (4212.3 Mcalls/sec) | |
arithmetic vfloat4 | |
operator+: 16863.4 Mvals/sec, (4215.9 Mcalls/sec) | |
operator-: 16884.8 Mvals/sec, (4221.2 Mcalls/sec) | |
operator- (neg): 16638.9 Mvals/sec, (4159.7 Mcalls/sec) | |
operator*: 16856.3 Mvals/sec, (4214.1 Mcalls/sec) | |
operator* (scalar): 16884.8 Mvals/sec, (4221.2 Mcalls/sec) | |
operator/: 16913.3 Mvals/sec, (4228.3 Mcalls/sec) | |
abs: 16870.5 Mvals/sec, (4217.6 Mcalls/sec) | |
reduce_add: 16715.4 Mvals/sec, (4178.9 Mcalls/sec) | |
reference: add scalar: 21505.4 Mvals/sec, (21505.4 Mcalls/sec) | |
reference: mul scalar: 22026.4 Mvals/sec, (22026.4 Mcalls/sec) | |
reference: div scalar: 21881.8 Mvals/sec, (21881.8 Mcalls/sec) | |
comparisons vfloat4 | |
operator< : 17035.8 Mvals/sec, (4258.9 Mcalls/sec) | |
operator> : 17035.8 Mvals/sec, (4258.9 Mcalls/sec) | |
operator<=: 16842.1 Mvals/sec, (4210.5 Mcalls/sec) | |
operator>=: 16813.8 Mvals/sec, (4203.4 Mcalls/sec) | |
operator==: 16799.7 Mvals/sec, (4199.9 Mcalls/sec) | |
operator!=: 16813.8 Mvals/sec, (4203.4 Mcalls/sec) | |
shuffle vfloat4 | |
shuffle<...> : 16820.9 Mvals/sec, (4205.2 Mcalls/sec) | |
shuffle<0> : 16877.6 Mvals/sec, (4219.4 Mcalls/sec) | |
shuffle<1> : 16877.6 Mvals/sec, (4219.4 Mcalls/sec) | |
shuffle<2> : 16813.8 Mvals/sec, (4203.4 Mcalls/sec) | |
shuffle<3> : 16877.6 Mvals/sec, (4219.4 Mcalls/sec) | |
swizzle vfloat4 | |
blend vfloat4 | |
blend: 16870.5 Mvals/sec, (4217.6 Mcalls/sec) | |
blend0: 16792.6 Mvals/sec, (4198.2 Mcalls/sec) | |
blend0not: 16849.2 Mvals/sec, (4212.3 Mcalls/sec) | |
transpose vfloat4 | |
before transpose: | |
0 1 2 3 | |
4 5 6 7 | |
8 9 10 11 | |
12 13 14 15 | |
after transpose: | |
0 4 8 12 | |
1 5 9 13 | |
2 6 10 14 | |
3 7 11 15 | |
vectorops vfloat4 | |
vdot: 16856.3 Mvals/sec, (4214.1 Mcalls/sec) | |
dot: 4208.8 Mvals/sec, (4208.8 Mcalls/sec) | |
vdot3: 16949.2 Mvals/sec, (4237.3 Mcalls/sec) | |
dot3: 4224.8 Mvals/sec, (4224.8 Mcalls/sec) | |
fused vfloat4 | |
madd old *+: 16835.0 Mvals/sec, (4208.8 Mcalls/sec) | |
madd fused: 16827.9 Mvals/sec, (4207.0 Mcalls/sec) | |
msub old *-: 16842.1 Mvals/sec, (4210.5 Mcalls/sec) | |
msub fused: 16842.1 Mvals/sec, (4210.5 Mcalls/sec) | |
nmadd old (-*)+: 16899.0 Mvals/sec, (4224.8 Mcalls/sec) | |
nmadd fused: 16820.9 Mvals/sec, (4205.2 Mcalls/sec) | |
nmsub old -(*+): 16632.0 Mvals/sec, (4158.0 Mcalls/sec) | |
nmsub fused: 16842.1 Mvals/sec, (4210.5 Mcalls/sec) | |
mathfuncs vfloat4 | |
simd abs: 16877.6 Mvals/sec, (4219.4 Mcalls/sec) | |
simd sign: 16849.2 Mvals/sec, (4212.3 Mcalls/sec) | |
simd ceil: 16785.6 Mvals/sec, (4196.4 Mcalls/sec) | |
simd floor: 16778.5 Mvals/sec, (4194.6 Mcalls/sec) | |
simd round: 16806.7 Mvals/sec, (4201.7 Mcalls/sec) | |
simd operator/: 16877.6 Mvals/sec, (4219.4 Mcalls/sec) | |
simd safe_div: 16977.9 Mvals/sec, (4244.5 Mcalls/sec) | |
simd rcp_fast: 17064.8 Mvals/sec, (4266.2 Mcalls/sec) | |
float ifloor: 21598.3 Mvals/sec, (21598.3 Mcalls/sec) | |
simd ifloor: 16625.1 Mvals/sec, (4156.3 Mcalls/sec) | |
float floorfrac: 21551.7 Mvals/sec, (21551.7 Mcalls/sec) | |
simd floorfrac: 6711.4 Mvals/sec, (1677.9 Mcalls/sec) | |
float expf: 21739.1 Mvals/sec, (21739.1 Mcalls/sec) | |
float fast_exp: 21505.4 Mvals/sec, (21505.4 Mcalls/sec) | |
simd exp: 6877.6 Mvals/sec, (1719.4 Mcalls/sec) | |
simd fast_exp: 9799.1 Mvals/sec, (2449.8 Mcalls/sec) | |
float logf: 21551.7 Mvals/sec, (21551.7 Mcalls/sec) | |
fast_log: 21459.2 Mvals/sec, (21459.2 Mcalls/sec) | |
simd log: 6944.4 Mvals/sec, (1736.1 Mcalls/sec) | |
simd fast_log: 9622.3 Mvals/sec, (2405.6 Mcalls/sec) | |
float powf: 21786.5 Mvals/sec, (21786.5 Mcalls/sec) | |
simd fast_pow_pos: 5914.5 Mvals/sec, (1478.6 Mcalls/sec) | |
float sqrt: 455.7 Mvals/sec, (455.7 Mcalls/sec) | |
simd sqrt: 16792.6 Mvals/sec, (4198.2 Mcalls/sec) | |
float rsqrt: 21186.4 Mvals/sec, (21186.4 Mcalls/sec) | |
simd rsqrt: 16820.9 Mvals/sec, (4205.2 Mcalls/sec) | |
simd rsqrt_fast: 16764.5 Mvals/sec, (4191.1 Mcalls/sec) | |
vfloat3 | |
load/store vfloat3 | |
partial load 1 : 101 0 0 | |
partial store 1 : 1 0 0 | |
partial load 2 : 101 102 0 | |
partial store 2 : 1 2 0 | |
partial load 3 : 101 102 103 | |
partial store 3 : 1 2 3 | |
load scalar: 11843.2 Mvals/sec, (3947.7 Mcalls/sec) | |
load vec: 11957.0 Mvals/sec, (3985.7 Mcalls/sec) | |
store vec: 8246.3 Mvals/sec, (2748.8 Mcalls/sec) | |
load 3 comps: 11900.0 Mvals/sec, (3966.7 Mcalls/sec) | |
load 2 comps: 8438.8 Mvals/sec, (4219.4 Mcalls/sec) | |
load 1 comps: 4163.2 Mvals/sec, (4163.2 Mcalls/sec) | |
store 3 comps: 8255.4 Mvals/sec, (2751.8 Mcalls/sec) | |
store 2 comps: 11129.7 Mvals/sec, (5564.8 Mcalls/sec) | |
store 1 comps: 5540.2 Mvals/sec, (5540.2 Mcalls/sec) | |
load/store with conversion vfloat3 | |
load from unsigned short[]: 12605.0 Mvals/sec, (4201.7 Mcalls/sec) | |
load from short[]: 12589.2 Mvals/sec, (4196.4 Mcalls/sec) | |
load from unsigned char[]: 12578.6 Mvals/sec, (4192.9 Mcalls/sec) | |
load from char[]: 12589.2 Mvals/sec, (4196.4 Mcalls/sec) | |
load from half[]: 12610.3 Mvals/sec, (4203.4 Mcalls/sec) | |
store to half[]: 62761.5 Mvals/sec, (20920.5 Mcalls/sec) | |
component_access vfloat3 | |
operator[i]: 21881.8 Mvals/sec, (21881.8 Mcalls/sec) | |
operator[2]: 21598.3 Mvals/sec, (21598.3 Mcalls/sec) | |
operator[0]: 21739.1 Mvals/sec, (21739.1 Mcalls/sec) | |
extract<2> : 21276.6 Mvals/sec, (21276.6 Mcalls/sec) | |
extract<0> : 21505.4 Mvals/sec, (21505.4 Mcalls/sec) | |
insert<2> : 4192.9 Mvals/sec, (4192.9 Mcalls/sec) | |
arithmetic vfloat3 | |
operator+: 12547.1 Mvals/sec, (4182.4 Mcalls/sec) | |
operator-: 12594.5 Mvals/sec, (4198.2 Mcalls/sec) | |
operator- (neg): 12474.0 Mvals/sec, (4158.0 Mcalls/sec) | |
operator*: 12573.3 Mvals/sec, (4191.1 Mcalls/sec) | |
operator* (scalar): 12728.0 Mvals/sec, (4242.7 Mcalls/sec) | |
operator/: 12605.0 Mvals/sec, (4201.7 Mcalls/sec) | |
abs: 12631.6 Mvals/sec, (4210.5 Mcalls/sec) | |
reduce_add: 12599.7 Mvals/sec, (4199.9 Mcalls/sec) | |
add Imath::V3f: 8241.8 Mvals/sec, (2747.3 Mcalls/sec) | |
add Imath::V3f with simd: 8187.8 Mvals/sec, (2729.3 Mcalls/sec) | |
sub Imath::V3f: 8178.8 Mvals/sec, (2726.3 Mcalls/sec) | |
mul Imath::V3f: 7317.1 Mvals/sec, (2439.0 Mcalls/sec) | |
div Imath::V3f: 8230.5 Mvals/sec, (2743.5 Mcalls/sec) | |
reference: add scalar: 22471.9 Mvals/sec, (22471.9 Mcalls/sec) | |
reference: mul scalar: 21413.3 Mvals/sec, (21413.3 Mcalls/sec) | |
reference: div scalar: 20876.8 Mvals/sec, (20876.8 Mcalls/sec) | |
vectorops vfloat3 | |
vdot: 12610.3 Mvals/sec, (4203.4 Mcalls/sec) | |
dot: 4180.6 Mvals/sec, (4180.6 Mcalls/sec) | |
dot vfloat3: 4205.2 Mvals/sec, (4205.2 Mcalls/sec) | |
dot Imath::V3f: 21598.3 Mvals/sec, (21598.3 Mcalls/sec) | |
dot Imath::V3f with simd: 4196.4 Mvals/sec, (4196.4 Mcalls/sec) | |
normalize Imath: 2757.9 Mvals/sec, (2757.9 Mcalls/sec) | |
normalize Imath with simd: 2101.3 Mvals/sec, (2101.3 Mcalls/sec) | |
normalize Imath with simd fast: 2101.7 Mvals/sec, (2101.7 Mcalls/sec) | |
normalize simd: 12647.6 Mvals/sec, (4215.9 Mcalls/sec) | |
normalize simd fast: 12610.3 Mvals/sec, (4203.4 Mcalls/sec) | |
fused vfloat3 | |
madd old *+: 12663.6 Mvals/sec, (4221.2 Mcalls/sec) | |
madd fused: 16827.9 Mvals/sec, (4207.0 Mcalls/sec) | |
msub old *-: 12668.9 Mvals/sec, (4223.0 Mcalls/sec) | |
msub fused: 16835.0 Mvals/sec, (4208.8 Mcalls/sec) | |
nmadd old (-*)+: 12663.6 Mvals/sec, (4221.2 Mcalls/sec) | |
nmadd fused: 16884.8 Mvals/sec, (4221.2 Mcalls/sec) | |
nmsub old -(*+): 12594.5 Mvals/sec, (4198.2 Mcalls/sec) | |
nmsub fused: 16849.2 Mvals/sec, (4212.3 Mcalls/sec) | |
vfloat8 | |
load/store vfloat8 | |
partial load 1 : 101 0 0 0 0 0 0 0 | |
partial store 1 : 1 0 0 0 0 0 0 0 | |
partial load 2 : 101 102 0 0 0 0 0 0 | |
partial store 2 : 1 2 0 0 0 0 0 0 | |
partial load 3 : 101 102 103 0 0 0 0 0 | |
partial store 3 : 1 2 3 0 0 0 0 0 | |
partial load 4 : 101 102 103 104 0 0 0 0 | |
partial store 4 : 1 2 3 4 0 0 0 0 | |
partial load 5 : 101 102 103 104 105 0 0 0 | |
partial store 5 : 1 2 3 4 5 0 0 0 | |
partial load 6 : 101 102 103 104 105 106 0 0 | |
partial store 6 : 1 2 3 4 5 6 0 0 | |
partial load 7 : 101 102 103 104 105 106 107 0 | |
partial store 7 : 1 2 3 4 5 6 7 0 | |
partial load 8 : 101 102 103 104 105 106 107 108 | |
partial store 8 : 1 2 3 4 5 6 7 8 | |
load scalar: 31274.4 Mvals/sec, (3909.3 Mcalls/sec) | |
load vec: 30983.7 Mvals/sec, (3873.0 Mcalls/sec) | |
store vec: 31311.2 Mvals/sec, (3913.9 Mcalls/sec) | |
load 8 comps: 24442.4 Mvals/sec, (3055.3 Mcalls/sec) | |
load 7 comps: 17148.5 Mvals/sec, (2449.8 Mcalls/sec) | |
load 6 comps: 17331.0 Mvals/sec, (2888.5 Mcalls/sec) | |
load 5 comps: 14560.3 Mvals/sec, (2912.1 Mcalls/sec) | |
load 4 comps: 14367.8 Mvals/sec, (3592.0 Mcalls/sec) | |
load 3 comps: 10567.1 Mvals/sec, (3522.4 Mcalls/sec) | |
load 2 comps: 7165.9 Mvals/sec, (3582.9 Mcalls/sec) | |
load 1 comps: 3562.5 Mvals/sec, (3562.5 Mcalls/sec) | |
store 8 comps: 16827.9 Mvals/sec, (2103.5 Mcalls/sec) | |
store 7 comps: 12708.8 Mvals/sec, (1815.5 Mcalls/sec) | |
store 6 comps: 16560.9 Mvals/sec, (2760.1 Mcalls/sec) | |
store 5 comps: 13736.3 Mvals/sec, (2747.3 Mcalls/sec) | |
store 4 comps: 16877.6 Mvals/sec, (4219.4 Mcalls/sec) | |
store 3 comps: 8273.6 Mvals/sec, (2757.9 Mcalls/sec) | |
store 2 comps: 10952.9 Mvals/sec, (5476.5 Mcalls/sec) | |
store 1 comps: 5482.5 Mvals/sec, (5482.5 Mcalls/sec) | |
load/store with conversion vfloat8 | |
load from unsigned short[]: 31384.9 Mvals/sec, (3923.1 Mcalls/sec) | |
load from short[]: 31372.5 Mvals/sec, (3921.6 Mcalls/sec) | |
load from unsigned char[]: 31446.5 Mvals/sec, (3930.8 Mcalls/sec) | |
load from char[]: 31397.2 Mvals/sec, (3924.6 Mcalls/sec) | |
load from half[]: 30995.7 Mvals/sec, (3874.5 Mcalls/sec) | |
store to half[]: 174291.9 Mvals/sec, (21786.5 Mcalls/sec) | |
masked loadstore vfloat8 | |
masked load with int mask: 31384.9 Mvals/sec, (3923.1 Mcalls/sec) | |
masked load with bool mask: 31348.0 Mvals/sec, (3918.5 Mcalls/sec) | |
masked store with int mask: 21834.1 Mvals/sec, (21834.1 Mcalls/sec) | |
masked store with bool mask: 21739.1 Mvals/sec, (21739.1 Mcalls/sec) | |
scatter & gather vfloat8 | |
gather: 2347.4 Mvals/sec, (293.4 Mcalls/sec) | |
gather_mask: 920.7 Mvals/sec, (115.1 Mcalls/sec) | |
scatter: 2091.6 Mvals/sec, (261.4 Mcalls/sec) | |
scatter_mask: 2072.6 Mvals/sec, (259.1 Mcalls/sec) | |
component_access vfloat8 | |
operator[i]: 21645.0 Mvals/sec, (21645.0 Mcalls/sec) | |
operator[2]: 21645.0 Mvals/sec, (21645.0 Mcalls/sec) | |
operator[0]: 21929.8 Mvals/sec, (21929.8 Mcalls/sec) | |
extract<2> : 21322.0 Mvals/sec, (21322.0 Mcalls/sec) | |
extract<0> : 21322.0 Mvals/sec, (21322.0 Mcalls/sec) | |
insert<2> : 3468.6 Mvals/sec, (3468.6 Mcalls/sec) | |
arithmetic vfloat8 | |
operator+: 31176.9 Mvals/sec, (3897.1 Mcalls/sec) | |
operator-: 31164.8 Mvals/sec, (3895.6 Mcalls/sec) | |
operator- (neg): 31620.6 Mvals/sec, (3952.6 Mcalls/sec) | |
operator*: 31360.3 Mvals/sec, (3920.0 Mcalls/sec) | |
operator* (scalar): 31311.2 Mvals/sec, (3913.9 Mcalls/sec) | |
operator/: 31274.4 Mvals/sec, (3909.3 Mcalls/sec) | |
abs: 31104.2 Mvals/sec, (3888.0 Mcalls/sec) | |
reduce_add: 31152.6 Mvals/sec, (3894.1 Mcalls/sec) | |
reference: add scalar: 22522.5 Mvals/sec, (22522.5 Mcalls/sec) | |
reference: mul scalar: 21881.8 Mvals/sec, (21881.8 Mcalls/sec) | |
reference: div scalar: 21739.1 Mvals/sec, (21739.1 Mcalls/sec) | |
comparisons vfloat8 | |
operator< : 31262.2 Mvals/sec, (3907.8 Mcalls/sec) | |
operator> : 31250.0 Mvals/sec, (3906.2 Mcalls/sec) | |
operator<=: 31164.8 Mvals/sec, (3895.6 Mcalls/sec) | |
operator>=: 31225.6 Mvals/sec, (3903.2 Mcalls/sec) | |
operator==: 31201.2 Mvals/sec, (3900.2 Mcalls/sec) | |
operator!=: 31250.0 Mvals/sec, (3906.2 Mcalls/sec) | |
shuffle vfloat8 | |
shuffle<...> : 31189.1 Mvals/sec, (3898.6 Mcalls/sec) | |
shuffle<0> : 31140.5 Mvals/sec, (3892.6 Mcalls/sec) | |
shuffle<1> : 30983.7 Mvals/sec, (3873.0 Mcalls/sec) | |
shuffle<2> : 31250.0 Mvals/sec, (3906.2 Mcalls/sec) | |
shuffle<3> : 31152.6 Mvals/sec, (3894.1 Mcalls/sec) | |
shuffle<4> : 31116.3 Mvals/sec, (3889.5 Mcalls/sec) | |
shuffle<5> : 31116.3 Mvals/sec, (3889.5 Mcalls/sec) | |
shuffle<6> : 31189.1 Mvals/sec, (3898.6 Mcalls/sec) | |
shuffle<7> : 31007.8 Mvals/sec, (3876.0 Mcalls/sec) | |
blend vfloat8 | |
blend: 31152.6 Mvals/sec, (3894.1 Mcalls/sec) | |
blend0: 31237.8 Mvals/sec, (3904.7 Mcalls/sec) | |
blend0not: 31116.3 Mvals/sec, (3889.5 Mcalls/sec) | |
fused vfloat8 | |
madd old *+: 31080.0 Mvals/sec, (3885.0 Mcalls/sec) | |
madd fused: 31250.0 Mvals/sec, (3906.2 Mcalls/sec) | |
msub old *-: 31055.9 Mvals/sec, (3882.0 Mcalls/sec) | |
msub fused: 31237.8 Mvals/sec, (3904.7 Mcalls/sec) | |
nmadd old (-*)+: 31286.7 Mvals/sec, (3910.8 Mcalls/sec) | |
nmadd fused: 31323.4 Mvals/sec, (3915.4 Mcalls/sec) | |
nmsub old -(*+): 31225.6 Mvals/sec, (3903.2 Mcalls/sec) | |
nmsub fused: 31152.6 Mvals/sec, (3894.1 Mcalls/sec) | |
mathfuncs vfloat8 | |
simd abs: 31360.3 Mvals/sec, (3920.0 Mcalls/sec) | |
simd sign: 31434.2 Mvals/sec, (3929.3 Mcalls/sec) | |
simd ceil: 31225.6 Mvals/sec, (3903.2 Mcalls/sec) | |
simd floor: 30959.8 Mvals/sec, (3870.0 Mcalls/sec) | |
simd round: 30971.7 Mvals/sec, (3871.5 Mcalls/sec) | |
simd operator/: 31274.4 Mvals/sec, (3909.3 Mcalls/sec) | |
simd safe_div: 31213.4 Mvals/sec, (3901.7 Mcalls/sec) | |
simd rcp_fast: 30674.8 Mvals/sec, (3834.4 Mcalls/sec) | |
float ifloor: 21739.1 Mvals/sec, (21739.1 Mcalls/sec) | |
simd ifloor: 30840.4 Mvals/sec, (3855.1 Mcalls/sec) | |
float floorfrac: 21459.2 Mvals/sec, (21459.2 Mcalls/sec) | |
simd floorfrac: 12899.1 Mvals/sec, (1612.4 Mcalls/sec) | |
float expf: 21186.4 Mvals/sec, (21186.4 Mcalls/sec) | |
float fast_exp: 21276.6 Mvals/sec, (21276.6 Mcalls/sec) | |
simd exp: 13331.1 Mvals/sec, (1666.4 Mcalls/sec) | |
simd fast_exp: 11453.1 Mvals/sec, (1431.6 Mcalls/sec) | |
float logf: 20920.5 Mvals/sec, (20920.5 Mcalls/sec) | |
fast_log: 20746.9 Mvals/sec, (20746.9 Mcalls/sec) | |
simd log: 13402.6 Mvals/sec, (1675.3 Mcalls/sec) | |
simd fast_log: 18148.8 Mvals/sec, (2268.6 Mcalls/sec) | |
float powf: 6844.6 Mvals/sec, (6844.6 Mcalls/sec) | |
simd fast_pow_pos: 7258.1 Mvals/sec, (907.3 Mcalls/sec) | |
float sqrt: 452.3 Mvals/sec, (452.3 Mcalls/sec) | |
simd sqrt: 30557.7 Mvals/sec, (3819.7 Mcalls/sec) | |
float rsqrt: 20491.8 Mvals/sec, (20491.8 Mcalls/sec) | |
simd rsqrt: 30511.1 Mvals/sec, (3813.9 Mcalls/sec) | |
simd rsqrt_fast: 30326.0 Mvals/sec, (3790.8 Mcalls/sec) | |
vfloat16 | |
load/store vfloat16 | |
partial load 1 : 101 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 1 : 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 2 : 101 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 2 : 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 3 : 101 102 103 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 3 : 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 4 : 101 102 103 104 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 4 : 1 2 3 4 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 5 : 101 102 103 104 105 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 5 : 1 2 3 4 5 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 6 : 101 102 103 104 105 106 0 0 0 0 0 0 0 0 0 0 | |
partial store 6 : 1 2 3 4 5 6 0 0 0 0 0 0 0 0 0 0 | |
partial load 7 : 101 102 103 104 105 106 107 0 0 0 0 0 0 0 0 0 | |
partial store 7 : 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 0 | |
partial load 8 : 101 102 103 104 105 106 107 108 0 0 0 0 0 0 0 0 | |
partial store 8 : 1 2 3 4 5 6 7 8 0 0 0 0 0 0 0 0 | |
partial load 9 : 101 102 103 104 105 106 107 108 109 0 0 0 0 0 0 0 | |
partial store 9 : 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 | |
partial load 10 : 101 102 103 104 105 106 107 108 109 110 0 0 0 0 0 0 | |
partial store 10 : 1 2 3 4 5 6 7 8 9 10 0 0 0 0 0 0 | |
partial load 11 : 101 102 103 104 105 106 107 108 109 110 111 0 0 0 0 0 | |
partial store 11 : 1 2 3 4 5 6 7 8 9 10 11 0 0 0 0 0 | |
partial load 12 : 101 102 103 104 105 106 107 108 109 110 111 112 0 0 0 0 | |
partial store 12 : 1 2 3 4 5 6 7 8 9 10 11 12 0 0 0 0 | |
partial load 13 : 101 102 103 104 105 106 107 108 109 110 111 112 113 0 0 0 | |
partial store 13 : 1 2 3 4 5 6 7 8 9 10 11 12 13 0 0 0 | |
partial load 14 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 0 0 | |
partial store 14 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0 | |
partial load 15 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 0 | |
partial store 15 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 | |
partial load 16 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 | |
partial store 16 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | |
load scalar: 29138.6 Mvals/sec, (1821.2 Mcalls/sec) | |
load vec: 28622.5 Mvals/sec, (1788.9 Mcalls/sec) | |
store vec: 28891.3 Mvals/sec, (1805.7 Mcalls/sec) | |
load 16 comps: 28663.6 Mvals/sec, (1791.5 Mcalls/sec) | |
load 13 comps: 22640.2 Mvals/sec, (1741.6 Mcalls/sec) | |
load 9 comps: 15600.6 Mvals/sec, (1733.4 Mcalls/sec) | |
load 8 comps: 14558.7 Mvals/sec, (1819.8 Mcalls/sec) | |
load 7 comps: 12108.6 Mvals/sec, (1729.8 Mcalls/sec) | |
load 6 comps: 4904.8 Mvals/sec, (817.5 Mcalls/sec) | |
load 5 comps: 8735.2 Mvals/sec, (1747.0 Mcalls/sec) | |
load 4 comps: 7320.6 Mvals/sec, (1830.2 Mcalls/sec) | |
load 3 comps: 5217.4 Mvals/sec, (1739.1 Mcalls/sec) | |
load 2 comps: 3653.0 Mvals/sec, (1826.5 Mcalls/sec) | |
load 1 comps: 1822.5 Mvals/sec, (1822.5 Mcalls/sec) | |
store 16 comps: 28725.3 Mvals/sec, (1795.3 Mcalls/sec) | |
store 13 comps: 21262.7 Mvals/sec, (1635.6 Mcalls/sec) | |
store 9 comps: 16501.7 Mvals/sec, (1833.5 Mcalls/sec) | |
store 8 comps: 16750.4 Mvals/sec, (2093.8 Mcalls/sec) | |
store 7 comps: 12547.1 Mvals/sec, (1792.4 Mcalls/sec) | |
store 6 comps: 16344.3 Mvals/sec, (2724.1 Mcalls/sec) | |
store 5 comps: 13605.4 Mvals/sec, (2721.1 Mcalls/sec) | |
store 4 comps: 16743.4 Mvals/sec, (4185.9 Mcalls/sec) | |
store 3 comps: 8156.6 Mvals/sec, (2718.9 Mcalls/sec) | |
store 2 comps: 10875.5 Mvals/sec, (5437.7 Mcalls/sec) | |
store 1 comps: 5423.0 Mvals/sec, (5423.0 Mcalls/sec) | |
load/store with conversion vfloat16 | |
load from unsigned short[]: 28760.9 Mvals/sec, (1797.6 Mcalls/sec) | |
load from short[]: 29133.3 Mvals/sec, (1820.8 Mcalls/sec) | |
load from unsigned char[]: 28808.1 Mvals/sec, (1800.5 Mcalls/sec) | |
load from char[]: 28802.9 Mvals/sec, (1800.2 Mcalls/sec) | |
load from half[]: 28818.4 Mvals/sec, (1801.2 Mcalls/sec) | |
store to half[]: 347826.1 Mvals/sec, (21739.1 Mcalls/sec) | |
masked loadstore vfloat16 | |
masked load with int mask: 28648.2 Mvals/sec, (1790.5 Mcalls/sec) | |
masked load with bool mask: 28551.0 Mvals/sec, (1784.4 Mcalls/sec) | |
masked store with int mask: 21186.4 Mvals/sec, (21186.4 Mcalls/sec) | |
masked store with bool mask: 21097.0 Mvals/sec, (21097.0 Mcalls/sec) | |
scatter & gather vfloat16 | |
gather: 2462.0 Mvals/sec, (153.9 Mcalls/sec) | |
gather_mask: 2269.8 Mvals/sec, (141.9 Mcalls/sec) | |
scatter: 2041.6 Mvals/sec, (127.6 Mcalls/sec) | |
scatter_mask: 2309.1 Mvals/sec, (144.3 Mcalls/sec) | |
component_access vfloat16 | |
operator[i]: 4450.4 Mvals/sec, (4450.4 Mcalls/sec) | |
operator[2]: 4428.7 Mvals/sec, (4428.7 Mcalls/sec) | |
operator[0]: 4446.4 Mvals/sec, (4446.4 Mcalls/sec) | |
extract<2> : 4420.9 Mvals/sec, (4420.9 Mcalls/sec) | |
extract<0> : 4071.7 Mvals/sec, (4071.7 Mcalls/sec) | |
insert<2> : 1623.6 Mvals/sec, (1623.6 Mcalls/sec) | |
arithmetic vfloat16 | |
operator+: 29038.1 Mvals/sec, (1814.9 Mcalls/sec) | |
operator-: 29043.4 Mvals/sec, (1815.2 Mcalls/sec) | |
operator- (neg): 29017.0 Mvals/sec, (1813.6 Mcalls/sec) | |
operator*: 29261.2 Mvals/sec, (1828.8 Mcalls/sec) | |
operator* (scalar): 29059.2 Mvals/sec, (1816.2 Mcalls/sec) | |
operator/: 29064.5 Mvals/sec, (1816.5 Mcalls/sec) | |
abs: 29027.6 Mvals/sec, (1814.2 Mcalls/sec) | |
reduce_add: 29133.3 Mvals/sec, (1820.8 Mcalls/sec) | |
reference: add scalar: 4222.8 Mvals/sec, (4222.8 Mcalls/sec) | |
reference: mul scalar: 4215.9 Mvals/sec, (4215.9 Mcalls/sec) | |
reference: div scalar: 4214.1 Mvals/sec, (4214.1 Mcalls/sec) | |
comparisons vfloat16 | |
operator< : 350109.4 Mvals/sec, (21881.8 Mcalls/sec) | |
operator> : 345572.3 Mvals/sec, (21598.3 Mcalls/sec) | |
operator<=: 345572.3 Mvals/sec, (21598.3 Mcalls/sec) | |
operator>=: 348583.9 Mvals/sec, (21786.5 Mcalls/sec) | |
operator==: 347826.1 Mvals/sec, (21739.1 Mcalls/sec) | |
operator!=: 348583.9 Mvals/sec, (21786.5 Mcalls/sec) | |
shuffle vfloat16 | |
shuffle4<> : 28985.5 Mvals/sec, (1811.6 Mcalls/sec) | |
shuffle<> : 29449.7 Mvals/sec, (1840.6 Mcalls/sec) | |
blend vfloat16 | |
blend: 29027.6 Mvals/sec, (1814.2 Mcalls/sec) | |
blend0: 28959.3 Mvals/sec, (1810.0 Mcalls/sec) | |
blend0not: 29117.4 Mvals/sec, (1819.8 Mcalls/sec) | |
fused vfloat16 | |
madd old *+: 28901.7 Mvals/sec, (1806.4 Mcalls/sec) | |
madd fused: 28865.2 Mvals/sec, (1804.1 Mcalls/sec) | |
msub old *-: 24342.0 Mvals/sec, (1521.4 Mcalls/sec) | |
msub fused: 28959.3 Mvals/sec, (1810.0 Mcalls/sec) | |
nmadd old (-*)+: 28896.5 Mvals/sec, (1806.0 Mcalls/sec) | |
nmadd fused: 28818.4 Mvals/sec, (1801.2 Mcalls/sec) | |
nmsub old -(*+): 28891.3 Mvals/sec, (1805.7 Mcalls/sec) | |
nmsub fused: 28808.1 Mvals/sec, (1800.5 Mcalls/sec) | |
mathfuncs vfloat16 | |
/var/tmp/portage/media-libs/openimageio-2.5.5.0-r1/work/OpenImageIO-2.5.5.0/src/libutil/simd_test.cpp:1579: | |
FAILED: round(F) == mkvec<VEC>(std::round(F[0]), std::round(F[1]), std::round(F[2]), std::round(F[3])) | |
values were '-1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4' and '-2 0 2 4 -2 0 2 4 -2 0 2 4 -2 0 2 4' | |
simd abs: 28828.8 Mvals/sec, (1801.8 Mcalls/sec) | |
simd sign: 29234.4 Mvals/sec, (1827.2 Mcalls/sec) | |
simd ceil: 18892.4 Mvals/sec, (1180.8 Mcalls/sec) | |
simd floor: 29287.9 Mvals/sec, (1830.5 Mcalls/sec) | |
simd round: 29293.3 Mvals/sec, (1830.8 Mcalls/sec) | |
simd operator/: 28823.6 Mvals/sec, (1801.5 Mcalls/sec) | |
simd safe_div: 28828.8 Mvals/sec, (1801.8 Mcalls/sec) | |
simd rcp_fast: 28308.6 Mvals/sec, (1769.3 Mcalls/sec) | |
float ifloor: 21459.2 Mvals/sec, (21459.2 Mcalls/sec) | |
simd ifloor: 29032.8 Mvals/sec, (1814.6 Mcalls/sec) | |
float floorfrac: 22026.4 Mvals/sec, (22026.4 Mcalls/sec) | |
simd floorfrac: 12503.9 Mvals/sec, (781.5 Mcalls/sec) | |
float expf: 21459.2 Mvals/sec, (21459.2 Mcalls/sec) | |
float fast_exp: 21505.4 Mvals/sec, (21505.4 Mcalls/sec) | |
simd exp: 14864.4 Mvals/sec, (929.0 Mcalls/sec) | |
simd fast_exp: 20085.4 Mvals/sec, (1255.3 Mcalls/sec) | |
float logf: 21505.4 Mvals/sec, (21505.4 Mcalls/sec) | |
fast_log: 21367.5 Mvals/sec, (21367.5 Mcalls/sec) | |
simd log: 14301.0 Mvals/sec, (893.8 Mcalls/sec) | |
simd fast_log: 19524.1 Mvals/sec, (1220.3 Mcalls/sec) | |
float powf: 21459.2 Mvals/sec, (21459.2 Mcalls/sec) | |
simd fast_pow_pos: 13965.3 Mvals/sec, (872.8 Mcalls/sec) | |
float sqrt: 460.5 Mvals/sec, (460.5 Mcalls/sec) | |
simd sqrt: 28617.4 Mvals/sec, (1788.6 Mcalls/sec) | |
float rsqrt: 21413.3 Mvals/sec, (21413.3 Mcalls/sec) | |
simd rsqrt: 28699.6 Mvals/sec, (1793.7 Mcalls/sec) | |
simd rsqrt_fast: 28648.2 Mvals/sec, (1790.5 Mcalls/sec) | |
vint4 | |
load/store vint4 | |
partial load 1 : 101 0 0 0 | |
partial store 1 : 1 0 0 0 | |
partial load 2 : 101 102 0 0 | |
partial store 2 : 1 2 0 0 | |
partial load 3 : 101 102 103 0 | |
partial store 3 : 1 2 3 0 | |
partial load 4 : 101 102 103 104 | |
partial store 4 : 1 2 3 4 | |
load scalar: 16266.8 Mvals/sec, (4066.7 Mcalls/sec) | |
load vec: 16286.0 Mvals/sec, (4071.5 Mcalls/sec) | |
store vec: 16611.3 Mvals/sec, (4152.8 Mcalls/sec) | |
load 4 comps: 16200.9 Mvals/sec, (4050.2 Mcalls/sec) | |
load 3 comps: 11815.7 Mvals/sec, (3938.6 Mcalls/sec) | |
load 2 comps: 8183.3 Mvals/sec, (4091.7 Mcalls/sec) | |
load 1 comps: 4090.0 Mvals/sec, (4090.0 Mcalls/sec) | |
store 4 comps: 16604.4 Mvals/sec, (4151.1 Mcalls/sec) | |
store 3 comps: 8112.5 Mvals/sec, (2704.2 Mcalls/sec) | |
store 2 comps: 10834.2 Mvals/sec, (5417.1 Mcalls/sec) | |
store 1 comps: 5402.5 Mvals/sec, (5402.5 Mcalls/sec) | |
load/store with conversion vint4 | |
load from int[]: 16515.3 Mvals/sec, (4128.8 Mcalls/sec) | |
load from unsigned short[]: 16570.0 Mvals/sec, (4142.5 Mcalls/sec) | |
load from short[]: 16542.6 Mvals/sec, (4135.6 Mcalls/sec) | |
load from unsigned char[]: 16535.8 Mvals/sec, (4133.9 Mcalls/sec) | |
load from char[]: 16563.1 Mvals/sec, (4140.8 Mcalls/sec) | |
store to unsigned short[]: 16380.0 Mvals/sec, (4095.0 Mcalls/sec) | |
store to unsigned char[]: 16359.9 Mvals/sec, (4090.0 Mcalls/sec) | |
masked loadstore vint4 | |
masked load with int mask: 16528.9 Mvals/sec, (4132.2 Mcalls/sec) | |
masked load with bool mask: 16542.6 Mvals/sec, (4135.6 Mcalls/sec) | |
masked store with int mask: 21186.4 Mvals/sec, (21186.4 Mcalls/sec) | |
masked store with bool mask: 22624.4 Mvals/sec, (22624.4 Mcalls/sec) | |
scatter & gather vint4 | |
gather: 1889.7 Mvals/sec, (472.4 Mcalls/sec) | |
gather_mask: 1942.1 Mvals/sec, (485.5 Mcalls/sec) | |
scatter: 4376.4 Mvals/sec, (1094.1 Mcalls/sec) | |
scatter_mask: 5816.5 Mvals/sec, (1454.1 Mcalls/sec) | |
component_access vint4 | |
operator[i]: 21097.0 Mvals/sec, (21097.0 Mcalls/sec) | |
operator[2]: 21097.0 Mvals/sec, (21097.0 Mcalls/sec) | |
operator[0]: 21186.4 Mvals/sec, (21186.4 Mcalls/sec) | |
extract<2> : 27855.2 Mvals/sec, (27855.2 Mcalls/sec) | |
extract<0> : 27173.9 Mvals/sec, (27173.9 Mcalls/sec) | |
insert<2> : 4543.4 Mvals/sec, (4543.4 Mcalls/sec) | |
arithmetic vint4 | |
operator+: 18223.2 Mvals/sec, (4555.8 Mcalls/sec) | |
operator-: 18181.8 Mvals/sec, (4545.5 Mcalls/sec) | |
operator- (neg): 18223.2 Mvals/sec, (4555.8 Mcalls/sec) | |
operator*: 18206.6 Mvals/sec, (4551.7 Mcalls/sec) | |
operator* (scalar): 18190.1 Mvals/sec, (4547.5 Mcalls/sec) | |
operator/: 18223.2 Mvals/sec, (4555.8 Mcalls/sec) | |
abs: 18198.4 Mvals/sec, (4549.6 Mcalls/sec) | |
reduce_add: 11277.1 Mvals/sec, (2819.3 Mcalls/sec) | |
reference: add scalar: 18797.0 Mvals/sec, (18797.0 Mcalls/sec) | |
reference: mul scalar: 18832.4 Mvals/sec, (18832.4 Mcalls/sec) | |
reference: div scalar: 19047.6 Mvals/sec, (19047.6 Mcalls/sec) | |
bitwise vint4 | |
operator&: 18223.2 Mvals/sec, (4555.8 Mcalls/sec) | |
operator|: 18231.5 Mvals/sec, (4557.9 Mcalls/sec) | |
operator^: 18223.2 Mvals/sec, (4555.8 Mcalls/sec) | |
operator!: 18223.2 Mvals/sec, (4555.8 Mcalls/sec) | |
andnot: 18206.6 Mvals/sec, (4551.7 Mcalls/sec) | |
reduce_and: 18939.4 Mvals/sec, (18939.4 Mcalls/sec) | |
reduce_or : 19011.4 Mvals/sec, (19011.4 Mcalls/sec) | |
comparisons vint4 | |
operator< : 18231.5 Mvals/sec, (4557.9 Mcalls/sec) | |
operator> : 18231.5 Mvals/sec, (4557.9 Mcalls/sec) | |
operator<=: 18214.9 Mvals/sec, (4553.7 Mcalls/sec) | |
operator>=: 18148.8 Mvals/sec, (4537.2 Mcalls/sec) | |
operator==: 18198.4 Mvals/sec, (4549.6 Mcalls/sec) | |
operator!=: 18198.4 Mvals/sec, (4549.6 Mcalls/sec) | |
shuffle vint4 | |
shuffle<...> : 18223.2 Mvals/sec, (4555.8 Mcalls/sec) | |
shuffle<0> : 18198.4 Mvals/sec, (4549.6 Mcalls/sec) | |
shuffle<1> : 18181.8 Mvals/sec, (4545.5 Mcalls/sec) | |
shuffle<2> : 13315.6 Mvals/sec, (3328.9 Mcalls/sec) | |
shuffle<3> : 14492.8 Mvals/sec, (3623.2 Mcalls/sec) | |
blend vint4 | |
blend: 15898.3 Mvals/sec, (3974.6 Mcalls/sec) | |
blend0: 15760.4 Mvals/sec, (3940.1 Mcalls/sec) | |
blend0not: 17398.9 Mvals/sec, (4349.7 Mcalls/sec) | |
test converting vint4 to uint16 | |
load from uint16: 95923.3 Mvals/sec, (23980.8 Mcalls/sec) | |
convert to uint16: 16757.4 Mvals/sec, (4189.4 Mcalls/sec) | |
test converting vint4 to uint8 | |
load from uint8: 87146.0 Mvals/sec, (21786.5 Mcalls/sec) | |
convert to uint16: 16611.3 Mvals/sec, (4152.8 Mcalls/sec) | |
shift vint4 | |
[-80000000 -80000000 -80000000 -80000000] >> 1 == [-40000000 -40000000 -40000000 -40000000] | |
[-80000000 -80000000 -80000000 -80000000] srl 1 == [40000000 40000000 40000000 40000000] | |
[-80000000 -80000000 -80000000 -80000000] >> 4 == [-8000000 -8000000 -8000000 -8000000] | |
[-80000000 -80000000 -80000000 -80000000] srl 4 == [8000000 8000000 8000000 8000000] | |
[-1 -1 -1 -1] >> 1 == [-1 -1 -1 -1] | |
[-1 -1 -1 -1] srl 1 == [7fffffff 7fffffff 7fffffff 7fffffff] | |
[-1 -1 -1 -1] >> 4 == [-1 -1 -1 -1] | |
[-1 -1 -1 -1] srl 4 == [fffffff fffffff fffffff fffffff] | |
[ffff ffff ffff ffff] >> 1 == [7fff 7fff 7fff 7fff] | |
[ffff ffff ffff ffff] srl 1 == [7fff 7fff 7fff 7fff] | |
[ffff ffff ffff ffff] >> 4 == [fff fff fff fff] | |
[ffff ffff ffff ffff] srl 4 == [fff fff fff fff] | |
[3 3 3 3] >> 1 == [1 1 1 1] | |
[3 3 3 3] srl 1 == [1 1 1 1] | |
[3 3 3 3] >> 4 == [0 0 0 0] | |
[3 3 3 3] srl 4 == [0 0 0 0] | |
operator<<: 16522.1 Mvals/sec, (4130.5 Mcalls/sec) | |
operator>>: 16590.6 Mvals/sec, (4147.7 Mcalls/sec) | |
srl : 16597.5 Mvals/sec, (4149.4 Mcalls/sec) | |
rotl : 14214.6 Mvals/sec, (3553.7 Mcalls/sec) | |
transpose vint4 | |
before transpose: | |
0 1 2 3 | |
4 5 6 7 | |
8 9 10 11 | |
12 13 14 15 | |
after transpose: | |
0 4 8 12 | |
1 5 9 13 | |
2 6 10 14 | |
3 7 11 15 | |
vint8 | |
load/store vint8 | |
partial load 1 : 101 0 0 0 0 0 0 0 | |
partial store 1 : 1 0 0 0 0 0 0 0 | |
partial load 2 : 101 102 0 0 0 0 0 0 | |
partial store 2 : 1 2 0 0 0 0 0 0 | |
partial load 3 : 101 102 103 0 0 0 0 0 | |
partial store 3 : 1 2 3 0 0 0 0 0 | |
partial load 4 : 101 102 103 104 0 0 0 0 | |
partial store 4 : 1 2 3 4 0 0 0 0 | |
partial load 5 : 101 102 103 104 105 0 0 0 | |
partial store 5 : 1 2 3 4 5 0 0 0 | |
partial load 6 : 101 102 103 104 105 106 0 0 | |
partial store 6 : 1 2 3 4 5 6 0 0 | |
partial load 7 : 101 102 103 104 105 106 107 0 | |
partial store 7 : 1 2 3 4 5 6 7 0 | |
partial load 8 : 101 102 103 104 105 106 107 108 | |
partial store 8 : 1 2 3 4 5 6 7 8 | |
load scalar: 30616.2 Mvals/sec, (3827.0 Mcalls/sec) | |
load vec: 30441.4 Mvals/sec, (3805.2 Mcalls/sec) | |
store vec: 30983.7 Mvals/sec, (3873.0 Mcalls/sec) | |
load 8 comps: 30546.0 Mvals/sec, (3818.3 Mcalls/sec) | |
load 7 comps: 4035.7 Mvals/sec, (576.5 Mcalls/sec) | |
load 6 comps: 22329.7 Mvals/sec, (3721.6 Mcalls/sec) | |
load 5 comps: 18726.6 Mvals/sec, (3745.3 Mcalls/sec) | |
load 4 comps: 15491.9 Mvals/sec, (3873.0 Mcalls/sec) | |
load 3 comps: 11278.2 Mvals/sec, (3759.4 Mcalls/sec) | |
load 2 comps: 7657.0 Mvals/sec, (3828.5 Mcalls/sec) | |
load 1 comps: 3840.2 Mvals/sec, (3840.2 Mcalls/sec) | |
store 8 comps: 31128.4 Mvals/sec, (3891.1 Mcalls/sec) | |
store 7 comps: 12565.1 Mvals/sec, (1795.0 Mcalls/sec) | |
store 6 comps: 16291.1 Mvals/sec, (2715.2 Mcalls/sec) | |
store 5 comps: 13568.5 Mvals/sec, (2713.7 Mcalls/sec) | |
store 4 comps: 16673.6 Mvals/sec, (4168.4 Mcalls/sec) | |
store 3 comps: 2102.3 Mvals/sec, (700.8 Mcalls/sec) | |
store 2 comps: 10905.1 Mvals/sec, (5452.6 Mcalls/sec) | |
store 1 comps: 5370.6 Mvals/sec, (5370.6 Mcalls/sec) | |
load/store with conversion vint8 | |
load from int[]: 30852.3 Mvals/sec, (3856.5 Mcalls/sec) | |
load from unsigned short[]: 30840.4 Mvals/sec, (3855.1 Mcalls/sec) | |
load from short[]: 30852.3 Mvals/sec, (3856.5 Mcalls/sec) | |
load from unsigned char[]: 30733.8 Mvals/sec, (3841.7 Mcalls/sec) | |
load from char[]: 30804.8 Mvals/sec, (3850.6 Mcalls/sec) | |
store to unsigned short[]: 32653.1 Mvals/sec, (4081.6 Mcalls/sec) | |
store to unsigned char[]: 32653.1 Mvals/sec, (4081.6 Mcalls/sec) | |
masked loadstore vint8 | |
masked load with int mask: 30840.4 Mvals/sec, (3855.1 Mcalls/sec) | |
masked load with bool mask: 30923.8 Mvals/sec, (3865.5 Mcalls/sec) | |
masked store with int mask: 22075.1 Mvals/sec, (22075.1 Mcalls/sec) | |
masked store with bool mask: 21186.4 Mvals/sec, (21186.4 Mcalls/sec) | |
scatter & gather vint8 | |
gather: 2354.5 Mvals/sec, (294.3 Mcalls/sec) | |
gather_mask: 2323.2 Mvals/sec, (290.4 Mcalls/sec) | |
scatter: 1452.5 Mvals/sec, (181.6 Mcalls/sec) | |
scatter_mask: 763.3 Mvals/sec, (95.4 Mcalls/sec) | |
component_access vint8 | |
operator[i]: 21413.3 Mvals/sec, (21413.3 Mcalls/sec) | |
operator[2]: 21413.3 Mvals/sec, (21413.3 Mcalls/sec) | |
operator[0]: 21413.3 Mvals/sec, (21413.3 Mcalls/sec) | |
extract<2> : 10080.6 Mvals/sec, (10080.6 Mcalls/sec) | |
extract<0> : 21367.5 Mvals/sec, (21367.5 Mcalls/sec) | |
insert<2> : 3871.5 Mvals/sec, (3871.5 Mcalls/sec) | |
arithmetic vint8 | |
operator+: 30959.8 Mvals/sec, (3870.0 Mcalls/sec) | |
operator-: 30935.8 Mvals/sec, (3867.0 Mcalls/sec) | |
operator- (neg): 30911.9 Mvals/sec, (3864.0 Mcalls/sec) | |
operator*: 30935.8 Mvals/sec, (3867.0 Mcalls/sec) | |
operator* (scalar): 30923.8 Mvals/sec, (3865.5 Mcalls/sec) | |
operator/: 31019.8 Mvals/sec, (3877.5 Mcalls/sec) | |
abs: 30995.7 Mvals/sec, (3874.5 Mcalls/sec) | |
reduce_add: 31055.9 Mvals/sec, (3882.0 Mcalls/sec) | |
reference: add scalar: 22371.4 Mvals/sec, (22371.4 Mcalls/sec) | |
reference: mul scalar: 21505.4 Mvals/sec, (21505.4 Mcalls/sec) | |
reference: div scalar: 21413.3 Mvals/sec, (21413.3 Mcalls/sec) | |
bitwise vint8 | |
operator&: 31007.8 Mvals/sec, (3876.0 Mcalls/sec) | |
operator|: 31007.8 Mvals/sec, (3876.0 Mcalls/sec) | |
operator^: 30923.8 Mvals/sec, (3865.5 Mcalls/sec) | |
operator!: 31019.8 Mvals/sec, (3877.5 Mcalls/sec) | |
andnot: 31620.6 Mvals/sec, (3952.6 Mcalls/sec) | |
reduce_and: 21186.4 Mvals/sec, (21186.4 Mcalls/sec) | |
reduce_or : 21276.6 Mvals/sec, (21276.6 Mcalls/sec) | |
comparisons vint8 | |
operator< : 31104.2 Mvals/sec, (3888.0 Mcalls/sec) | |
operator> : 31007.8 Mvals/sec, (3876.0 Mcalls/sec) | |
operator<=: 31055.9 Mvals/sec, (3882.0 Mcalls/sec) | |
operator>=: 31019.8 Mvals/sec, (3877.5 Mcalls/sec) | |
operator==: 31019.8 Mvals/sec, (3877.5 Mcalls/sec) | |
operator!=: 30959.8 Mvals/sec, (3870.0 Mcalls/sec) | |
shuffle vint8 | |
shuffle<...> : 30947.8 Mvals/sec, (3868.5 Mcalls/sec) | |
shuffle<0> : 31068.0 Mvals/sec, (3883.5 Mcalls/sec) | |
shuffle<1> : 31176.9 Mvals/sec, (3897.1 Mcalls/sec) | |
shuffle<2> : 31092.1 Mvals/sec, (3886.5 Mcalls/sec) | |
shuffle<3> : 31176.9 Mvals/sec, (3897.1 Mcalls/sec) | |
shuffle<4> : 31068.0 Mvals/sec, (3883.5 Mcalls/sec) | |
shuffle<5> : 31335.7 Mvals/sec, (3917.0 Mcalls/sec) | |
shuffle<6> : 31360.3 Mvals/sec, (3920.0 Mcalls/sec) | |
shuffle<7> : 31397.2 Mvals/sec, (3924.6 Mcalls/sec) | |
blend vint8 | |
blend: 31348.0 Mvals/sec, (3918.5 Mcalls/sec) | |
blend0: 31409.5 Mvals/sec, (3926.2 Mcalls/sec) | |
blend0not: 31458.9 Mvals/sec, (3932.4 Mcalls/sec) | |
test converting vint8 to uint16 | |
load from uint16: 175824.2 Mvals/sec, (21978.0 Mcalls/sec) | |
convert to uint16: 33140.0 Mvals/sec, (4142.5 Mcalls/sec) | |
test converting vint8 to uint8 | |
load from uint8: 181818.2 Mvals/sec, (22727.3 Mcalls/sec) | |
convert to uint16: 33195.0 Mvals/sec, (4149.4 Mcalls/sec) | |
shift vint8 | |
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] >> 1 == [-40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000] | |
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] srl 1 == [40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000] | |
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] >> 4 == [-8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000] | |
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] srl 4 == [8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000] | |
[-1 -1 -1 -1 -1 -1 -1 -1] >> 1 == [-1 -1 -1 -1 -1 -1 -1 -1] | |
[-1 -1 -1 -1 -1 -1 -1 -1] srl 1 == [7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff] | |
[-1 -1 -1 -1 -1 -1 -1 -1] >> 4 == [-1 -1 -1 -1 -1 -1 -1 -1] | |
[-1 -1 -1 -1 -1 -1 -1 -1] srl 4 == [fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff] | |
[ffff ffff ffff ffff ffff ffff ffff ffff] >> 1 == [7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff] | |
[ffff ffff ffff ffff ffff ffff ffff ffff] srl 1 == [7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff] | |
[ffff ffff ffff ffff ffff ffff ffff ffff] >> 4 == [fff fff fff fff fff fff fff fff] | |
[ffff ffff ffff ffff ffff ffff ffff ffff] srl 4 == [fff fff fff fff fff fff fff fff] | |
[3 3 3 3 3 3 3 3] >> 1 == [1 1 1 1 1 1 1 1] | |
[3 3 3 3 3 3 3 3] srl 1 == [1 1 1 1 1 1 1 1] | |
[3 3 3 3 3 3 3 3] >> 4 == [0 0 0 0 0 0 0 0] | |
[3 3 3 3 3 3 3 3] srl 4 == [0 0 0 0 0 0 0 0] | |
operator<<: 31262.2 Mvals/sec, (3907.8 Mcalls/sec) | |
operator>>: 31384.9 Mvals/sec, (3923.1 Mcalls/sec) | |
srl : 31335.7 Mvals/sec, (3917.0 Mcalls/sec) | |
rotl : 31384.9 Mvals/sec, (3923.1 Mcalls/sec) | |
vint16 | |
load/store vint16 | |
partial load 1 : 101 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 1 : 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 2 : 101 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 2 : 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 3 : 101 102 103 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 3 : 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 4 : 101 102 103 104 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 4 : 1 2 3 4 0 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 5 : 101 102 103 104 105 0 0 0 0 0 0 0 0 0 0 0 | |
partial store 5 : 1 2 3 4 5 0 0 0 0 0 0 0 0 0 0 0 | |
partial load 6 : 101 102 103 104 105 106 0 0 0 0 0 0 0 0 0 0 | |
partial store 6 : 1 2 3 4 5 6 0 0 0 0 0 0 0 0 0 0 | |
partial load 7 : 101 102 103 104 105 106 107 0 0 0 0 0 0 0 0 0 | |
partial store 7 : 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 0 | |
partial load 8 : 101 102 103 104 105 106 107 108 0 0 0 0 0 0 0 0 | |
partial store 8 : 1 2 3 4 5 6 7 8 0 0 0 0 0 0 0 0 | |
partial load 9 : 101 102 103 104 105 106 107 108 109 0 0 0 0 0 0 0 | |
partial store 9 : 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 | |
partial load 10 : 101 102 103 104 105 106 107 108 109 110 0 0 0 0 0 0 | |
partial store 10 : 1 2 3 4 5 6 7 8 9 10 0 0 0 0 0 0 | |
partial load 11 : 101 102 103 104 105 106 107 108 109 110 111 0 0 0 0 0 | |
partial store 11 : 1 2 3 4 5 6 7 8 9 10 11 0 0 0 0 0 | |
partial load 12 : 101 102 103 104 105 106 107 108 109 110 111 112 0 0 0 0 | |
partial store 12 : 1 2 3 4 5 6 7 8 9 10 11 12 0 0 0 0 | |
partial load 13 : 101 102 103 104 105 106 107 108 109 110 111 112 113 0 0 0 | |
partial store 13 : 1 2 3 4 5 6 7 8 9 10 11 12 13 0 0 0 | |
partial load 14 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 0 0 | |
partial store 14 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0 | |
partial load 15 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 0 | |
partial store 15 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 | |
partial load 16 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 | |
partial store 16 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | |
load scalar: 29287.9 Mvals/sec, (1830.5 Mcalls/sec) | |
load vec: 28880.9 Mvals/sec, (1805.1 Mcalls/sec) | |
store vec: 28964.0 Mvals/sec, (1810.2 Mcalls/sec) | |
load 16 comps: 28917.4 Mvals/sec, (1807.3 Mcalls/sec) | |
load 13 comps: 10723.4 Mvals/sec, (824.9 Mcalls/sec) | |
load 9 comps: 7193.7 Mvals/sec, (799.3 Mcalls/sec) | |
load 8 comps: 14795.6 Mvals/sec, (1849.5 Mcalls/sec) | |
load 7 comps: 5751.8 Mvals/sec, (821.7 Mcalls/sec) | |
load 6 comps: 4983.0 Mvals/sec, (830.5 Mcalls/sec) | |
load 5 comps: 4165.3 Mvals/sec, (833.1 Mcalls/sec) | |
load 4 comps: 7382.8 Mvals/sec, (1845.7 Mcalls/sec) | |
load 3 comps: 2466.5 Mvals/sec, (822.2 Mcalls/sec) | |
load 2 comps: 3416.5 Mvals/sec, (1708.2 Mcalls/sec) | |
load 1 comps: 1847.7 Mvals/sec, (1847.7 Mcalls/sec) | |
store 16 comps: 29191.8 Mvals/sec, (1824.5 Mcalls/sec) | |
store 13 comps: 21385.1 Mvals/sec, (1645.0 Mcalls/sec) | |
store 9 comps: 16381.5 Mvals/sec, (1820.2 Mcalls/sec) | |
store 8 comps: 31043.9 Mvals/sec, (3880.5 Mcalls/sec) | |
store 7 comps: 12644.5 Mvals/sec, (1806.4 Mcalls/sec) | |
store 6 comps: 16451.9 Mvals/sec, (2742.0 Mcalls/sec) | |
store 5 comps: 13642.6 Mvals/sec, (2728.5 Mcalls/sec) | |
store 4 comps: 16680.6 Mvals/sec, (4170.1 Mcalls/sec) | |
store 3 comps: 2433.9 Mvals/sec, (811.3 Mcalls/sec) | |
store 2 comps: 10952.9 Mvals/sec, (5476.5 Mcalls/sec) | |
store 1 comps: 5461.5 Mvals/sec, (5461.5 Mcalls/sec) | |
load/store with conversion vint16 | |
load from int[]: 28653.3 Mvals/sec, (1790.8 Mcalls/sec) | |
load from unsigned short[]: 28673.8 Mvals/sec, (1792.1 Mcalls/sec) | |
load from short[]: 28709.9 Mvals/sec, (1794.4 Mcalls/sec) | |
load from unsigned char[]: 28663.6 Mvals/sec, (1791.5 Mcalls/sec) | |
load from char[]: 28917.4 Mvals/sec, (1807.3 Mcalls/sec) | |
store to unsigned short[]: 32881.2 Mvals/sec, (2055.1 Mcalls/sec) | |
store to unsigned char[]: 32854.2 Mvals/sec, (2053.4 Mcalls/sec) | |
masked loadstore vint16 | |
masked load with int mask: 28474.8 Mvals/sec, (1779.7 Mcalls/sec) | |
masked load with bool mask: 28668.7 Mvals/sec, (1791.8 Mcalls/sec) | |
masked store with int mask: 21052.6 Mvals/sec, (21052.6 Mcalls/sec) | |
masked store with bool mask: 21413.3 Mvals/sec, (21413.3 Mcalls/sec) | |
scatter & gather vint16 | |
gather: 2473.8 Mvals/sec, (154.6 Mcalls/sec) | |
gather_mask: 2459.0 Mvals/sec, (153.7 Mcalls/sec) | |
scatter: 318.5 Mvals/sec, (19.9 Mcalls/sec) | |
scatter_mask: 2206.0 Mvals/sec, (137.9 Mcalls/sec) | |
component_access vint16 | |
operator[i]: 5327.7 Mvals/sec, (5327.7 Mcalls/sec) | |
operator[2]: 5299.4 Mvals/sec, (5299.4 Mcalls/sec) | |
operator[0]: 5330.5 Mvals/sec, (5330.5 Mcalls/sec) | |
extract<2> : 5299.4 Mvals/sec, (5299.4 Mcalls/sec) | |
extract<0> : 5319.1 Mvals/sec, (5319.1 Mcalls/sec) | |
insert<2> : 1721.8 Mvals/sec, (1721.8 Mcalls/sec) | |
arithmetic vint16 | |
operator+: 28318.6 Mvals/sec, (1769.9 Mcalls/sec) | |
operator-: 28119.5 Mvals/sec, (1757.5 Mcalls/sec) | |
operator- (neg): 28016.1 Mvals/sec, (1751.0 Mcalls/sec) | |
operator*: 27976.9 Mvals/sec, (1748.6 Mcalls/sec) | |
operator* (scalar): 28070.2 Mvals/sec, (1754.4 Mcalls/sec) | |
operator/: 5839.8 Mvals/sec, (365.0 Mcalls/sec) | |
abs: 27937.8 Mvals/sec, (1746.1 Mcalls/sec) | |
reduce_add: 28075.1 Mvals/sec, (1754.7 Mcalls/sec) | |
reference: add scalar: 5310.7 Mvals/sec, (5310.7 Mcalls/sec) | |
reference: mul scalar: 5313.5 Mvals/sec, (5313.5 Mcalls/sec) | |
reference: div scalar: 4219.4 Mvals/sec, (4219.4 Mcalls/sec) | |
bitwise vint16 | |
operator&: 29085.6 Mvals/sec, (1817.9 Mcalls/sec) | |
operator|: 29159.8 Mvals/sec, (1822.5 Mcalls/sec) | |
operator^: 29352.4 Mvals/sec, (1834.5 Mcalls/sec) | |
operator!: 28896.5 Mvals/sec, (1806.0 Mcalls/sec) | |
andnot: 28818.4 Mvals/sec, (1801.2 Mcalls/sec) | |
reduce_and: 21413.3 Mvals/sec, (21413.3 Mcalls/sec) | |
reduce_or : 9661.8 Mvals/sec, (9661.8 Mcalls/sec) | |
comparisons vint16 | |
operator< : 359550.6 Mvals/sec, (22471.9 Mcalls/sec) | |
operator> : 332640.3 Mvals/sec, (20790.0 Mcalls/sec) | |
operator<=: 338983.0 Mvals/sec, (21186.4 Mcalls/sec) | |
operator>=: 348583.9 Mvals/sec, (21786.5 Mcalls/sec) | |
operator==: 341151.4 Mvals/sec, (21322.0 Mcalls/sec) | |
operator!=: 345572.3 Mvals/sec, (21598.3 Mcalls/sec) | |
shuffle vint16 | |
shuffle4<> : 28828.8 Mvals/sec, (1801.8 Mcalls/sec) | |
shuffle<> : 28933.1 Mvals/sec, (1808.3 Mcalls/sec) | |
blend vint16 | |
blend: 28808.1 Mvals/sec, (1800.5 Mcalls/sec) | |
blend0: 29117.4 Mvals/sec, (1819.8 Mcalls/sec) | |
blend0not: 28886.1 Mvals/sec, (1805.4 Mcalls/sec) | |
test converting vint16 to uint16 | |
load from uint16: 344086.0 Mvals/sec, (21505.4 Mcalls/sec) | |
convert to uint16: 33092.0 Mvals/sec, (2068.3 Mcalls/sec) | |
test converting vint16 to uint8 | |
load from uint8: 345572.3 Mvals/sec, (21598.3 Mcalls/sec) | |
convert to uint16: 33051.0 Mvals/sec, (2065.7 Mcalls/sec) | |
shift vint16 | |
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] >> 1 == [-40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000] | |
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] srl 1 == [40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000] | |
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] >> 4 == [-8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000] | |
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] srl 4 == [8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000] | |
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] >> 1 == [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] | |
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] srl 1 == [7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff] | |
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] >> 4 == [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] | |
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] srl 4 == [fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff] | |
[ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff] >> 1 == [7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff] | |
[ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff] srl 1 == [7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff] | |
[ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff] >> 4 == [fff fff fff fff fff fff fff fff fff fff fff fff fff fff fff fff] | |
[ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff] srl 4 == [fff fff fff fff fff fff fff fff fff fff fff fff fff fff fff fff] | |
[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3] >> 1 == [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1] | |
[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3] srl 1 == [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1] | |
[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3] >> 4 == [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] | |
[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3] srl 4 == [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] | |
operator<<: 28933.1 Mvals/sec, (1808.3 Mcalls/sec) | |
operator>>: 28834.0 Mvals/sec, (1802.1 Mcalls/sec) | |
srl : 30245.7 Mvals/sec, (1890.4 Mcalls/sec) | |
rotl : 32296.5 Mvals/sec, (2018.5 Mcalls/sec) | |
vbool4 | |
shuffle vbool4 | |
shuffle<...> : 18315.0 Mvals/sec, (4578.8 Mcalls/sec) | |
shuffle<0> : 15552.1 Mvals/sec, (3888.0 Mcalls/sec) | |
shuffle<1> : 18535.7 Mvals/sec, (4633.9 Mcalls/sec) | |
shuffle<2> : 18552.9 Mvals/sec, (4638.2 Mcalls/sec) | |
shuffle<3> : 18561.5 Mvals/sec, (4640.4 Mcalls/sec) | |
component_access vbool4 | |
bitwise vbool4 | |
operator&: 18561.5 Mvals/sec, (4640.4 Mcalls/sec) | |
operator|: 18535.7 Mvals/sec, (4633.9 Mcalls/sec) | |
operator^: 18544.3 Mvals/sec, (4636.1 Mcalls/sec) | |
operator!: 18458.7 Mvals/sec, (4614.7 Mcalls/sec) | |
reduce_and: 2422.5 Mvals/sec, (2422.5 Mcalls/sec) | |
reduce_or : 2420.7 Mvals/sec, (2420.7 Mcalls/sec) | |
vbool8 | |
shuffle vbool8 | |
shuffle<...> : 32679.7 Mvals/sec, (4085.0 Mcalls/sec) | |
shuffle<0> : 32800.3 Mvals/sec, (4100.0 Mcalls/sec) | |
shuffle<1> : 32666.4 Mvals/sec, (4083.3 Mcalls/sec) | |
shuffle<2> : 32520.3 Mvals/sec, (4065.0 Mcalls/sec) | |
shuffle<3> : 32693.1 Mvals/sec, (4086.6 Mcalls/sec) | |
shuffle<4> : 32679.7 Mvals/sec, (4085.0 Mcalls/sec) | |
shuffle<5> : 32719.8 Mvals/sec, (4090.0 Mcalls/sec) | |
shuffle<6> : 32733.2 Mvals/sec, (4091.7 Mcalls/sec) | |
shuffle<7> : 32786.9 Mvals/sec, (4098.4 Mcalls/sec) | |
component_access vbool8 | |
bitwise vbool8 | |
operator&: 32626.4 Mvals/sec, (4078.3 Mcalls/sec) | |
operator|: 32786.9 Mvals/sec, (4098.4 Mcalls/sec) | |
operator^: 32115.6 Mvals/sec, (4014.5 Mcalls/sec) | |
operator!: 31176.9 Mvals/sec, (3897.1 Mcalls/sec) | |
reduce_and: 2402.1 Mvals/sec, (2402.1 Mcalls/sec) | |
reduce_or : 2396.4 Mvals/sec, (2396.4 Mcalls/sec) | |
vbool16 | |
component_access vbool16 | |
bitwise vbool16 | |
operator&: 345572.3 Mvals/sec, (21598.3 Mcalls/sec) | |
operator|: 346320.4 Mvals/sec, (21645.0 Mcalls/sec) | |
operator^: 344827.6 Mvals/sec, (21551.7 Mcalls/sec) | |
operator!: 345572.3 Mvals/sec, (21598.3 Mcalls/sec) | |
reduce_and: 21881.8 Mvals/sec, (21881.8 Mcalls/sec) | |
reduce_or : 21551.7 Mvals/sec, (21551.7 Mcalls/sec) | |
Odds and ends | |
constants | |
vfloat4 = float(const): 16849.2 Mvals/sec, (4212.3 Mcalls/sec) | |
vfloat4 = Zero(): 16877.6 Mvals/sec, (4219.4 Mcalls/sec) | |
vfloat4 = One(): 16870.5 Mvals/sec, (4217.6 Mcalls/sec) | |
vfloat4 = Iota(): 16806.7 Mvals/sec, (4201.7 Mcalls/sec) | |
vfloat8 = float(const): 31458.9 Mvals/sec, (3932.4 Mcalls/sec) | |
vfloat8 = Zero(): 32989.7 Mvals/sec, (4123.7 Mcalls/sec) | |
vfloat8 = One(): 32989.7 Mvals/sec, (4123.7 Mcalls/sec) | |
vfloat8 = Iota(): 446.1 Mvals/sec, (55.8 Mcalls/sec) | |
vfloat16 = float(const): 28011.2 Mvals/sec, (1750.7 Mcalls/sec) | |
vfloat16 = Zero(): 29239.8 Mvals/sec, (1827.5 Mcalls/sec) | |
vfloat16 = One(): 29017.0 Mvals/sec, (1813.6 Mcalls/sec) | |
vfloat16 = Iota(): 28880.9 Mvals/sec, (1805.1 Mcalls/sec) | |
special | |
metaprogramming | |
Testing matrix ops: | |
P = (1 0 0) | |
Mtrans = ( 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 | |
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 | |
0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00 | |
1.000000e+01 1.100000e+01 1.200000e+01 1.000000e+00) | |
Mrot = ( -4.371139e-08 -0.000000e+00 -1.000000e+00 -0.000000e+00 | |
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 | |
1.000000e+00 0.000000e+00 -4.371139e-08 0.000000e+00 | |
0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00) | |
P translated = 11 11 12 | |
P rotated = -4.37114e-08 0 -1 | |
P rotated by the transpose = -4.37114e-08 0 -1 | |
Mrot transposed = ( -4.371139e-08 0.000000e+00 1.000000e+00 0.000000e+00 | |
-0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 | |
-1.000000e+00 0.000000e+00 -4.371139e-08 0.000000e+00 | |
-0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00) | |
V4 * M44 Imath: 4168.4 Mvals/sec, (4168.4 Mcalls/sec) | |
M44 * V4 simd: 4198.2 Mvals/sec, (4198.2 Mcalls/sec) | |
V4 * M44 simd: 4221.2 Mvals/sec, (4221.2 Mcalls/sec) | |
transformp Imath: 2777.0 Mvals/sec, (2777.0 Mcalls/sec) | |
transformp Imath with simd: 2745.0 Mvals/sec, (2745.0 Mcalls/sec) | |
transformp simd: 4210.5 Mvals/sec, (4210.5 Mcalls/sec) | |
transpose m44: 1827.8 Mvals/sec, (1827.8 Mcalls/sec) | |
transpose m44 with simd: 1830.8 Mvals/sec, (1830.8 Mcalls/sec) | |
m44 inverse Imath: 82.8 Mvals/sec, (82.8 Mcalls/sec) | |
m44 inverse_simd: 99.9 Mvals/sec, (99.9 Mcalls/sec) | |
m44 inverse_simd native simd: 104.2 Mvals/sec, (104.2 Mcalls/sec) | |
Total time: 0.0s | |
ERRORS! | |
<end of output> | |
Test time = 0.08 sec | |
---------------------------------------------------------- | |
Test Failed. | |
"unit_simd" end time: Jan 18 21:38 UTC | |
"unit_simd" time elapsed: 00:00:00 |