BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-6600U CPU 2.60GHz, ProcessorCount=4
Frequency=2742193 Hz, Resolution=364.6716 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1586.0
DefaultJob : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1586.0
Allocated=0 B
| Method | Value | Mean | StdErr | StdDev | Median | Scaled | Scaled-StdDev |
|--------|------:|-----:|-------:|-------:|-------:|-------:|--------------:|
| 'Maths Clamp' | -1 | 1.2373 ns | 0.0154 ns | 0.0595 ns | 1.2511 ns | 1.00 | 0.00 |
| 'No Maths Clamp' | -1 | 1.2183 ns | 0.0147 ns | 0.0570 ns | 1.2248 ns | 0.99 | 0.07 |
| 'No Maths No Equals Clamp' | -1 | 2.0156 ns | 0.0107 ns | 0.0399 ns | 2.0256 ns | 1.63 | 0.08 |
| 'No Maths Clamp No Ternary' | -1 | 1.1519 ns | 0.0069 ns | 0.0247 ns | 1.1617 ns | 0.93 | 0.05 |
| 'No Maths No Equals Clamp No Ternary' | -1 | 1.1801 ns | 0.0145 ns | 0.0562 ns | 1.1920 ns | 0.96 | 0.06 |
| 'Clamp using Bitwise Abs' | -1 | 0.9129 ns | 0.0090 ns | 0.0326 ns | 0.9092 ns | 0.74 | 0.04 |
| 'Maths Clamp' | 0 | 1.2245 ns | 0.0103 ns | 0.0385 ns | 1.2348 ns | 1.00 | 0.00 |
| 'No Maths Clamp' | 0 | 1.2442 ns | 0.0116 ns | 0.0450 ns | 1.2560 ns | 1.02 | 0.05 |
| 'No Maths No Equals Clamp' | 0 | 1.9620 ns | 0.0140 ns | 0.0541 ns | 1.9488 ns | 1.60 | 0.07 |
| 'No Maths Clamp No Ternary' | 0 | 1.1913 ns | 0.0280 ns | 0.1047 ns | 1.1699 ns | 0.97 | 0.09 |
| 'No Maths No Equals Clamp No Ternary' | 0 | 1.9778 ns | 0.0112 ns | 0.0432 ns | 1.9723 ns | 1.62 | 0.06 |
| 'Clamp using Bitwise Abs' | 0 | 0.9206 ns | 0.0074 ns | 0.0287 ns | 0.9145 ns | 0.75 | 0.03 |
| 'Maths Clamp' | 255 | 1.2562 ns | 0.0187 ns | 0.0725 ns | 1.2320 ns | 1.00 | 0.00 |
| 'No Maths Clamp' | 255 | 0.9064 ns | 0.0299 ns | 0.1157 ns | 0.8482 ns | 0.72 | 0.10 |
| 'No Maths No Equals Clamp' | 255 | 1.9107 ns | 0.0031 ns | 0.0118 ns | 1.9119 ns | 1.53 | 0.08 |
| 'No Maths Clamp No Ternary' | 255 | 0.9426 ns | 0.0081 ns | 0.0314 ns | 0.9389 ns | 0.75 | 0.05 |
| 'No Maths No Equals Clamp No Ternary' | 255 | 1.9511 ns | 0.0047 ns | 0.0182 ns | 1.9515 ns | 1.56 | 0.08 |
| 'Clamp using Bitwise Abs' | 255 | 0.9079 ns | 0.0052 ns | 0.0195 ns | 0.9053 ns | 0.72 | 0.04 |
| 'Maths Clamp' | 256 | 1.1499 ns | 0.0038 ns | 0.0137 ns | 1.1484 ns | 1.00 | 0.00 |
| 'No Maths Clamp' | 256 | 1.2231 ns | 0.0114 ns | 0.0410 ns | 1.2134 ns | 1.06 | 0.04 |
| 'No Maths No Equals Clamp' | 256 | 2.0982 ns | 0.0285 ns | 0.1067 ns | 2.0862 ns | 1.82 | 0.09 |
| 'No Maths Clamp No Ternary' | 256 | 0.9000 ns | 0.0036 ns | 0.0139 ns | 0.8975 ns | 0.78 | 0.01 |
| 'No Maths No Equals Clamp No Ternary' | 256 | 0.7765 ns | 0.0433 ns | 0.1785 ns | 0.6871 ns | 0.68 | 0.15 |
| 'Clamp using Bitwise Abs' | 256 | 1.1191 ns | 0.0479 ns | 0.2582 ns | 0.9897 ns | 0.97 | 0.22 |
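The benchmark source isn't included in this comment, so the following is only a hypothetical C sketch of what the variants named in the table might look like when clamping an `int` to `[0, 255]`. The function names and bodies are my guesses based on the method names, not the benchmarked C# code. The "bitwise abs" variant relies on the identities `max(a, b) = (a + b + |a - b|) / 2` and `min(a, b) = (a + b - |a - b|) / 2` together with a branch-free absolute value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the benchmarked variants (guesses from the names). */

/* 'Maths Clamp' analogue: min(max(x, 0), 255) via helper functions. */
static inline int32_t max_i32(int32_t a, int32_t b) { return a > b ? a : b; }
static inline int32_t min_i32(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t clamp_minmax(int32_t x) { return min_i32(max_i32(x, 0), 255); }

/* 'No Maths Clamp' guess: nested ternary with <= / >=. */
static inline int32_t clamp_ternary(int32_t x) {
    return x <= 0 ? 0 : (x >= 255 ? 255 : x);
}

/* 'No Maths No Equals Clamp' guess: nested ternary with strict < / >. */
static inline int32_t clamp_ternary_no_equals(int32_t x) {
    return x < 0 ? 0 : (x > 255 ? 255 : x);
}

/* 'No Ternary' guess: plain if statements with early returns. */
static inline int32_t clamp_if(int32_t x) {
    if (x < 0) return 0;
    if (x > 255) return 255;
    return x;
}

/* Branch-free |v|. Assumes a 32-bit int and an arithmetic right shift of
   negative values (implementation-defined in C, true on mainstream compilers). */
static inline int32_t abs_bitwise(int32_t v) {
    int32_t mask = v >> 31;      /* 0 for v >= 0, -1 (all ones) for v < 0 */
    return (v ^ mask) - mask;
}

/* 'Clamp using Bitwise Abs' guess: max then min via the abs identities.
   Only valid while the intermediate sums cannot overflow, hence the range check. */
static inline int32_t clamp_bitwise_abs(int32_t x) {
    assert(x > -(1 << 30) && x < (1 << 30));
    x = (x + abs_bitwise(x)) >> 1;                 /* max(x, 0)   */
    return (x + 255 - abs_bitwise(x - 255)) >> 1;  /* min(x, 255) */
}
```

Every intermediate value is non-negative before each `>> 1`, so the shifts are plain halvings; the range assertion is what keeps `x + |x|` from overflowing.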
@JimBobSquarePants
It means the context is extremely useful here! I'd rather move the loop cores of `ConvertFromYCbCr()`-style operations into utility methods and benchmark (and unit test!) different approaches as a whole. Clamping is just one of many suboptimal things there, and I don't think a tactical micro-optimization approach applied to small pieces is worth the effort here.

I can't prove this right now, but I'm about 80% sure that the winning algorithm for the whole JpegChannels --> TColor conversion is:
1. Store the `JpegPixelArea` data as a `float[]` instead of a `byte[]`. This turns `Block8x8.CopyColorsTo()` into a zero-cost method and entirely eliminates the back-conversion!
2. In `ConvertFromYCbCr()`, pack the `y[]`, `cb[]` and `cr[]` float arrays into a single `Vect4[]` YCbCr array in a batched way.
3. Turn `IPackedVector.PackFromVector4(v)` into a batching solution (a large-scale, library-wide refactor).

It's also important to note that after merging #90, the JpegDecoder bottleneck would be the Huffman decoder.
I have optimization ideas for this too. If only I had a clone to implement them all :D
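To make the three-step plan above a bit more concrete, here is a rough C sketch of a batched conversion kept in one utility function, which is also the shape that can be benchmarked and unit-tested as a whole. It is a hypothetical illustration, not ImageSharp code: `vec4`, `pack_ycbcr`, `to_byte` and `ycbcr_to_rgba` are made-up names, `vec4` only stands in for the proposed `Vect4[]` element, and the batched RGBA loop stands in for a batched replacement of per-pixel `PackFromVector4(v)` calls. The color math uses the standard full-range JFIF YCbCr to RGB equations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-component vector standing in for one Vect4[] element. */
typedef struct { float x, y, z, w; } vec4;

/* Step 2 (sketch): interleave planar float channels into one YCbCr array.
   Channels are assumed to already be floats, as proposed in step 1. */
static void pack_ycbcr(const float *y, const float *cb, const float *cr,
                       vec4 *out, size_t n) {
    assert(n == 0 || (y && cb && cr && out));
    for (size_t i = 0; i < n; i++) {
        out[i].x = y[i];
        out[i].y = cb[i];
        out[i].z = cr[i];
        out[i].w = 0.0f;  /* unused padding lane */
    }
}

/* Clamp to [0, 255] and round to the nearest byte value. */
static uint8_t to_byte(float v) {
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Step 3 (sketch): convert a whole batch of YCbCr vectors to RGBA bytes in
   one call instead of packing pixel by pixel. Alpha is always opaque. */
static void ycbcr_to_rgba(const vec4 *src, uint8_t *rgba, size_t n) {
    assert(n == 0 || (src && rgba));
    for (size_t i = 0; i < n; i++) {
        float luma = src[i].x;
        float cb = src[i].y - 128.0f;
        float cr = src[i].z - 128.0f;
        rgba[4 * i + 0] = to_byte(luma + 1.402f * cr);
        rgba[4 * i + 1] = to_byte(luma - 0.344136f * cb - 0.714136f * cr);
        rgba[4 * i + 2] = to_byte(luma + 1.772f * cb);
        rgba[4 * i + 3] = 255;
    }
}
```

With this layout the clamp happens once per output channel inside a tight loop over the whole batch, so the choice of clamp variant can be measured in the context that actually matters rather than in isolation.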