Binary search is theoretically optimal, but it's possible to speed it up substantially using AVX2 and branchless code even in .NET Core.
Memory access is the limiting factor for binary search. When we access each element for comparison a cache line is loaded, so we could load a 32-byte vector almost free, check if it contains the target value, and if not - reduce the search space by 32/sizeof(T)
elements instead of 1 element. This gives quite good performance improvement (code in BinarySearch1.cs
and results in the table 1 below).
However, for larger N the search space reduction is quite minimal and the most gains come from switching to linear search. After an interesting discussion in Twitter (especially with @trav_downs), and trying to widen the pivot area to use 2 AVX2 vectors it became clear that just switching to linear search sooner is more important than using AVX2 vectors as pivots.
The linear search was not using AVX2, and for linear