The types string and []byte share the same memory layout, it is a pointer to memory followed by the length (with a capacity, in the case of slices) and the actual contents. When we do a ==
comparison with slices (or strings, if that matter) the runtime issues a call to
runtime.memequal(a, b unsafe.Pointer, size uintptr) bool
This function is akin to libc’s memcmp(3).
Most compilers, such as gcc, will generate very optimized assembly for an operation like this, but it mostly boils down to:
for (size_t i = 0; i < size; ++i) {
if (c1[i] != c2[i]) {
return false;
}
}
return true;
That is just an incrementing loop with a comparison for each memory location.
Taking this out of the way, there are some optimization that are easy to follow when you are comparing memory without having to scan the whole memory, in a large slice, for example.
If the length is zero, there is nothing to compare, you can just return true. This optimization is already in memequal in all architectures, for example amd64.
Another one is if the pointers are different, they point to distinct things in memory. That is also included in most arches (amd64). But this one is missing on arm64.
It is still questionable to me if this is actually noticeable for normal go programs, because in order to rely on this last optimization you have to be comparing the same slice (or string) over and over. I can see a use case for it, maybe you are building a programming language or diff algorithm and is heavily interning strings, like if you find a string that you already have in memory, you just point to it instead of allocating a new one, that sounds like a valid use case, not very common, but definitely valid.
It also makes memequal O(1), as opposed to O(size) for these cases, as you can imagine that is a huge improvement for large slices (benchmark attached below). I find it interesting that some simple lines of code can get such improvements
diff --git a/src/internal/bytealg/equal_arm64.s b/src/internal/bytealg/equal_arm64.s
index d3aabba587..13eb5fb1bf 100644
--- a/src/internal/bytealg/equal_arm64.s
+++ b/src/internal/bytealg/equal_arm64.s
@@ -7,6 +7,9 @@
// memequal(a, b unsafe.Pointer, size uintptr) bool
TEXT runtime·memequal<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-25
+ // short path to handle equal pointers
+ CMP R0, R1
+ BEQ equal
// short path to handle 0-byte case
CBZ R2, equal
B memeqbody<>(SB)
As a curiosity, comparing basic types does not call into memequal, if you are comparing by value or the pointers themselves that results in a single CMP instruction, which is a one cycle instruction in most architectures, so extremely cheap to compute. Check it out on the compiler explorer.
https://go-review.googlesource.com/c/go/+/545416
goos: darwin
goarch: arm64
pkg: bytes
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Equal/same/1-8 2.678n ± ∞ ¹ 2.400n ± ∞ ¹ -10.38% (p=0.008 n=5)
Equal/same/6-8 3.267n ± ∞ ¹ 2.431n ± ∞ ¹ -25.59% (p=0.008 n=5)
Equal/same/9-8 2.981n ± ∞ ¹ 2.385n ± ∞ ¹ -19.99% (p=0.008 n=5)
Equal/same/15-8 2.974n ± ∞ ¹ 2.390n ± ∞ ¹ -19.64% (p=0.008 n=5)
Equal/same/16-8 2.983n ± ∞ ¹ 2.380n ± ∞ ¹ -20.21% (p=0.008 n=5)
Equal/same/20-8 3.567n ± ∞ ¹ 2.384n ± ∞ ¹ -33.17% (p=0.008 n=5)
Equal/same/32-8 3.568n ± ∞ ¹ 2.385n ± ∞ ¹ -33.16% (p=0.008 n=5)
Equal/same/4K-8 78.040n ± ∞ ¹ 2.378n ± ∞ ¹ -96.95% (p=0.008 n=5)
Equal/same/4M-8 78713.000n ± ∞ ¹ 2.385n ± ∞ ¹ -100.00% (p=0.008 n=5)
Equal/same/64M-8 1348095.000n ± ∞ ¹ 2.381n ± ∞ ¹ -100.00% (p=0.008 n=5)
geomean 43.52n 2.390n -94.51%
¹ need >= 6 samples for confidence interval at level 0.95
│ old.txt │ new.txt │
│ B/s │ B/s vs base │
Equal/same/1-8 356.1Mi ± ∞ ¹ 397.3Mi ± ∞ ¹ +11.57% (p=0.008 n=5)
Equal/same/6-8 1.711Gi ± ∞ ¹ 2.298Gi ± ∞ ¹ +34.35% (p=0.008 n=5)
Equal/same/9-8 2.812Gi ± ∞ ¹ 3.515Gi ± ∞ ¹ +24.99% (p=0.008 n=5)
Equal/same/15-8 4.698Gi ± ∞ ¹ 5.844Gi ± ∞ ¹ +24.41% (p=0.008 n=5)
Equal/same/16-8 4.995Gi ± ∞ ¹ 6.260Gi ± ∞ ¹ +25.34% (p=0.008 n=5)
Equal/same/20-8 5.222Gi ± ∞ ¹ 7.814Gi ± ∞ ¹ +49.63% (p=0.008 n=5)
Equal/same/32-8 8.353Gi ± ∞ ¹ 12.496Gi ± ∞ ¹ +49.59% (p=0.008 n=5)
Equal/same/4K-8 48.88Gi ± ∞ ¹ 1603.96Gi ± ∞ ¹ +3181.17% (p=0.008 n=5)
Equal/same/4M-8 49.63Gi ± ∞ ¹ 1637911.85Gi ± ∞ ¹ +3300381.91% (p=0.008 n=5)
Equal/same/64M-8 46.36Gi ± ∞ ¹ 26253069.97Gi ± ∞ ¹ +56626517.99% (p=0.008 n=5)
geomean 6.737Gi 122.7Gi +1721.01%
¹ need >= 6 samples for confidence interval at level 0.95