Comparing strings and byte slices with ==

Comparing string == string and []byte == []byte

The types string and []byte share essentially the same memory layout: a small header with a pointer to the actual contents followed by the length (plus a capacity, in the case of slices). When we compare two strings with == (or two byte slices with bytes.Equal, since slices themselves are not comparable with ==), the runtime ends up issuing a call to

runtime.memequal(a, b unsafe.Pointer, size uintptr) bool

This function is akin to libc’s memcmp(3).
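To make the shared layout concrete, here is a small Go sketch that peeks at the headers of a string and a byte slice. The stringHeader and sliceHeader structs are illustrative copies written for this example, not the runtime's actual definitions:

package main

import (
	"fmt"
	"unsafe"
)

// Illustrative copies of the headers used for strings and slices.
type stringHeader struct {
	data unsafe.Pointer
	len  int
}

type sliceHeader struct {
	data unsafe.Pointer
	len  int
	cap  int
}

func main() {
	s := "hello"
	b := []byte(s)

	sh := (*stringHeader)(unsafe.Pointer(&s))
	bh := (*sliceHeader)(unsafe.Pointer(&b))

	fmt.Printf("string: data=%p len=%d\n", sh.data, sh.len)
	fmt.Printf("slice:  data=%p len=%d cap=%d\n", bh.data, bh.len, bh.cap)
}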

Most compilers, such as gcc, will generate highly optimized (often vectorized) assembly for an operation like this, but it essentially boils down to:

#include <stdbool.h>
#include <stddef.h>
/* Naive byte-by-byte comparison; real implementations vectorize this loop. */
static bool memequal(const unsigned char *c1, const unsigned char *c2, size_t size) {
    for (size_t i = 0; i < size; ++i) {
        if (c1[i] != c2[i]) {
            return false;
        }
    }
    return true;
}

That is just an incrementing loop with a comparison for each memory location.

With that out of the way, there are some easy-to-follow optimizations that let you compare memory without scanning the whole region, which matters when you are comparing a large slice, for example.

If the length is zero, there is nothing to compare, so you can just return true. This optimization is already present in memequal on all architectures, for example amd64.

Another one: if the two pointers are equal, they point to the same memory, so the contents must be equal and you can return true right away. That check is also included on most arches (amd64), but it was missing on arm64.
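Expressed in plain Go (the real short circuits live in assembly and in the runtime, so this is only a sketch of the idea, assuming the bytes package is imported), the two checks look like this:

// equalSketch mirrors the two fast paths: zero length and equal pointers.
// bytes.Equal already does all of this for you; this is only illustrative.
func equalSketch(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	if len(a) == 0 {
		return true // nothing to compare
	}
	if &a[0] == &b[0] {
		return true // same backing memory, so the contents must be equal
	}
	return bytes.Equal(a, b) // fall back to the real comparison
}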

It is still questionable to me whether this is actually noticeable in normal Go programs, because in order to benefit from this last optimization you have to be comparing the same slice (or string) over and over. I can see a use case for it: maybe you are building a programming language or a diff algorithm and are heavily interning strings, so when you find a string you already have in memory you point to it instead of allocating a new one. That sounds like a valid use case; not very common, but definitely valid.
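For illustration, a minimal string interner (a hypothetical sketch, not taken from any particular project) could look like this; every interned copy of the same string shares one backing array, so later comparisons can hit the equal-pointer fast path:

// intern returns a canonical copy of s, reusing the backing memory of the
// first occurrence. The map and function names are made up for this example.
var interned = make(map[string]string)

func intern(s string) string {
	if v, ok := interned[s]; ok {
		return v // reuse the string we already have in memory
	}
	interned[s] = s
	return s
}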

It also makes memequal O(1) instead of O(size) for these cases; as you can imagine, that is a huge improvement for large slices (benchmark attached below). I find it interesting that a few simple lines of code can bring such improvements:

diff --git a/src/internal/bytealg/equal_arm64.s b/src/internal/bytealg/equal_arm64.s
index d3aabba587..13eb5fb1bf 100644
--- a/src/internal/bytealg/equal_arm64.s
+++ b/src/internal/bytealg/equal_arm64.s
@@ -7,6 +7,9 @@
 
 // memequal(a, b unsafe.Pointer, size uintptr) bool
 TEXT runtime·memequal<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-25
+	// short path to handle equal pointers
+	CMP	R0, R1
+	BEQ	equal
 	// short path to handle 0-byte case
 	CBZ	R2, equal
 	B	memeqbody<>(SB)

As a curiosity, comparing basic types does not call into memequal. Whether you compare values directly or compare the pointers themselves, that results in a single CMP instruction, which takes a single cycle on most architectures, so it is extremely cheap to compute. Check it out on the Compiler Explorer.
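For example, a comparison of two ints compiles to a plain compare-and-set sequence (roughly CMP plus CSET on arm64; the exact output depends on the compiler version), with no call into the runtime:

// eqInt needs no runtime helper; the compiler emits a single compare.
func eqInt(a, b int) bool {
	return a == b
}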

The change was sent upstream: https://go-review.googlesource.com/c/go/+/545416
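The benchstat output below compares runs before and after the change. A benchmark in this spirit (a hedged sketch of my own, in a _test.go file importing bytes, fmt, and testing; the actual Equal/same benchmark in the bytes package is organized differently) compares a buffer against itself at various sizes:

func BenchmarkEqualSame(b *testing.B) {
	for _, n := range []int{1, 16, 4 << 10, 64 << 20} {
		buf := make([]byte, n)
		b.Run(fmt.Sprint(n), func(b *testing.B) {
			b.SetBytes(int64(n))
			for i := 0; i < b.N; i++ {
				if !bytes.Equal(buf, buf) { // same slice on both sides
					b.Fatal("expected equal")
				}
			}
		})
	}
}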

goos: darwin
goarch: arm64
pkg: bytes
                 │      old.txt       │               new.txt                │
                 │       sec/op       │    sec/op     vs base                │
Equal/same/1-8           2.678n ± ∞ ¹   2.400n ± ∞ ¹   -10.38% (p=0.008 n=5)
Equal/same/6-8           3.267n ± ∞ ¹   2.431n ± ∞ ¹   -25.59% (p=0.008 n=5)
Equal/same/9-8           2.981n ± ∞ ¹   2.385n ± ∞ ¹   -19.99% (p=0.008 n=5)
Equal/same/15-8          2.974n ± ∞ ¹   2.390n ± ∞ ¹   -19.64% (p=0.008 n=5)
Equal/same/16-8          2.983n ± ∞ ¹   2.380n ± ∞ ¹   -20.21% (p=0.008 n=5)
Equal/same/20-8          3.567n ± ∞ ¹   2.384n ± ∞ ¹   -33.17% (p=0.008 n=5)
Equal/same/32-8          3.568n ± ∞ ¹   2.385n ± ∞ ¹   -33.16% (p=0.008 n=5)
Equal/same/4K-8         78.040n ± ∞ ¹   2.378n ± ∞ ¹   -96.95% (p=0.008 n=5)
Equal/same/4M-8      78713.000n ± ∞ ¹   2.385n ± ∞ ¹  -100.00% (p=0.008 n=5)
Equal/same/64M-8   1348095.000n ± ∞ ¹   2.381n ± ∞ ¹  -100.00% (p=0.008 n=5)
geomean                  43.52n         2.390n         -94.51%
¹ need >= 6 samples for confidence interval at level 0.95

                 │    old.txt    │                     new.txt                      │
                 │      B/s      │         B/s          vs base                     │
Equal/same/1-8     356.1Mi ± ∞ ¹         397.3Mi ± ∞ ¹        +11.57% (p=0.008 n=5)
Equal/same/6-8     1.711Gi ± ∞ ¹         2.298Gi ± ∞ ¹        +34.35% (p=0.008 n=5)
Equal/same/9-8     2.812Gi ± ∞ ¹         3.515Gi ± ∞ ¹        +24.99% (p=0.008 n=5)
Equal/same/15-8    4.698Gi ± ∞ ¹         5.844Gi ± ∞ ¹        +24.41% (p=0.008 n=5)
Equal/same/16-8    4.995Gi ± ∞ ¹         6.260Gi ± ∞ ¹        +25.34% (p=0.008 n=5)
Equal/same/20-8    5.222Gi ± ∞ ¹         7.814Gi ± ∞ ¹        +49.63% (p=0.008 n=5)
Equal/same/32-8    8.353Gi ± ∞ ¹        12.496Gi ± ∞ ¹        +49.59% (p=0.008 n=5)
Equal/same/4K-8    48.88Gi ± ∞ ¹       1603.96Gi ± ∞ ¹      +3181.17% (p=0.008 n=5)
Equal/same/4M-8    49.63Gi ± ∞ ¹    1637911.85Gi ± ∞ ¹   +3300381.91% (p=0.008 n=5)
Equal/same/64M-8   46.36Gi ± ∞ ¹   26253069.97Gi ± ∞ ¹  +56626517.99% (p=0.008 n=5)
geomean            6.737Gi               122.7Gi            +1721.01%
¹ need >= 6 samples for confidence interval at level 0.95