Skip to content

Instantly share code, notes, and snippets.

@wolever
Last active October 21, 2023 17:53
Show Gist options
  • Save wolever/3c3fa1f23a7e2e19dcb39e74af3d9282 to your computer and use it in GitHub Desktop.
Save wolever/3c3fa1f23a7e2e19dcb39e74af3d9282 to your computer and use it in GitHub Desktop.
Analysis of floating point fractional indexing

Fractional indexing is a technique for assigning a rank to ordered items, where an item's position can be changed - or a new item introduced - with only one rank value update.

For example, given the items:

{ name: "a", rank: 1 }
{ name: "c", rank: 2 }

With a naive rank ordering scheme, introducing "b" in between "a" and "c" would require updating the rank on each subsequent item:

{ name: "a", rank: 1 }
{ name: "b", rank: 2 } // new
{ name: "c", rank: 3 } // changed

However, using fractional indexing, a new item can be introduced without updating any existing item by setting new_item.rank = (next_item.rank + prev_item.rank) / 2:

{ name: "a", rank: 1 }
{ name: "b", rank: 1.5 } // new
{ name: "c", rank: 2 }

If floating point numbers are used, the question naturally arises as to how many times a new item can be introduced before the floating point number precision is exhausted.

(note: in practice, if precision exhaustion is a concern, a string value can be used to allow for arbitrarily many insertions https://github.com/rocicorp/fractional-indexing)

To understand how many insertions can be made, it's necessary to understand how floating point numbers are represented in memory. A detailed explanation is out of scope of this document (ex: https://www.youtube.com/watch?v=PZRI1IfStY0), but in brief: floating point numbers are represented as a sign bit, an exponent and a mantissa:

         v- exponent
1.25 = 0 01111111 01000000000000000000000
       ^- sign    ^- mantissa

The value of the float is calculated with:

value = -1^sign * 2^(exponent - 127) * mantissa

(note: the examples in this document use 32 bit floats, but the same principles apply to the more common 64 bit floats)

With this in mind, it's straightforward to determine that the maximum number of insertions that can be made are 2 ^ (number of exponent bits - 1) + (number mantissa bits) - 1, or 2 ^ (8 - 1) + 23 - 1 = 150 insertions for a 32 bit float, or 2 ^ (11 - 1) + 52 - 1 = 1,075 insertions for a 64 bit float.

This can be demonstrated with the following program:

import struct
import numpy as np

def float2bin(num: np.float32):
    bits = ''.join('{:0>8b}'.format(c) for c in struct.pack('!f', num))
    return f'{bits[0]} {bits[1:9]} {bits[9:]}'

v = np.float32(1)
iteration = 0
while True:
    print(f"{iteration:3d}: {float2bin(v)} ({v})")
    iteration += 1
    next_v = np.float32(v / 2)
    if v == next_v:
        break
    v = next_v

Which produces:

$ python float-division.py
  0: 0 01111111 00000000000000000000000 (1.0)
  1: 0 01111110 00000000000000000000000 (0.5)
  2: 0 01111101 00000000000000000000000 (0.25)
  3: 0 01111100 00000000000000000000000 (0.125)
  4: 0 01111011 00000000000000000000000 (0.0625)
  5: 0 01111010 00000000000000000000000 (0.03125)
...
146: 0 00000000 00000000000000000001000 (1.1210387714598537e-44)
147: 0 00000000 00000000000000000000100 (5.605193857299268e-45)
148: 0 00000000 00000000000000000000010 (2.802596928649634e-45)
149: 0 00000000 00000000000000000000001 (1.401298464324817e-45)
150: 0 00000000 00000000000000000000000 (0.0)

Note, however, that this is the maximum number of insertions, and it assumes that items are always being added in between the "first" (rank = 0) and second items; ie, so the rank of new items converges to 0.

If instead items are being added in between the "last" (rank = 1) and second to last items - ie, so the rank of new items converges to 1 - then the maximum number of insertions is substantially lower:

v = np.float32(0.5)
iteration = 0
while True:
    print(f"{iteration:3d}: {float2bin(v)} ({v})")
    iteration += 1
    next_v = np.float32(v + (1 - v) / 2)
    if v == next_v:
        break
    v = next_v

Which produces:

$ python float-division-towards-1.py
 0: 0 01111110 00000000000000000000000 (0.5)
 1: 0 01111110 10000000000000000000000 (0.75)
 2: 0 01111110 11000000000000000000000 (0.875)
 3: 0 01111110 11100000000000000000000 (0.9375)
 4: 0 01111110 11110000000000000000000 (0.96875)
 5: 0 01111110 11111000000000000000000 (0.984375)
 6: 0 01111110 11111100000000000000000 (0.9921875)
 7: 0 01111110 11111110000000000000000 (0.99609375)
 8: 0 01111110 11111111000000000000000 (0.998046875)
 9: 0 01111110 11111111100000000000000 (0.9990234375)
10: 0 01111110 11111111110000000000000 (0.99951171875)
11: 0 01111110 11111111111000000000000 (0.999755859375)
12: 0 01111110 11111111111100000000000 (0.9998779296875)
13: 0 01111110 11111111111110000000000 (0.99993896484375)
14: 0 01111110 11111111111111000000000 (0.999969482421875)
15: 0 01111110 11111111111111100000000 (0.9999847412109375)
16: 0 01111110 11111111111111110000000 (0.9999923706054688)
17: 0 01111110 11111111111111111000000 (0.9999961853027344)
18: 0 01111110 11111111111111111100000 (0.9999980926513672)
19: 0 01111110 11111111111111111110000 (0.9999990463256836)
20: 0 01111110 11111111111111111111000 (0.9999995231628418)
21: 0 01111110 11111111111111111111100 (0.9999997615814209)
22: 0 01111110 11111111111111111111110 (0.9999998807907104)
23: 0 01111110 11111111111111111111111 (0.9999999403953552)
24: 0 01111111 00000000000000000000000 (1.0)

To understand why this is the case, notice that in the first instance - when the new item's rank converges to 0 - the division by 2 happens by subtracting 1 from the exponent bits; ex:

            v-- notice the subtraction of 1 at each step
0: 0 01111111 00000000000000000000000 (1.0)
1: 0 01111110 00000000000000000000000 (0.5)
2: 0 01111101 00000000000000000000000 (0.25)
3: 0 01111100 00000000000000000000000 (0.125)
...

Which means there can be approximately 2^(number of exponent bits) insertions before precision is exhausted.

However, in the second instance - when the new item's rank converges to 1 - the division by 2 happens by shifting a 1 into the mantissa bits; ex:

              v-- notice the shift of 1 at each step
0: 0 01111110 00000000000000000000000 (0.5)
1: 0 01111110 10000000000000000000000 (0.75)
2: 0 01111110 11000000000000000000000 (0.875)
3: 0 01111110 11100000000000000000000 (0.9375)
...

Which means there can only be approximately (number of mantissa bits) insertions before precision is exhausted.

So, in practice, how many insertions between any two elements can we expect to be able to make before precision is exhausted?

Using a 64 bit float (the default in JavaScript, Python, and most other languages), "about 50" is a reasonable estimate.

But if there's any concern about precision, it's likely best to use a string instead of a float, which will allow for an arbitrary number of insertions (at the cost of slightly larger values): https://github.com/rocicorp/fractional-indexing

See also:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment