Bead sort (a.k.a. Abacus sort, or Gravity sort) is a sorting algorithm that can sort a list of positive integers.
It belongs to the class of natural algorithms: it uses (simulates) gravity to sort an input list.
The sort algorithm accomplishes this in three acts –
1. Represent each number as a row of beads on an abacus: row i gets a[i] beads, one per vertical rod, starting from the left.
2. Let gravity act: the beads slide down their rods and stack up on the lowest available rows.
3. Read the result back: count the beads left in each row; the row counts, read top to bottom, are the input sorted in ascending order.
As simple as the 3-step description makes it seem, the algorithm doesn’t translate as easily to software.
It is usually implemented with a 2D matrix representing the abacus/bead state, and the beads are dropped level by level.
Here's a reference implementation in Python –
'''
ALGORITHM #1
Bead sort implementation from geeksforgeeks
at https://www.geeksforgeeks.org/bead-sort-natural-sorting-algorithm/
Translated to Python from the C implementation.
Credit: www.geeksforgeeks.org
'''
def bead_sort(a):
    n = len(a)
    maximum = max(a)
    beads = [[0]*maximum for _ in xrange(n)]
    # Set beads in n rows
    # Row i starts with a[i] beads (1s), left-aligned
    # e.g.
    # if a = [2,6,1,4,3]
    # beads = [
    # [1, 1, 0, 0, 0, 0]
    # [1, 1, 1, 1, 1, 1]
    # [1, 0, 0, 0, 0, 0]
    # [1, 1, 1, 1, 0, 0]
    # [1, 1, 1, 0, 0, 0]
    # ]
    for i in xrange(n):
        for j in xrange(a[i]):
            beads[i][j] = 1
    for j in xrange(maximum):
        # count the number of beads to drop in the current column
        drop_beads_by_column = 0
        for i in xrange(n):
            drop_beads_by_column += beads[i][j]
            # Take out all beads in current column
            beads[i][j] = 0
        # drop the beads taken out of the current column to the lowest levels,
        # stacking them one above the other
        for i in xrange(n-1, n-drop_beads_by_column-1, -1):
            beads[i][j] = 1
    # count the beads in each row (now sorted row-wise),
    # and add the count back into the original array,
    # sorting the input array in ascending order
    # if a = [2,6,1,4,3]
    # post-sorting, the beads matrix (the abacus) state would look like
    # beads = [
    # [1, 0, 0, 0, 0, 0]
    # [1, 1, 0, 0, 0, 0]
    # [1, 1, 1, 0, 0, 0]
    # [1, 1, 1, 1, 0, 0]
    # [1, 1, 1, 1, 1, 1]
    # ]
    for i in xrange(n):
        # count number of 1s in each row
        num_beads_in_row = beads[i].count(1)
        a[i] = num_beads_in_row
    return a
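A quick sanity check on the example from the comments (the expected output is worked out by hand) –
print(bead_sort([2, 6, 1, 4, 3]))    # prints [1, 2, 3, 4, 6]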
The above implementation uses O(n*max) storage⁽¹⁾, and has a time complexity of O(n*max).
(n: size of the input list
max: maximum element in the input list)
An O(n*max) storage requirement is terrible for a sorting algorithm.
The time efficiency isn't as terrible, though: it can potentially outperform O(nlogn) comparison-based sort algorithms when the numbers are very small (i.e. max smaller than log n).
For instance, if all numbers in the input list are below 100, then the algorithm runs in almost linear time for large n.
But log(n) grows far too slowly for there to be a meaningful 'small numbers' range where the above algorithm actually beats an O(nlogn) sort.
For context, log₂(10⁹) ≈ 30; even with a billion elements, every number in the input would have to be below ~30 for O(n*max) to win.
However, the algorithm does have one last redeeming quality.
It yields well to parallelism.
In theory, with 'max' concurrent threads, each thread can independently drop the beads in its own column in linear time.
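As a rough sketch of that idea (not a measured claim; the helper names are mine, and multiprocessing.dummy's thread-backed Pool is just one convenient way to get a pool of threads), each worker could own one column of the beads matrix from algorithm #1:
from multiprocessing.dummy import Pool   # thread-backed Pool

def drop_columns_in_parallel(beads, n, maximum):
    def drop_column(j):
        # count, clear, and re-stack the beads of column j only;
        # no other column is touched, so the columns are independent
        count = sum(beads[i][j] for i in range(n))
        for i in range(n):
            beads[i][j] = 0
        for i in range(n - 1, n - count - 1, -1):
            beads[i][j] = 1

    pool = Pool()
    pool.map(drop_column, range(maximum))
    pool.close()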
In this article, I outline an implementation with O(n) space, and O(n*(max-min)) time efficiency.
(n: size of the input list
max: maximum element in the input list
min: minimum element in the input list)
Full disclosure: I didn’t set out to improve the efficiency of bead-sort.
A random bug in a completely unrelated program pointed me towards an alternate, more efficient implementation for bead-sort.
Specifically, I was trying to print a histogram for a given set of numbers using ‘*’s for building blocks.
An input of [2,6,1,4,3] should print the following pattern –
  *
  *
  *   *
  *   * *
* *   * *
* * * * *
The solution is simple enough –
1. There are going to be 'max' rows in the output.
So iterate over max rows down to 0
2. At each row i,
2.1 Print a '*' for each a[j] (0 <= j < n) if a[j] is tall enough to reach row i (a[j] > i)
2.2 Add adequate spaces otherwise.
Below is the code for the same –
import sys

def print_histogram(a):
    maximum = max(a)
    for i in xrange(maximum-1, -1, -1):
        for j in xrange(len(a)):
            # pad with two spaces (the width of '* ') to keep the columns aligned
            sys.stdout.write('* ' if a[j] > i else '  ')
        print
However, I missed step 2.2 in my initial solution, and ended up coding something like below –
sys.stdout.write('* ') if a[j] > i else None
and so when I ran it, the algorithm promptly printed
*
*
* *
* * *
* * * *
* * * * *
While it was immediately obvious why this wasn’t quite the histogram output I was looking for, something else caught my eye.
The number of '*'s in each column was sorted in descending order.
The absence of spaces when a[j] wasn't big enough for row 'i' caused the '*'s to gravitate towards the left and fill the gaps, just as in the abacus sort illustration above.
I went ahead and implemented a bead-sort solution inspired by the bug in my histogram code.
0. Initialize a temporary list, temp[], filled with 0s.
1. Iterate over the rows from 'max'-1 down to 0.
2. At each row i,
increment temp[0], temp[1], ..., temp[k-1] by 1, where k is the number of a[j] greater than i.
3. After 'max' iterations, the temporary list holds the input sorted in descending order.
3.1 Copy elements from the temporary list back into the original list in reverse order
to sort the input list in ascending order.
A reference Python implementation of the algorithm follows –
'''
ALGORITHM #2
Move from row 'maximum'-1 down to 0,
simulating the beads stacking up from the bottom
'''
def bead_sort(a):
    maximum = max(a)
    n = len(a)
    # Initialize a temporary array filled with zeroes
    temp = [0] * n
    for i in xrange(maximum-1, -1, -1):
        k = 0
        for j in xrange(n):
            if a[j] > i:
                temp[k] += 1
                k += 1
    # Copy temp array back into the original array,
    # putting the array into sorted order;
    # temp array is reverse sorted, so copy backwards for ascending order
    for i in xrange(n):
        a[i] = temp[n-i-1]
    return a
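To make the row-by-row loop concrete, here is how temp evolves for the earlier example (a hand trace, not program output) –
# a = [2, 6, 1, 4, 3], maximum = 6
# i = 5: temp = [1, 0, 0, 0, 0]    (only 6 > 5)
# i = 4: temp = [2, 0, 0, 0, 0]
# i = 3: temp = [3, 1, 0, 0, 0]    (6 and 4 clear row 3)
# i = 2: temp = [4, 2, 1, 0, 0]
# i = 1: temp = [5, 3, 2, 1, 0]
# i = 0: temp = [6, 4, 3, 2, 1]
# copied back in reverse: a = [1, 2, 3, 4, 6]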
The above algorithm has a space efficiency of O(n) and a time efficiency of O(n*max).
While we got the space efficiency down to O(n), the time efficiency isn't any better off.
Notice how the following check is always true for all i < 'minimum'?
(minimum: smallest element in the input list)
if a[j] > i:
This means the check can be avoided altogether by stopping the loop after i reaches 'minimum' (instead of going all the way down to 0), and then incrementing every element in the list by 'minimum' afterwards.
Equivalently, initialize the temporary list with 'minimum' instead of 0, and iterate over the rows from maximum down to minimum.
The code below takes advantage of this optimization –
'''
ALGORITHM #3
Optimize bead sort runtime by counting down to minimum instead of 0.
'''
def bead_sort(a):
    minimum, maximum = min(a), max(a)
    n = len(a)
    # Initialize a temporary array filled with minimum
    temp = [minimum] * n
    for i in xrange(maximum-1, minimum-1, -1):
        k = 0
        for j in xrange(n):
            if a[j] > i:
                temp[k] += 1
                k += 1
    # Copy temp array back into the original array,
    # putting the array into sorted order;
    # temp array is reverse sorted, so copy backwards for ascending order
    for i in xrange(n):
        a[i] = temp[n-i-1]
    return a
This optimization yields a time efficiency of O(n*(max-min)) while retaining the space efficiency of O(n) from the previous approach.
Much better!
While algorithms #2 and #3 bring the space efficiency down to O(n), bead sort still remains a terrible algorithm for general sorting even with the O(n*(max-min)) time optimization in algorithm #3.
However, a time efficiency of O(n*(max-min)) means it can approach linear time for numbers in a narrow range, irrespective of how big the numbers themselves are.
For example,
if all numbers in the input list are between 10⁹+1 and 10⁹+100,
algorithm #3 has a runtime complexity of O(n * 100),
which is linear in n, just with a constant factor of 100.
In contrast, algorithms #1 and #2 would have a time efficiency of O(n*10⁹) for the same input.
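As a concrete (hypothetical) illustration of that difference –
# one million values, all squeezed between 10^9 + 1 and 10^9 + 100
a = [10**9 + (i % 100) + 1 for i in range(1000000)]

# algorithm #3 makes only max - min = 99 passes over the list;
# algorithms #1 and #2 would walk through ~10^9 bead rows instead
bead_sort(a)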
To be fair, algorithm #1 can be made to work with negative numbers too.
One way this can be accomplished is by –
1. Bring all the numbers to a baseline zero by adding a number x
2. Sort the modified list
3. Subsequently decrement all the numbers by x.
For example,
an input of [2, -2, 3] can be modified to [4, 0, 5] by adding 2 to every number;
running the sort algorithm on the modified list yields [0, 4, 5],
and finally, subtracting 2 from every element restores the original values,
now in sorted order: [-2, 2, 3]
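A minimal sketch of that wrapper (the function name is mine; bead_sort here stands in for algorithm #1, which sorts the list it is given in place) –
def bead_sort_with_negatives(a):
    # shift everything up to a non-negative baseline, sort, then shift back
    x = -min(a)                      # e.g. for [2, -2, 3], x = 2
    shifted = [v + x for v in a]     # [4, 0, 5]
    bead_sort(shifted)               # algorithm #1 sorts in place: [0, 4, 5]
    return [v - x for v in shifted]  # [-2, 2, 3]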
Algorithm #3 iterates over the 'rows of beads' from maximum down to minimum in steps of 1.
If the nature of the input numbers is known in advance, the step size can be increased to move faster.
For example, if the input numbers are all even,
the step can be increased to 2, cutting the runtime in half.
Or,
if every number in the input list satisfies a[i] ≡ x (mod 500) for some fixed x
(imagine a worker thread handling every 500th request),
then a step of 500 can be used.
'''
ALGORITHM #4
Step optimization (step hard-coded to 2 here, i.e. for an all-even input)
'''
def bead_sort(a):
    minimum, maximum = min(a), max(a)
    n = len(a)
    step = 2
    # Initialize a temporary array filled with minimum
    temp = [minimum] * n
    for i in xrange(maximum-1, minimum-1, -step):
        k = 0
        for j in xrange(n):
            if a[j] > i:
                temp[k] += step
                k += 1
    # Copy temp array back into the original array;
    # temp array is reverse sorted, so copy backwards for ascending order
    for i in xrange(n):
        a[i] = temp[n-i-1]
    return a
With the step optimization outlined above, algorithm #4 can potentially work with floating point numbers when a proper step can be chosen.
An input of [2.0, 1.5, 3.0, -0.5] can be sorted with a step of 0.5 in algorithm #4.
However, finding a meaningful step might not always be viable,
or the step might be too small (e.g. 0.0005) to be useful.
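A rough sketch of what such a variant could look like (the name is mine; it assumes every input value is an exact multiple of the chosen step, and since xrange only accepts integers the row loop becomes a while loop) –
def bead_sort_float(a, step):
    minimum, maximum = min(a), max(a)
    n = len(a)
    temp = [minimum] * n
    i = maximum - step
    while i >= minimum:              # same row walk as algorithms #3 and #4
        k = 0
        for j in range(n):
            if a[j] > i:
                temp[k] += step
                k += 1
        i -= step                    # a sturdier version would count whole
                                     # steps with integers to avoid float drift
    for j in range(n):
        a[j] = temp[n - j - 1]
    return a

# bead_sort_float([2.0, 1.5, 3.0, -0.5], 0.5)  returns  [-0.5, 1.5, 2.0, 3.0]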
While at first glance algorithms #2, #3, and #4 appear to retain the parallel properties of #1, a closer look at the following lines reveals that updates to the temporary array need to be synchronized, so they might not yield as well to parallelism.
temp[k] += step
k += 1
- https://en.wikipedia.org/wiki/Bead_sort
- https://www.geeksforgeeks.org/bead-sort-natural-sorting-algorithm/
- https://demonstrations.wolfram.com/ExtendedBeadSort/
⁽¹⁾ A note at the end of the Wolfram page does mention an optimization for algorithm #1 using O(n+max) space, but it doesn't improve the time efficiency beyond O(n*max).