Kotlin Merge Sort Implementations: IntArray vs Array<Int> vs Coroutines (multi-threaded)

Kotlin Merge Sort Implementations: `IntArray` vs `Array<Int>` vs Coroutines (multi-threaded)

I was answering a post on codereview.stackexchange.com which led me to explore the performance penalty of using Array<Int> (Java Integer[]) instead of IntArray (Java int[]).

IntArray is expected to have better performance because it avoids the indirection of boxing/unboxing. But for code reuse purposes, Array<Int> is preferable since the algorithm can then be coded once for Array<T: Comparable<T>> and used with any comparable type.

I benchmarked the two approaches and along the way I also compared them to a multi-threaded merge-sort algorithm and merge-sort written in Java. For the multi-threaded version I used Kotlin coroutines.

It is frustrating to code the merge-sort algorithm for IntArray and Array<Int> separately since they are identical, except for the type declarations. So I then tried to solve this issue by applying the dictum that "in computer science there is nothing that can't be solved by adding another layer of indirection". I created a common interface and coded a single merge sort algorithm to that interface. Unfortunately there is a significant performance loss from the extra indirection, even when inlining as much as possible.

I pasted the code for all those algorithms below, followed by the testing and benchmarking code.

Benchmarking Results

The results below are the average sorting time for an array of 1,000,000 integers. Different implementations of the merge sort algorithm are compared, except for java.util.Arrays.sort which is an implementation "dual pivot quick sort".

As with all benchmarking, those results are too be taken lightly. For example, the relative strength of the different implementations can change significantly depending on the length of the array to sort.

Algorithm Implementation	Sort Time (ms)
`Array<Int>`	200
`IntArray`	90
`IntArray` multi-threaded (7 threads)	60
`IntArray` multi-threaded (1 thread)	120
Pure Java `int[]`	115
`java.util.Arrays.sort`	60
`IntArray` via common interface	380
`Array<Int>` via common interface	390

The main result is that IntArray is twice as fast as Array<Int> which is not surprising since there is no indirection cost from boxing/unboxing.

For the multi-threaded implementation of merge sort, the algorithm is somewhat different: instead of alternating between two work arrays, the parallel implementation creates new arrays at each iteration. It is not surprising that the multi-threaded implementation running on a single thread is slower than merge sort implemented with two work arrays since there is a cost to creating all those arrays. To avoid creating too many small arrays, the multi-threaded implementation calls the non-multi-threaded algorithm when the arrays are smaller than 128 items. The performance would have been much worse otherwise. This was tested on a computer with 4 cores / 8 hyperthreads. The CommonPool of Kotlin coroutines uses a ForkJoinPool with they number of threads being the number of available threads on the system minus one.

It seems that implementing merge sort in Java is somewhat slower than Kotlin, but maybe there is a slight difference in how I coded them.

Finally, as mentioned at the beginning, I tried to introduce a common interface so that I could code merge sort only once for both IntArray and Array<Int>. This adds a layer of indirection and the performance is very disappointing. I tried to use inline as much as possible. I was informed on reddit that the compiler can completely remove the indirection in some similar cases, but it seems to fail here. I have not spent much time trying to make it work, so it might still be possible to have the cost of indirection reduced or removed.

Code for `Array<Int>`

Standard merge sort algorithm using two work arrays, see wikipedia.

/**
 * @param array The array to sort.  A copy is made, so it's content is not modified.
 */
fun <T : Comparable<T>> mergeSort(array: Array<T>) : Array<T> {
    val workA = array.copyOf()
    val workB = array.copyOf()

    sortSection(workA, workB, 0, workA.size)

    return workB
}

/**
 * Sorts the specified section of the array.
 *
 * @param workA Should contain identical values as workB in the specified range.
 *              The final values in the specified range are destroyed (actually they are made
 *              of two adjacent sorted ranged).
 * @param workB Should contain identical values as workA in the specified range.
 *              The final values in the specified range are the sorted values in that range.
 */
internal fun <T : Comparable<T>> sortSection(workA: Array<T>,
                                             workB:Array<T>,
                                             start: Int,
                                             exclusiveEnd: Int) : Unit {

    fun <T : Comparable<T>> mergeHalves(input: Array<T>,
                                        output: Array<T>,
                                        start: Int,
                                        mid: Int,
                                        exclusiveEnd: Int) {
        var p1 = start
        var p2 = mid
        for (i in start until exclusiveEnd) {
            if (p1 < mid && (p2 == exclusiveEnd || input[p1] <= input[p2])) {
                output[i] = input[p1]
                p1++
            } else {
                output[i] = input[p2]
                p2++
            }

        }
    }

    if (exclusiveEnd - start <= 1)
        return
    val mid = (start + exclusiveEnd) / 2
    sortSection(workB, workA, start, mid)
    sortSection(workB, workA, mid, exclusiveEnd)
    mergeHalves(workA, workB, start, mid, exclusiveEnd)
}

Code for `IntArray`

This code is nearly identical to the Array<Int> case, except for the function signatures.

/**
 * @param array The array to sort.  A copy is made, so it's content is not modified.
 */
fun mergeSortIA(array: IntArray) : IntArray {
    val arrayCopy = array.copyOf()
    val arrayCopy2 = array.copyOf()

    sortSectionIA(arrayCopy, arrayCopy2, 0, arrayCopy.size)

    return arrayCopy2
}

/**
 * Sorts the specified section of the array.
 *
 * @param workA Should contain identical values as workB in the specified range.
 *              The final values in the specified range are destroyed (actually they are made
 *              of two adjacent sorted ranged).
 * @param workB Should contain identical values as workA in the specified range.
 *              The final values in the specified range are the sorted values in that range.
 */
internal inline fun mergeHalvesIA(workA: IntArray,
                                  workB: IntArray,
                                  start: Int,
                                  mid: Int,
                                  exclusiveEnd: Int) {
    var p1 = start
    var p2 = mid
    for (i in start until exclusiveEnd) {
        if (p1 < mid && (p2 == exclusiveEnd || workA[p1] <= workA[p2])) {
            workB[i] = workA[p1]
            p1++
        } else {
            workB[i] = workA[p2]
            p2++
        }
    }
}

internal fun sortSectionIA(input: IntArray,
                           output: IntArray,
                           start: Int,
                           exclusiveEnd: Int) : Unit {

    if (exclusiveEnd - start <= 1)
        return
    val mid = (start + exclusiveEnd) / 2
    sortSectionIA(output, input, start, mid)
    sortSectionIA(output, input, mid, exclusiveEnd)
    mergeHalvesIA(input, output, start, mid, exclusiveEnd)
}

Code with Coroutines (Multi-Threaded)

Kotlin coroutines are a way to write non-blocking multi-threaded code. It is mostly useful for code that blocks on IO, so there is not much advantage in using it here instead of standard Java concurrency, but I wanted to try it. Coroutines are not yet officially part of the language, as of August 2017. There is a nice tutorial.

For the multi-threaded implementation of merge sort, instead of using two work arrays, copies of the arrays are made. But when the sub-arrays are 128 items long or smaller, the algorithm switches to the single threaded version with two work arrays. This avoids the cost of creating many small arrays.

import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.async
import kotlinx.coroutines.experimental.runBlocking
import kotlin.coroutines.experimental.CoroutineContext

// Somewhat dirty to make the thread pool publicly modifiable, but it is useful
// for benchmarking.
internal var corouContext: CoroutineContext = CommonPool

/**
 * @param array The array to sort.  A copy is made, so it's content is not destroyed.
 * @param minThreshold When the array has a size smaller than this threshold, it is sorted using
 *    a non-threaded version of merge-sort.  A parallel version of merge-sort for small
 *    arrays would not be efficient because too many small arrays would be created.
 */
fun mergeSortCorou(array: IntArray, minThreshold: Int = 128) : IntArray {
    val arrayCopy = array.copyOf()
    return runBlocking<IntArray> {
        val deferred = async(corouContext) { mergeSortRecur(arrayCopy, minThreshold) }
        deferred.await()
    }
}

/**
 * @param array The input. Also used to store the output.
 * @param minThreshold When the array has a size smaller than this, it is sorted using
 *    a non-threaded version of merge-sort.  A parallel version of merge-sort for small
 *    lists would not be efficient.
 */
// Note: a `suspend` function cannot be an inner function.
internal suspend fun mergeSortRecur(array: IntArray, minThreshold: Int) : IntArray {
    if (array.size <= minThreshold) {
        return mergeSortIAFast(array)
    }
    val mid = array.size / 2
    val half1 = array.sliceArray(0 until mid)
    val half2 = array.sliceArray(mid until array.size)
    val half1Defer = async(corouContext) { mergeSortRecur(half1, minThreshold) }
    val half2Defer = async(corouContext) { mergeSortRecur(half2, minThreshold) }

    mergeSeparated(half1Defer.await(), half2Defer.await(), output = array)
    return array
}

/**
 * @param array The input; also used as a work array, so its content is destroyed.
 */
internal fun mergeSortIAFast(array: IntArray) : IntArray {
    val arrayCopy = array.copyOf()
    sortSectionIA(array, arrayCopy, 0, array.size)
    return arrayCopy
}

/**
 * Merge two sorted array in a single sorted array.
 *
 * @param half1 Sorted array.
 * @param half2 Sorted array.
 * @param output Output array of size `half1.size + half2.size`.  Its original content will be erased.
 */
internal fun mergeSeparated(half1: IntArray, half2: IntArray, output: IntArray) {
    require(half1.size + half2.size == output.size)
    var p1 = 0
    var p2 = 0
    for (i in 0 until half1.size + half2.size) {
        if (p1 < half1.size && (p2 == half2.size || half1[p1] <= half2[p2])) {
            output[i] = half1[p1]
            p1++
        } else {
            output[i] = half2[p2]
            p2++
        }
    }
}

Java Code

This should be the same implementation as with IntArray in Kotlin above, using basic merge sort with two work arrays.

import java.util.Arrays;

public class MergeSorter {

    /**
     *
     * @param array The array to sort.  A copy is made so the array itself is not modified.
     */
    public static int[] sort(int[] array) {
        int[] workA =  Arrays.copyOf(array, array.length);
        int[] workB =  Arrays.copyOf(array, array.length);

        sortPart(workA, workB, 0, array.length);

        return workB;
    }

    /**
     * Sorts the specified section of the array.
     *
     * @param workA Should contain identical values as workB in the specified range.
     *              The final values in the specified range are destroyed (actually they are made
     *              of two adjacent sorted ranged).
     * @param workB Should contain identical values as workA in the specified range.
     *              The final values in the specified range are the sorted values in that range.
     */
    private static void sortPart(int[] workA, int[] workB, int start, int exclusiveEnd) {
        if (start >= exclusiveEnd - 1) {
            return;
        } else {
            int mid = (start + exclusiveEnd) / 2;
            sortPart(workB, workA, start, mid);
            sortPart(workB, workA, mid, exclusiveEnd);
            mergeParts(workA, start, mid, exclusiveEnd, workB);
            return;
        }
    }

    /**
     *
     * @param input The array which is really made of sub-array1 from start to (mid - 1) and sub-array2
     *              from mid to (exclusiveEnd - 1), which will be merged.  Those two sub-arrays should
     *              already be sorted.  This array will not be modified.
     * @param output Should have the same length as array.  The section from start to (exclusiveEnd - 1)
     *               will contain the merged two sub-arrays in array.  The values outside this range
     *               are left unchanged.
     */
    private static void mergeParts(int[] input, int start, int mid, int exclusiveEnd, int[] output) {
        int pointer1 = start;
        int pointer2 = mid;
        for (int kOut = start; kOut < exclusiveEnd; kOut++) {
            if (pointer1 != mid && (pointer2 == exclusiveEnd || (input[pointer1] <= input[pointer2]))) {
                output[kOut] = input[pointer1];
                pointer1++;
            } else {
                output[kOut] = input[pointer2];
                pointer2++;
            }
        }
    }
}

Attempt at a Common Interface (Extra Indirection Layer)

This is my attempt at creating a common interface for IntArray and Array<Int>.

/**
 * @param array The array to sort.  A copy is made, so it's content is not modified.
 */
fun <T: Comparable<T>> mergeSortLL(array: ComparableListLike<T>) : ComparableListLike<T> {
    val arrayCopy = array.copyOf()
    val arrayCopy2 = array.copyOf()

    sortSectionLL(arrayCopy, arrayCopy2, 0, arrayCopy.size)

    return arrayCopy2
}

/**
 * Sorts the specified section of the array.
 *
 * @param workA Should contain identical values as workB in the specified range.
 *              The final values in the specified range are destroyed (actually they are made
 *              of two adjacent sorted ranged).
 * @param workB Should contain identical values as workA in the specified range.
 *              The final values in the specified range are the sorted values in that range.
 */
internal fun <T: Comparable<T>> sortSectionLL(workA: ComparableListLike<T>,
                                              workB: ComparableListLike<T>,
                                              start: Int,
                                              exclusiveEnd: Int) : Unit {

    if (exclusiveEnd - start <= 1)
        return
    val mid = (start + exclusiveEnd) / 2
    sortSectionLL(workB, workA, start, mid)
    sortSectionLL(workB, workA, mid, exclusiveEnd)
    mergeHalvesLL(workA, workB, start, mid, exclusiveEnd)
}

internal inline fun <T: Comparable<T>> mergeHalvesLL(input: ComparableListLike<T>,
                                                     output: ComparableListLike<T>,
                                                     start: Int,
                                                     mid: Int,
                                                     exclusiveEnd: Int) {
    var p1 = start
    var p2 = mid
    val end = exclusiveEnd
    for (i in start until exclusiveEnd) {
        if (p1 < mid && (p2 == end || input[p1] <= input[p2])) {
            output[i] = input[p1]
            p1++
        } else {
            output[i] = input[p2]
            p2++
        }
    }
}

interface ComparableListLike<T: Comparable<T>> {
    val size: Int
    operator fun get(i: Int): T
    operator fun set(i: Int, t: T): Unit
    fun copyOf(): ComparableListLike<T>
    fun toList(): List<T>
}

// Can't be part of the interface and be `inline`
inline fun <T: Comparable<T>> ComparableListLike<T>.swap(i: Int, j: Int): Unit {
    val tmp = get(i)
    set(i, get(j))
    set(j, tmp)
    return
}


class SortingArray<T: Comparable<T>>(val array: Array<T>): ComparableListLike<T> {
    override val size: Int
        inline get() = array.size
    override inline operator fun get(i: Int) = array[i]
    override inline operator fun set(i: Int, t: T): Unit {
        array[i] = t
        return
    }
    override inline fun copyOf() = SortingArray(array.copyOf())
    override inline fun toList() = array.toList()
}

class SortingList<T: Comparable<T>>(val array: MutableList<T>): ComparableListLike<T> {
    override val size: Int
        inline get() = array.size
    override inline operator fun get(i: Int) = array[i]
    override inline operator fun set(i: Int, t: T): Unit {
        array[i] = t
        return
    }
    override inline fun copyOf() = SortingList(array.toMutableList()) // toMutableList() makes a copy
    override inline fun toList() = array.toList()
}

class SortingFromFuns<T: Comparable<T>>(val sizee: () -> Int,
                                        val gett: (i:Int) -> T,
                                        val sett: (i: Int, t: T) -> Unit,
                                        val copyOff: () -> ComparableListLike<T>,
                                        val toListt: () -> List<T>): ComparableListLike<T> {
    override val size: Int
        inline get() = sizee()
    override inline operator fun get(i: Int) = gett(i)
    override inline operator fun set(i: Int, t: T) = sett(i, t)
    override inline fun copyOf() = copyOff()
    override inline fun toList() = toListt()
}

fun IntArray.asComparableList(): ComparableListLike<Int> {
    return SortingFromFuns(
            sizee = { this.size },
            gett = this::get,
            sett = this::set,
            copyOff = this::asComparableList,
            toListt = this::toList
    )
}

fun DoubleArray.asComparableList(): ComparableListLike<Double> {
    return SortingFromFuns(
            sizee = { this.size },
            gett = this::get,
            sett = this::set,
            copyOff = this::asComparableList,
            toListt = this::toList
    )
}

// TODO: ByteArray, ShortArray, etc.

Utilities: Benchmarking and Test Code

Benchmarking code:

    import kotlinx.coroutines.experimental.CommonPool
    import kotlinx.coroutines.experimental.newSingleThreadContext
    import java.util.*
    import kotlin.system.measureNanoTime

    internal val random = Random()

    fun createRandArray(n: Long): Array<Int> {
        return createRandTArray(n).toTypedArray()
    }

    fun createRandTArray(n: Long): IntArray {
        return random.ints(n, 0, n.toInt()).toArray()
    }

    var size: Long = 1_000_000
    var nRepeats: Int = 11
    var nDrops: Int = 1

    fun <T> benchmark(name: String,
                      buildSortable: (Long) -> T,
                      sort: (T) -> Any) {
        val arrays = List(nRepeats) { buildSortable(size) }
        val timesMS = arrays.map { measureNanoTime { sort(it) } / 1_000_000 }.drop(nDrops) // for JVM optimization warmup
        // println(timesMS)
        println("[$name] Average sort time for array of size $size: ${timesMS.average() } ms")
    }

    fun main(args: Array<String>) {
        size = 1_000_000
        nRepeats = 6

        benchmark("Array<Int>",
                buildSortable = { size -> createRandTArray(size).toTypedArray() },
                sort = ::mergeSort)

        benchmark("ListLike Array<Int>",
                buildSortable = { size -> SortingArray(createRandTArray(size).toTypedArray()) },
                sort = { array -> mergeSortLL(array) })  // oddly ::mergeSortLL is refused

        benchmark("ListLike IntArray",
                buildSortable = { size -> createRandTArray(size).asComparableList() },
                sort = { array -> mergeSortLL(array) })  // oddly ::mergeSortLL is refused

        benchmark("IntArray",
                buildSortable = { size -> createRandTArray(size) },
                sort = ::mergeSortIA)

        benchmark("IntArray multi-thread (CommonPool) with many array copies",
                buildSortable = { size -> createRandTArray(size) },
                sort = { mergeSortCorou(it) })

        val origContext = corouContext
        corouContext = newSingleThreadContext("single thread")
        benchmark("IntArray multi-thread (one thread!) with many array copies",
                buildSortable = { size -> createRandTArray(size) },
                sort = { mergeSortCorou(it) })
        corouContext = origContext

        benchmark("Java int[]",
                buildSortable = { size -> createRandTArray(size) },
                sort = MergeSorter::sort)

        benchmark("Arrays.sort",
                buildSortable = { size -> createRandTArray(size) },
                sort = Arrays::sort)
    }

Tests with JUnit5:

import kotlinx.coroutines.experimental.CommonPool
import org.junit.jupiter.api.Assertions.*
import org.junit.jupiter.api.Test
import java.util.*

class MergeSortTest {
    companion object {
        val intarray: IntArray = Random().ints(1000, 0, 800).toArray()
        val array: Array<Int> = intarray.toTypedArray()

        val sortedArray = array.copyOf().apply { Arrays.sort(this) }
    }

    @Test
    fun testMergeSortJava() {
        assertArrayEquals(MergeSorter.sort(intarray), sortedArray.toIntArray())
    }

    @Test
    fun testMergeSort() {
        assertArrayEquals(mergeSort(array), sortedArray)
    }

    @Test
    fun testMergeSortIA() {
        assertArrayEquals(mergeSortIA(intarray).toTypedArray(), sortedArray)
    }

    @Test
    fun testMergeSortCorou() {
        assertArrayEquals(mergeSortCorou(intarray), sortedArray.toIntArray())
    }
}

max333/KotlinMergeSort.md