I was answering a post on codereview.stackexchange.com which led me to explore the performance penalty of using Array<Int>
(Java Integer[]
) instead of IntArray
(Java int[]
).
IntArray
is expected to have better performance because it avoids the indirection of boxing/unboxing. But for code reuse purposes, Array<Int>
is preferable since the algorithm can then be coded once for Array<T: Comparable<T>>
and used with any comparable type.
I benchmarked the two approaches and along the way I also compared them to a multi-threaded merge-sort algorithm and merge-sort written in Java. For the multi-threaded version I used Kotlin coroutines.
It is frustrating to code the merge-sort algorithm for IntArray
and Array<Int>
separately since they are identical, except for the type declarations. So I then tried to solve this issue by applying the dictum that "in computer science there is nothing that can't be solved by adding another layer of indirection". I created a common interface and coded a single merge sort algorithm to that interface. Unfortunately there is a significant performance loss from the extra indirection, even when inlining as much as possible.
I pasted the code for all those algorithms below, followed by the testing and benchmarking code.
The results below are the average sorting time for an array of 1,000,000 integers. Different implementations of the merge sort algorithm are compared, except for java.util.Arrays.sort
which is an implementation "dual pivot quick sort".
As with all benchmarking, those results are too be taken lightly. For example, the relative strength of the different implementations can change significantly depending on the length of the array to sort.
Algorithm Implementation | Sort Time (ms) |
---|---|
Array<Int> |
200 |
IntArray |
90 |
IntArray multi-threaded (7 threads) |
60 |
IntArray multi-threaded (1 thread) |
120 |
Pure Java int[] |
115 |
java.util.Arrays.sort |
60 |
IntArray via common interface |
380 |
Array<Int> via common interface |
390 |
The main result is that IntArray
is twice as fast as Array<Int>
which is not surprising since there is no indirection cost from boxing/unboxing.
For the multi-threaded implementation of merge sort, the algorithm is somewhat different: instead of alternating between two work arrays, the parallel implementation creates new arrays at each iteration. It is not surprising that the multi-threaded implementation running on a single thread is slower than merge sort implemented with two work arrays since there is a cost to creating all those arrays. To avoid creating too many small arrays, the multi-threaded implementation calls the non-multi-threaded algorithm when the arrays are smaller than 128 items. The performance would have been much worse otherwise. This was tested on a computer with 4 cores / 8 hyperthreads. The CommonPool
of Kotlin coroutines uses a ForkJoinPool
with they number of threads being the number of available threads on the system minus one.
It seems that implementing merge sort in Java is somewhat slower than Kotlin, but maybe there is a slight difference in how I coded them.
Finally, as mentioned at the beginning, I tried to introduce a common interface so that I could code merge sort only once for both IntArray
and Array<Int>
. This adds a layer of indirection and the performance is very disappointing. I tried to use inline
as much as possible. I was informed on reddit that the compiler can completely remove the indirection in some similar cases, but it seems to fail here. I have not spent much time trying to make it work, so it might still be possible to have the cost of indirection reduced or removed.
Standard merge sort algorithm using two work arrays, see wikipedia.
/**
* @param array The array to sort. A copy is made, so it's content is not modified.
*/
fun <T : Comparable<T>> mergeSort(array: Array<T>) : Array<T> {
val workA = array.copyOf()
val workB = array.copyOf()
sortSection(workA, workB, 0, workA.size)
return workB
}
/**
* Sorts the specified section of the array.
*
* @param workA Should contain identical values as workB in the specified range.
* The final values in the specified range are destroyed (actually they are made
* of two adjacent sorted ranged).
* @param workB Should contain identical values as workA in the specified range.
* The final values in the specified range are the sorted values in that range.
*/
internal fun <T : Comparable<T>> sortSection(workA: Array<T>,
workB:Array<T>,
start: Int,
exclusiveEnd: Int) : Unit {
fun <T : Comparable<T>> mergeHalves(input: Array<T>,
output: Array<T>,
start: Int,
mid: Int,
exclusiveEnd: Int) {
var p1 = start
var p2 = mid
for (i in start until exclusiveEnd) {
if (p1 < mid && (p2 == exclusiveEnd || input[p1] <= input[p2])) {
output[i] = input[p1]
p1++
} else {
output[i] = input[p2]
p2++
}
}
}
if (exclusiveEnd - start <= 1)
return
val mid = (start + exclusiveEnd) / 2
sortSection(workB, workA, start, mid)
sortSection(workB, workA, mid, exclusiveEnd)
mergeHalves(workA, workB, start, mid, exclusiveEnd)
}
This code is nearly identical to the Array<Int>
case, except for the function signatures.
/**
* @param array The array to sort. A copy is made, so it's content is not modified.
*/
fun mergeSortIA(array: IntArray) : IntArray {
val arrayCopy = array.copyOf()
val arrayCopy2 = array.copyOf()
sortSectionIA(arrayCopy, arrayCopy2, 0, arrayCopy.size)
return arrayCopy2
}
/**
* Sorts the specified section of the array.
*
* @param workA Should contain identical values as workB in the specified range.
* The final values in the specified range are destroyed (actually they are made
* of two adjacent sorted ranged).
* @param workB Should contain identical values as workA in the specified range.
* The final values in the specified range are the sorted values in that range.
*/
internal inline fun mergeHalvesIA(workA: IntArray,
workB: IntArray,
start: Int,
mid: Int,
exclusiveEnd: Int) {
var p1 = start
var p2 = mid
for (i in start until exclusiveEnd) {
if (p1 < mid && (p2 == exclusiveEnd || workA[p1] <= workA[p2])) {
workB[i] = workA[p1]
p1++
} else {
workB[i] = workA[p2]
p2++
}
}
}
internal fun sortSectionIA(input: IntArray,
output: IntArray,
start: Int,
exclusiveEnd: Int) : Unit {
if (exclusiveEnd - start <= 1)
return
val mid = (start + exclusiveEnd) / 2
sortSectionIA(output, input, start, mid)
sortSectionIA(output, input, mid, exclusiveEnd)
mergeHalvesIA(input, output, start, mid, exclusiveEnd)
}
Kotlin coroutines are a way to write non-blocking multi-threaded code. It is mostly useful for code that blocks on IO, so there is not much advantage in using it here instead of standard Java concurrency, but I wanted to try it. Coroutines are not yet officially part of the language, as of August 2017. There is a nice tutorial.
For the multi-threaded implementation of merge sort, instead of using two work arrays, copies of the arrays are made. But when the sub-arrays are 128 items long or smaller, the algorithm switches to the single threaded version with two work arrays. This avoids the cost of creating many small arrays.
import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.async
import kotlinx.coroutines.experimental.runBlocking
import kotlin.coroutines.experimental.CoroutineContext
// Somewhat dirty to make the thread pool publicly modifiable, but it is useful
// for benchmarking.
internal var corouContext: CoroutineContext = CommonPool
/**
* @param array The array to sort. A copy is made, so it's content is not destroyed.
* @param minThreshold When the array has a size smaller than this threshold, it is sorted using
* a non-threaded version of merge-sort. A parallel version of merge-sort for small
* arrays would not be efficient because too many small arrays would be created.
*/
fun mergeSortCorou(array: IntArray, minThreshold: Int = 128) : IntArray {
val arrayCopy = array.copyOf()
return runBlocking<IntArray> {
val deferred = async(corouContext) { mergeSortRecur(arrayCopy, minThreshold) }
deferred.await()
}
}
/**
* @param array The input. Also used to store the output.
* @param minThreshold When the array has a size smaller than this, it is sorted using
* a non-threaded version of merge-sort. A parallel version of merge-sort for small
* lists would not be efficient.
*/
// Note: a `suspend` function cannot be an inner function.
internal suspend fun mergeSortRecur(array: IntArray, minThreshold: Int) : IntArray {
if (array.size <= minThreshold) {
return mergeSortIAFast(array)
}
val mid = array.size / 2
val half1 = array.sliceArray(0 until mid)
val half2 = array.sliceArray(mid until array.size)
val half1Defer = async(corouContext) { mergeSortRecur(half1, minThreshold) }
val half2Defer = async(corouContext) { mergeSortRecur(half2, minThreshold) }
mergeSeparated(half1Defer.await(), half2Defer.await(), output = array)
return array
}
/**
* @param array The input; also used as a work array, so its content is destroyed.
*/
internal fun mergeSortIAFast(array: IntArray) : IntArray {
val arrayCopy = array.copyOf()
sortSectionIA(array, arrayCopy, 0, array.size)
return arrayCopy
}
/**
* Merge two sorted array in a single sorted array.
*
* @param half1 Sorted array.
* @param half2 Sorted array.
* @param output Output array of size `half1.size + half2.size`. Its original content will be erased.
*/
internal fun mergeSeparated(half1: IntArray, half2: IntArray, output: IntArray) {
require(half1.size + half2.size == output.size)
var p1 = 0
var p2 = 0
for (i in 0 until half1.size + half2.size) {
if (p1 < half1.size && (p2 == half2.size || half1[p1] <= half2[p2])) {
output[i] = half1[p1]
p1++
} else {
output[i] = half2[p2]
p2++
}
}
}
This should be the same implementation as with IntArray
in Kotlin above, using basic merge sort with two work arrays.
import java.util.Arrays;
public class MergeSorter {
/**
*
* @param array The array to sort. A copy is made so the array itself is not modified.
*/
public static int[] sort(int[] array) {
int[] workA = Arrays.copyOf(array, array.length);
int[] workB = Arrays.copyOf(array, array.length);
sortPart(workA, workB, 0, array.length);
return workB;
}
/**
* Sorts the specified section of the array.
*
* @param workA Should contain identical values as workB in the specified range.
* The final values in the specified range are destroyed (actually they are made
* of two adjacent sorted ranged).
* @param workB Should contain identical values as workA in the specified range.
* The final values in the specified range are the sorted values in that range.
*/
private static void sortPart(int[] workA, int[] workB, int start, int exclusiveEnd) {
if (start >= exclusiveEnd - 1) {
return;
} else {
int mid = (start + exclusiveEnd) / 2;
sortPart(workB, workA, start, mid);
sortPart(workB, workA, mid, exclusiveEnd);
mergeParts(workA, start, mid, exclusiveEnd, workB);
return;
}
}
/**
*
* @param input The array which is really made of sub-array1 from start to (mid - 1) and sub-array2
* from mid to (exclusiveEnd - 1), which will be merged. Those two sub-arrays should
* already be sorted. This array will not be modified.
* @param output Should have the same length as array. The section from start to (exclusiveEnd - 1)
* will contain the merged two sub-arrays in array. The values outside this range
* are left unchanged.
*/
private static void mergeParts(int[] input, int start, int mid, int exclusiveEnd, int[] output) {
int pointer1 = start;
int pointer2 = mid;
for (int kOut = start; kOut < exclusiveEnd; kOut++) {
if (pointer1 != mid && (pointer2 == exclusiveEnd || (input[pointer1] <= input[pointer2]))) {
output[kOut] = input[pointer1];
pointer1++;
} else {
output[kOut] = input[pointer2];
pointer2++;
}
}
}
}
This is my attempt at creating a common interface for IntArray
and Array<Int>
.
/**
* @param array The array to sort. A copy is made, so it's content is not modified.
*/
fun <T: Comparable<T>> mergeSortLL(array: ComparableListLike<T>) : ComparableListLike<T> {
val arrayCopy = array.copyOf()
val arrayCopy2 = array.copyOf()
sortSectionLL(arrayCopy, arrayCopy2, 0, arrayCopy.size)
return arrayCopy2
}
/**
* Sorts the specified section of the array.
*
* @param workA Should contain identical values as workB in the specified range.
* The final values in the specified range are destroyed (actually they are made
* of two adjacent sorted ranged).
* @param workB Should contain identical values as workA in the specified range.
* The final values in the specified range are the sorted values in that range.
*/
internal fun <T: Comparable<T>> sortSectionLL(workA: ComparableListLike<T>,
workB: ComparableListLike<T>,
start: Int,
exclusiveEnd: Int) : Unit {
if (exclusiveEnd - start <= 1)
return
val mid = (start + exclusiveEnd) / 2
sortSectionLL(workB, workA, start, mid)
sortSectionLL(workB, workA, mid, exclusiveEnd)
mergeHalvesLL(workA, workB, start, mid, exclusiveEnd)
}
internal inline fun <T: Comparable<T>> mergeHalvesLL(input: ComparableListLike<T>,
output: ComparableListLike<T>,
start: Int,
mid: Int,
exclusiveEnd: Int) {
var p1 = start
var p2 = mid
val end = exclusiveEnd
for (i in start until exclusiveEnd) {
if (p1 < mid && (p2 == end || input[p1] <= input[p2])) {
output[i] = input[p1]
p1++
} else {
output[i] = input[p2]
p2++
}
}
}
interface ComparableListLike<T: Comparable<T>> {
val size: Int
operator fun get(i: Int): T
operator fun set(i: Int, t: T): Unit
fun copyOf(): ComparableListLike<T>
fun toList(): List<T>
}
// Can't be part of the interface and be `inline`
inline fun <T: Comparable<T>> ComparableListLike<T>.swap(i: Int, j: Int): Unit {
val tmp = get(i)
set(i, get(j))
set(j, tmp)
return
}
class SortingArray<T: Comparable<T>>(val array: Array<T>): ComparableListLike<T> {
override val size: Int
inline get() = array.size
override inline operator fun get(i: Int) = array[i]
override inline operator fun set(i: Int, t: T): Unit {
array[i] = t
return
}
override inline fun copyOf() = SortingArray(array.copyOf())
override inline fun toList() = array.toList()
}
class SortingList<T: Comparable<T>>(val array: MutableList<T>): ComparableListLike<T> {
override val size: Int
inline get() = array.size
override inline operator fun get(i: Int) = array[i]
override inline operator fun set(i: Int, t: T): Unit {
array[i] = t
return
}
override inline fun copyOf() = SortingList(array.toMutableList()) // toMutableList() makes a copy
override inline fun toList() = array.toList()
}
class SortingFromFuns<T: Comparable<T>>(val sizee: () -> Int,
val gett: (i:Int) -> T,
val sett: (i: Int, t: T) -> Unit,
val copyOff: () -> ComparableListLike<T>,
val toListt: () -> List<T>): ComparableListLike<T> {
override val size: Int
inline get() = sizee()
override inline operator fun get(i: Int) = gett(i)
override inline operator fun set(i: Int, t: T) = sett(i, t)
override inline fun copyOf() = copyOff()
override inline fun toList() = toListt()
}
fun IntArray.asComparableList(): ComparableListLike<Int> {
return SortingFromFuns(
sizee = { this.size },
gett = this::get,
sett = this::set,
copyOff = this::asComparableList,
toListt = this::toList
)
}
fun DoubleArray.asComparableList(): ComparableListLike<Double> {
return SortingFromFuns(
sizee = { this.size },
gett = this::get,
sett = this::set,
copyOff = this::asComparableList,
toListt = this::toList
)
}
// TODO: ByteArray, ShortArray, etc.
Benchmarking code:
import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.newSingleThreadContext
import java.util.*
import kotlin.system.measureNanoTime
internal val random = Random()
fun createRandArray(n: Long): Array<Int> {
return createRandTArray(n).toTypedArray()
}
fun createRandTArray(n: Long): IntArray {
return random.ints(n, 0, n.toInt()).toArray()
}
var size: Long = 1_000_000
var nRepeats: Int = 11
var nDrops: Int = 1
fun <T> benchmark(name: String,
buildSortable: (Long) -> T,
sort: (T) -> Any) {
val arrays = List(nRepeats) { buildSortable(size) }
val timesMS = arrays.map { measureNanoTime { sort(it) } / 1_000_000 }.drop(nDrops) // for JVM optimization warmup
// println(timesMS)
println("[$name] Average sort time for array of size $size: ${timesMS.average() } ms")
}
fun main(args: Array<String>) {
size = 1_000_000
nRepeats = 6
benchmark("Array<Int>",
buildSortable = { size -> createRandTArray(size).toTypedArray() },
sort = ::mergeSort)
benchmark("ListLike Array<Int>",
buildSortable = { size -> SortingArray(createRandTArray(size).toTypedArray()) },
sort = { array -> mergeSortLL(array) }) // oddly ::mergeSortLL is refused
benchmark("ListLike IntArray",
buildSortable = { size -> createRandTArray(size).asComparableList() },
sort = { array -> mergeSortLL(array) }) // oddly ::mergeSortLL is refused
benchmark("IntArray",
buildSortable = { size -> createRandTArray(size) },
sort = ::mergeSortIA)
benchmark("IntArray multi-thread (CommonPool) with many array copies",
buildSortable = { size -> createRandTArray(size) },
sort = { mergeSortCorou(it) })
val origContext = corouContext
corouContext = newSingleThreadContext("single thread")
benchmark("IntArray multi-thread (one thread!) with many array copies",
buildSortable = { size -> createRandTArray(size) },
sort = { mergeSortCorou(it) })
corouContext = origContext
benchmark("Java int[]",
buildSortable = { size -> createRandTArray(size) },
sort = MergeSorter::sort)
benchmark("Arrays.sort",
buildSortable = { size -> createRandTArray(size) },
sort = Arrays::sort)
}
Tests with JUnit5:
import kotlinx.coroutines.experimental.CommonPool
import org.junit.jupiter.api.Assertions.*
import org.junit.jupiter.api.Test
import java.util.*
class MergeSortTest {
companion object {
val intarray: IntArray = Random().ints(1000, 0, 800).toArray()
val array: Array<Int> = intarray.toTypedArray()
val sortedArray = array.copyOf().apply { Arrays.sort(this) }
}
@Test
fun testMergeSortJava() {
assertArrayEquals(MergeSorter.sort(intarray), sortedArray.toIntArray())
}
@Test
fun testMergeSort() {
assertArrayEquals(mergeSort(array), sortedArray)
}
@Test
fun testMergeSortIA() {
assertArrayEquals(mergeSortIA(intarray).toTypedArray(), sortedArray)
}
@Test
fun testMergeSortCorou() {
assertArrayEquals(mergeSortCorou(intarray), sortedArray.toIntArray())
}
}
Thanks, Max, this is great research and just what I was looking for (performance of IntArray vs Array)