Skip to content

Instantly share code, notes, and snippets.

@yarko
Last active August 29, 2015 14:16
Show Gist options
  • Save yarko/14a23de842181f1f12ac to your computer and use it in GitHub Desktop.
Save yarko/14a23de842181f1f12ac to your computer and use it in GitHub Desktop.
Timsort Fix, partially clarified

Recently, I saw proponents of automated tools show a bug found in the timsort hybrid sorting algorithm.

It's been widely used, and ported, so this is with some pride that proponents of a particular approach gloat, with cause.

http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/#sec3

But several things stand out. The solution is what SEI would have named "low maturity" (not quite - the article did go through it elsewhere):

  • it copy / paste's what was and what is, leaving it to the reader to discover what changes were made;
  • it uses what I have always found an harder to read coding standard

I've reformatted the C sources (below) to suite my 35 year long preference, what helps my code reading easier, quicker. So that you understand what about this (my) style suites me, see how containing braces are at matching indents, easy to spot. Code belonging to a control block (e.g. conditional, or loop) appears on the same line as the opening brace, so that it is closest to what it is most intimate with, to which it naturally belongs (a visual affinity), with the closing brace creating space (separation) from the next phrase, so they are appropriately differentiated.

The simple diff of before vs. after is this:

<         if (n > 0 && p[n-1].len <= p[n].len + p[n+1].len)
---
>         if ( (n > 0   && p[n-1].len <= p[n].len + p[n+1].len)
>           || (n-1 > 0 && p[n-2].len <= p[n].len + p[n-1].len) )

I wished the article showed the small diff in its discussion.

For those who haven't memorized that "||" is of the lowest precedence here, you will be able to read this and not wonder if it's a problem.

It's simply easier to read with the additional paranthesis.

I leave it as an exercise to comment the code, to improve its literacy.

/* The maximum number of entries in a MergeState's
* * pending-runs stack.
* * This is enough to sort arrays of size up to about
* * 32 * phi ** MAX_MERGE_PENDING
* * where phi ~= 1.618. 85 is ridiculously large enough,
* * good for an array with 2**64 elements.
* */
#define MAX_MERGE_PENDING 85
merge_collapse(MergeState *ms)
{
struct s_slice *p = ms->pending;
assert(ms);
while (ms->n > 1)
{ Py_ssize_t n = ms->n - 2;
if ( (n > 0 && p[n-1].len <= p[n].len + p[n+1].len)
|| (n-1 > 0 && p[n-2].len <= p[n].len + p[n-1].len) )
{ if (p[n-1].len < p[n+1].len)
--n;
if (merge_at(ms, n) < 0)
return -1;
}
else if (p[n].len <= p[n+1].len)
{ if (merge_at(ms, n) < 0)
return -1;
}
else
break;
}
return 0;
}
/* The maximum number of entries in a MergeState's
* * pending-runs stack.
* * This is enough to sort arrays of size up to about
* * 32 * phi ** MAX_MERGE_PENDING
* * where phi ~= 1.618. 85 is ridiculously large enough,
* * good for an array with 2**64 elements.
* */
#define MAX_MERGE_PENDING 85
merge_collapse(MergeState *ms)
{
struct s_slice *p = ms->pending;
assert(ms);
while (ms->n > 1)
{ Py_ssize_t n = ms->n - 2;
if (n > 0 && p[n-1].len <= p[n].len + p[n+1].len)
{ if (p[n-1].len < p[n+1].len)
--n;
if (merge_at(ms, n) < 0)
return -1;
}
else if (p[n].len <= p[n+1].len)
{ if (merge_at(ms, n) < 0)
return -1;
}
else
break;
}
return 0;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment