Last active
August 29, 2015 14:24
-
-
Save keeganbrown/8dadc03ca2758ad27ad0 to your computer and use it in GitHub Desktop.
Shameless Rip from here: http://keithschwarz.com/interesting/code/?dir=factoradic-permutation, thought of in context as a solution for: http://www.codewars.com/kata/alphabetic-anagrams
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/****************************************************************************** | |
* File: FactoradicPermutation.hh | |
* Author: Keith Schwarz ([email protected]) | |
* | |
* An algorithm for manipulating and generating permutations in lexicographic | |
* order by using the factoradic number system. | |
* | |
* The factoradic number system (also called the factorial number system) is a | |
* way of writing out numbers in a base system that depends on factorials, | |
* rather than powers of numbers. In the factoradic system, the last digit is | |
* always 0 and is in base 0!. The digit before that is either 0 or 1 and is | |
* in base 1!. The digit before that is either 0, 1, or 2 and is in base 2!. | |
* More generally, the nth-to-last digit in always 0, 1, 2, ..., or n and is in | |
* base n!. When writing a number, we usually append a subscript ! to indicate | |
* that it's written in factoradic base. In this (plain-text) C++ source file, | |
* I'll denote this by suffixing the number with _!. | |
* | |
* As an example, the factoradic number 2110_! is equal to | |
* | |
* 2 x 3! + 1 x 2! + 1 x 1! + 0 x 0! | |
* = 2 x 6 + 1 x 2 + 1 x 1 + 0 x 1 | |
* = 12 + 2 + 1 + 0 | |
* = 15 | |
* | |
* Similarly, the first few factoradic numbers are | |
* | |
* 0_! 10_! 100_! 110_! 200_! 210_! 1000_! 1010_! 1100_! 1110_! | |
* | |
* One reason that the factoradic numbers are so useful is that there's a close | |
* connection between factoradic numbers and permutations of distinct values. | |
* To understand where this comes from, suppose that we're interested in | |
* generating all of the permutations of some collection in lexicographic | |
* order. One way to do this might be to think about generating the first such | |
* permutation, then the second, then the third, etc. In other words, we want | |
* to look for some way of mapping the integers onto permutations in a way that | |
* allows us to efficiently answer the question "what is the nth permutation of | |
* this set?" | |
* | |
* Let's consider permutations of four elements - a, b, c, and d - and the | |
* order in which they're generated. Here's a table of these permutations: | |
* | |
* 0 abcd 6 bacd 12 cabd 18 dabc | |
* 1 abdc 7 badc 13 cadb 19 dacb | |
* 2 acbd 8 bcad 14 cbad 20 dbac | |
* 3 acdb 9 bcda 15 cbda 21 dbca | |
* 4 adbc 10 bdac 16 cdab 22 dcab | |
* 5 adcb 11 bdca 17 cdba 23 dcba | |
* | |
* Initially, there might not seem to be any pattern relating the numbers to | |
* the permutations, but there are a few trends we can spot here. For example, | |
* every sixth permutation starts with a new letter, and 6 = 3!. The second | |
* letter of each permutation changes every two permutations, and 2 = 2!. | |
* | |
* If we rewrite the above table with the integers in factoradic base, then a | |
* more obvious trend starts to appear: | |
* | |
* 0000_! abcd 1000_! bacd 2000_! cabd 3000_! dabc | |
* 0010_! abdc 1010_! badc 2010_! cadb 3010_! dacb | |
* 0100_! acbd 1100_! bcad 2100_! cbad 3100_! dbac | |
* 0110_! acdb 1110_! bcda 2110_! cbda 3110_! dbca | |
* 0200_! adbc 1200_! bdac 2200_! cdab 3200_! dcab | |
* 0210_! adcb 1210_! bdca 2210_! cdba 3210_! dcba | |
* | |
* We noted above that the first letter changes every six permutations, and now | |
* we can see that this coincides with the times at which the first digit of | |
* the factoradic representation changes. Similarly, when we noted that the | |
* second letter of the permutation changed every two permutations, we can see | |
* this is occurring whenever the second digit of the factoradic representation | |
* of the index is changing. | |
* | |
* The connection here is due to a construction called "Lehmer codes," a way of | |
* labeling permutations in a way that also allows their construction. The | |
* idea behind Lehmer codes is as follows. Suppose that you are given a | |
* permutation of a totally-ordered set of elements, such as the permutation | |
* BEDAC of the letters A-E. We start off by looking at the first letter of | |
* this permutation - in this case, B - and seeing how many elements it's | |
* bigger than (in this case, 1). We then split this letter off of the | |
* permutation and record the fact that it was bigger than one element: | |
* | |
* B EDAC | |
* 1 | |
* | |
* Now, we repeat this process on EDAC. Here, E is larger than three of the | |
* elements of EDAC (everything except itself), so we write out a three to get | |
* | |
* BE DAC | |
* 13 | |
* | |
* Repeating this on D gives us | |
* | |
* BED AC | |
* 132 | |
* | |
* And on A gives us | |
* | |
* BEDA C | |
* 1320 0 | |
* | |
* And finally on the last C gives us | |
* | |
* BEDAC | |
* 13200 | |
* | |
* So the Lehmer code for this permutation is (1, 3, 2, 0, 0). Note that the | |
* length of a Lehmer code is equal to the number of elements in the | |
* permutation. | |
* | |
* Given a permutation's Lehmer code, we can generate the permutation it stands | |
* for by running this process backwards. Suppose, for example, that we're | |
* given the Lehmer code (3, 1, 0, 0) and want to use it to recover some given | |
* permutation of ABCD. To do this, we'd start off by taking the element of | |
* this collection of letters that's bigger than three other elements. This is | |
* equivalent to asking which element is at position 3 using zero-indexing, and | |
* gives us D. So now we have | |
* | |
* D ABC | |
* 3 | |
* | |
* and the remaining Lehmer code (1, 0, 0). Looking at the 1 in the first | |
* position of this Lehmer code, we should take the element out of the unused | |
* elements at position 1 (which, zero-indexed, is B), giving us | |
* | |
* DB AC | |
* 31 | |
* | |
* The rest of our code is (0, 0), so we take the zeroth-largest element of the | |
* unused letters (A) and put it next: | |
* | |
* DBA C | |
* 310 | |
* | |
* And finally, we use the last part of our code, (0), to select the last | |
* element of the permutation, which is C: | |
* | |
* DBAC | |
* 3100 | |
* | |
* The clincher in all this is that there's a simple, one-to-one mapping | |
* between the factoradic numbers and Lehmer codes. The idea is simple - given | |
* a Lehmer code (d0, d1, ..., dn), we just pick the number | |
* | |
* d0 d1 d2 ... dn_! | |
* | |
* For example, given Lehmer code (3, 1, 0, 0), we'd pick the factoradic number | |
* 3100_!. Similarly, given factoradic number 210_!, we'd pick the Lehmer code | |
* (2, 1, 0). | |
* | |
* In order to convince ourselves that this is even mathematically well-defined | |
* we need to see that each of the digits we assign are within the range | |
* specified for that digit. For example, if it were possible that we could | |
* have a Lehmer code (1, 4), then we couldn't convert it to the factoradic | |
* number 14_!, since the last digit of any factoradic number must be 0. This | |
* is not particularly difficult to see. Remember that the digits of the | |
* Lehmer code are zero-based indices into the list of elements we haven't yet | |
* used yet. Thus the last digit must be zero, since there's only one element | |
* remaining; the penultimate digit can be either one or zero since there's | |
* two possible choices; the antepenultimate digit can be zero, one or two; | |
* etc. | |
* | |
* Given the bijiective correspondence between Lehmer codes and the factoradic | |
* numbers, we have a way of taking a natural number N and mapping it into a | |
* permutation of some number of elements. However, as of now we have no | |
* reason to believe that this will list every permutation in lexicographic | |
* order. After all, given some random encoding system for permutations, why | |
* should we expect anything special out of our ordering? Fortunately, though | |
* we have the following lemma: | |
* | |
* Lemma: Given two permutations p_1 and p_2 with Lehmer codes L_1 and L_2, | |
* p_1 lexicographically precedes p_2 iff L_1 lexicographically precedes L_2. | |
* | |
* Corollary: Every permutation has a unique Lehmer code. | |
* | |
* Before we go into the proof, I should remark that this guarantees that the | |
* following algorithm will list all permutations of n elements in order: | |
* | |
* For i = 0 to n! - 1: | |
* Represent i in factoradic base. | |
* Convert this representation to a Lehmer code. | |
* Generate the permutation based on this Lehmer code. | |
* | |
* This algorithm works correctly because every permutation has a Lehmer code, | |
* and by iterating over all n! different codes we'll hit every permutation. | |
* Moreover, our lemma guarantees that no two permutations have the same | |
* Lehmer code, we know that we'll get every permutation in order and exactly | |
* once. | |
* | |
* Proof of lemma: We first prove that if p_1 < p_2, then L_1 < L_2. Given the | |
* two permutations, consider the first elements at which the permutations | |
* disagree, call this index i. Note that i can't be the last element of the | |
* permutations, because that would mean that the permutations agree in every | |
* position except the last, and since the last element doesn't match it means | |
* that the permutations aren't of the same elements. Since the permutations | |
* agree on the first i - 1 elements, the Lehmer codes for the first i - 1 | |
* elements must agree. This means that at the time the ith digit of the | |
* Lehmer code is being chosen, in both permutations the same set of leftover | |
* elements will remain. Since p_1 < p_2, the ith element of p_1 must be | |
* smaller than the ith element of p_2, and so the index of that element in | |
* what's left must be smaller than the index of the corresponding element from | |
* p_2 in what's left. Consequently, the digit chosen for L_1 will be less | |
* than the digit for L_2, and since the codes agree on their prefix before | |
* this we have that L_1 < L_2. | |
* | |
* To prove the opposite direction, suppose that L_1 < L_2 and find the first | |
* spot at which they disagree; say it's index i. i cannot be the last spot | |
* in the code, since the very last digit must be a 0, so it has to be some | |
* place before that. Since the prefixes of the codes match up to that point, | |
* at the spot where L_1 chooses a lower number than L_2, the partial | |
* permutations built up so far must match and the remaining elements must be | |
* the same. Consequently, when L_1 has a lower value than L_2, the resulting | |
* permutation will have a smaller value in p_1 than in p_2, and since the | |
* prefixes match the result will be that p_1 < p_2. QED. | |
* | |
* The functions exported by this file allow for the generation and | |
* manipulation of permutations using factoradic number representations. They | |
* run in quadratic time, though faster implementations are possible using more | |
* complex algorithms. | |
*/ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment