@catid
Created July 20, 2018 00:22
Spinal Codes (totally metal name)
Paper: http://nms.csail.mit.edu/papers/fp049-perry.pdf
This is a bitwise error correction code that can correct bit errors anywhere in the message.
It is not MDS, and the code's strength depends in part on the processing time allowed for decoding.
More discussion at the end.
Simple version of the algorithm:
Divide a message of n bits into chunks of k bits:
n=32
k=4
Hash maps k=4 bits to c=7 bits (for example)
Try all 2^k options for inputs (16 trials) and check which ones are
closest to the received 7 bits.
Remember the best few options; call that count B.
Note that the hash mixes in bits from the prior chunks, so a failure
in decoding corrupts all remaining bits, but it also lets later words
provide information about earlier words.
Perhaps the final word has a larger c to reduce overall error rate? Anyway:
The path with the smallest error sum across several consecutive steps (D) in a row indicates success.
Trade-off between processing and error decoding ability:
We can increase B to increase the fan-out of the graph.
We can increase D to increase the depth of the graph.
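The chunked encode and beam-search decode above can be sketched in Python. This is a toy stand-in, not the paper's construction: using SHA-256 as the chain hash and these particular parameter values are illustrative assumptions.

```python
import hashlib

K = 4  # message bits per chunk
C = 7  # coded bits emitted per chunk
B = 4  # beam width: number of best candidate paths kept per step

def h(state: int, chunk: int) -> int:
    # Chain hash (assumption: SHA-256): mixes prior state with the next k-bit chunk.
    d = hashlib.sha256(f"{state}:{chunk}".encode()).digest()
    return int.from_bytes(d[:8], "big")

def encode(chunks):
    state, out = 0, []
    for ch in chunks:
        state = h(state, ch)
        out.append(state & ((1 << C) - 1))  # emit the low c bits of the state
    return out

def decode(coded):
    # Beam search: each candidate path is (score, decoded chunks, hash state).
    beam = [(0, [], 0)]
    for word in coded:
        candidates = []
        for score, msg, state in beam:
            for guess in range(1 << K):  # brute-force all 2^k = 16 options
                s2 = h(state, guess)
                dist = bin((s2 & ((1 << C) - 1)) ^ word).count("1")
                candidates.append((score + dist, msg + [guess], s2))
        candidates.sort(key=lambda t: t[0])
        beam = candidates[:B]  # keep only the B best paths
    return beam[0][1]

msg = [3, 14, 1, 5, 9, 2, 6, 0]  # n=32 bits as eight k=4-bit chunks
coded = encode(msg)
coded[2] ^= 0b0100  # flip one coded bit in transit
assert decode(coded) == msg
```

Increasing B widens the candidate set kept at each step (fan-out), and deferring the final choice by more steps plays the role of D.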
Potential software version of the algorithm:
SIMD register is 128 bits wide (16 bytes).
Initialize the register to 0, 1, 2, ..., 14, 15.
We could use SIMD operations to parallelize the k=4 brute force by
generating all 16 candidate c=8-bit values for comparison and
truncating c down to the number of bits actually sent.
The hash function implemented in this way needs to mix two things: (1)
A large state buffer, maybe another 128-bit register. (2) Each of the
16 possible values. The larger state buffer mixes in the prior values
in the data. For example, it could be a 128-bit hash of the message so
far.
The result of the hash can be masked down to the number of bits
desired for comparison.
Broadcast the actual received value byte A into a comparison register.
For example, if we received the 7-bit value 0x1f, then the comparison
register gets 1f, 1f, ..., 1f, 1f.
Then we would need to do Hamming distance comparison for each byte in parallel.
PSHUFB can do this in a few operations as in:
https://github.com/CMU-SAFARI/Shifted-Hamming-Distance/blob/master/ssse3_popcount.c
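The per-byte compare can be illustrated with the nibble-lookup popcount trick that PSHUFB-based implementations use: a 16-entry table gives the popcount of each 4-bit nibble, so one byte's Hamming weight is table[lo] + table[hi]. This is a pure-Python stand-in for the SIMD idea, not actual SIMD code.

```python
# Popcount of each possible 4-bit nibble -- the lookup table PSHUFB indexes.
POPCNT4 = [bin(i).count("1") for i in range(16)]

def hamming_bytes(candidates, received):
    """Per-byte Hamming distance of each candidate byte against the
    broadcast received byte (mirrors the broadcast-and-compare step)."""
    out = []
    for x in candidates:
        v = x ^ received
        out.append(POPCNT4[v & 0xF] + POPCNT4[v >> 4])
    return out

assert hamming_bytes([0x1F, 0x1E, 0x00], 0x1F) == [0, 1, 5]
```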
We would want to repeat this operation for B of the best last options
for the prior 4 bits.
SIMD sorting network:
https://webdocs.cs.ualberta.ca/~amaral/papers/furtak-spaa07.pdf
This sorting network appears to be the largest cost of guess-and-check.
We take these best say B=4 values and run the above procedure for each
of them on the next 4 bits.
The sum of the Hamming distance of prior bits and next bits together
is the overall score.
The smallest overall score path is selected as the one true solution
for the step D behind the current step (it "crystallizes").
Discussion:
This algorithm runs in linear time, so it is asymptotically faster
than Reed-Solomon forward error correction codes, but in trade it is
not MDS. For packet-sized data it may actually be speed-competitive
for this reason.
The most exciting thing about this is that it might enable fast
software FEC codes.
It requires some additional trailing data encoded at a higher rate to
"cap off" the solution.
It would make sense to use e.g. a 16-bit code for the last 4 bits.
Within the encoded data it would make sense to have a CRC or hash to
check the validity of the decoded data.
If the algorithm fails, it tends to avalanche into a large number of failures,
so a slower CRC has no advantage over a faster hash.
This is not a systematic code, meaning the original data is not visible
in the output.
It could serve dual purposes to be a form of simple encryption, by
setting the initial hash state to a secret.
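As a toy illustration, seeding the chain hash state with a secret changes the entire coded output. This assumes the SHA-256 chain hash sketched earlier, and is simple obfuscation, not a vetted cipher.

```python
import hashlib

def h(state, chunk):
    # Chain hash (assumption: SHA-256, as in the earlier sketch).
    d = hashlib.sha256(f"{state}:{chunk}".encode()).digest()
    return int.from_bytes(d[:8], "big")

def encode(chunks, secret=0):
    state, out = secret, []  # secret seeds the initial hash state
    for ch in chunks:
        state = h(state, ch)
        out.append(state & 0x7F)  # emit low c=7 bits
    return out

msg = [3, 14, 1, 5]
# Different secrets give unrelated codewords for the same message.
assert encode(msg, secret=1) != encode(msg, secret=2)
```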
Because the algorithm brute forces a small number of bits for each
step but leverages a hash spanning multiple steps,
the algorithm is much stronger than expected and can compete with more
complex software for recovery performance.
This trick reminds me of LDPC-Staircase, which has a simple code that
builds in strength as rows increase.
Because it uses hashes and randomizes its output, the code can be
adapted to different rates easily.
An advantage is that the code rate can be selected to be variable
between 5/4, 6/4, 7/4, 8/4 based on the channel.
Fractional rates between 4/4 and 5/4 can be achieved by puncturing.
Some puncturing can be used to reduce the rate below 25% overhead for
the 5/4 case: for example, every other word could be 4 bits instead of
5. All it takes is changing which mask is used.
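A minimal sketch of that mask-based rate selection, assuming per-chunk hash words as in the earlier sketch; the `mask` tuple of bit widths is an illustrative assumption.

```python
def emit_bits(words, mask=(5, 4)):
    """words: per-chunk hash outputs; mask: coded-bit widths, cycled.
    Alternating 5- and 4-bit words gives a rate between 4/4 and 5/4."""
    out = []
    for i, w in enumerate(words):
        c = mask[i % len(mask)]          # width for this position
        out.append(w & ((1 << c) - 1))   # keep only the low c bits
    return out

# All-ones hash words, punctured to alternating 5- and 4-bit values.
assert emit_bits([0xFF, 0xFF, 0xFF, 0xFF]) == [0x1F, 0x0F, 0x1F, 0x0F]
```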
@catid commented Jul 20, 2018
It can be turned into a systematic erasure code by mapping the low k bits of each input nibble to itself, while mixing the hash into the extra bits that are added.

For example the original data might be passed straight through, and then FEC bits can be appended to the end of the message. This would be the preferred way to do it so decoding can be avoided if no errors are in the message.
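A hedged sketch of the appended-FEC variant described here: the original chunks pass through unchanged, and hash-derived words are appended at the end. The chain hash (SHA-256) and word widths are illustrative assumptions.

```python
import hashlib

def h(state, chunk):
    # Chain hash (assumption: SHA-256, as in the earlier sketches).
    d = hashlib.sha256(f"{state}:{chunk}".encode()).digest()
    return int.from_bytes(d[:8], "big")

def systematic_encode(chunks, n_fec=4, c=7):
    state = 0
    for ch in chunks:
        state = h(state, ch)        # chain over the systematic part
    fec = []
    for i in range(n_fec):
        state = h(state, i)         # derive extra coded words
        fec.append(state & ((1 << c) - 1))
    return list(chunks) + fec       # original data stays visible up front

msg = [3, 14, 1, 5]
out = systematic_encode(msg)
assert out[:4] == msg and len(out) == 8
```

Since the message occupies the first words verbatim, a receiver with an intact message (e.g. verified by the embedded CRC or hash) can skip decoding entirely.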
