Spinal Codes (totally metal name)

Paper: http://nms.csail.mit.edu/papers/fp049-perry.pdf

This is a bitwise error correction code that can correct bit errors anywhere in the message.
It is not MDS, and the strength of the code depends on the processing time allowed for decoding.
More discussion at the end.
Simple version of the algorithm:

Divide a message of n bits into chunks of k bits:
n = 32
k = 4

A hash maps each k=4 bits to c=7 bits (for example).

Try all 2^k options for the input (16 trials) and check which ones are closest to the received 7 bits.
Remember the best few options, B.

Note that the hash incorporates the prior bits, so if a failure in decoding occurs it corrupts all remaining bits, but it also allows later words to provide information about earlier words.

Perhaps the final word has a larger c to reduce the overall error rate? Anyway:

The smallest error sum across several iterations in a row (D) indicates success.
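
As a concrete illustration, here is a minimal C++ sketch of the encoding side, assuming a toy 64-bit mixing function (mix_state below is a placeholder, not the hash from the paper): each k=4-bit chunk is absorbed into a running hash state, and the low c=7 bits of the state become the coded symbol for that step.

    #include <cstdint>
    #include <vector>

    // Placeholder mixing hash (not the hash from the paper): absorbs one
    // k-bit chunk into a 64-bit running state.
    static uint64_t mix_state(uint64_t state, uint8_t chunk)
    {
        state ^= chunk;
        state *= 0x9E3779B97F4A7C15ULL; // 64-bit golden-ratio constant
        state ^= state >> 29;
        return state;
    }

    // Encode a message given as k=4-bit chunks (one nibble per byte) into
    // c=7-bit symbols, one symbol per chunk.
    std::vector<uint8_t> encode(const std::vector<uint8_t>& nibbles)
    {
        const unsigned c = 7;
        uint64_t state = 0; // could be seeded with a secret (see Discussion)
        std::vector<uint8_t> symbols;
        for (uint8_t chunk : nibbles) {
            state = mix_state(state, chunk & 0x0F);
            symbols.push_back(static_cast<uint8_t>(state & ((1u << c) - 1)));
        }
        return symbols;
    }

Decoding then amounts to guessing each chunk and checking which guesses reproduce symbols close to what was actually received, as described above.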
Trade-off between processing and error decoding ability:
We can increase B to increase the fan-out of the graph.
We can increase D to increase the depth of the graph.
Potential software version of the algorithm:

A SIMD register is 128 bits wide (16 bytes).
Initialize the register to 0, 1, 2, ..., 14, 15.
We could use SIMD operations to parallelize k=4 by generating c=8 values for comparison and truncating c down to the number of bits actually sent.

The hash function implemented in this way needs to mix two things: (1) a large state buffer, maybe another 128-bit register, and (2) each of the 16 possible values. The large state buffer mixes in the prior values in the data. For example it could be a 128-bit hash of the message so far.

The result of the hash can be masked down to the number of bits desired for comparison.
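
A sketch of that candidate-generation step, assuming SSSE3 and a toy byte-wise mix (the sbox constants and the lane-wise XOR are arbitrary placeholders, not the paper's hash): all 16 candidate nibbles live in one 128-bit register, get mixed with the state, and are masked down to the c transmitted bits.

    #include <immintrin.h>
    #include <cstdint>

    // Produce candidate hash bytes for all 16 possible nibbles at once,
    // then mask each lane down to the c bits actually transmitted.
    // state_bytes stands in for a 128-bit digest of the message so far.
    static __m128i hash_candidates(__m128i state_bytes, unsigned c)
    {
        // One candidate nibble per byte lane: 0, 1, 2, ..., 14, 15.
        const __m128i candidates = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                                 8, 9, 10, 11, 12, 13, 14, 15);

        // Toy per-lane mix (arbitrary constants, not the paper's hash):
        // XOR each candidate with the state, then stir through a small
        // substitution table so every lane depends on its candidate.
        __m128i v = _mm_xor_si128(candidates, state_bytes);
        const __m128i sbox = _mm_setr_epi8(0x63, 0x7C, 0x77, 0x7B, 0x72, 0x6B, 0x6F, 0x45,
                                           0x30, 0x01, 0x67, 0x2B, 0x7E, 0x57, 0x2D, 0x76);
        v = _mm_xor_si128(v, _mm_shuffle_epi8(sbox, _mm_and_si128(v, _mm_set1_epi8(0x0F))));

        // Mask down to the number of bits desired for comparison.
        const __m128i mask = _mm_set1_epi8(static_cast<char>((1u << c) - 1));
        return _mm_and_si128(v, mask);
    }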
Broadcast the actual received value byte A to a comparison register.
For example if we received a 7-bit value 0x1f, then the comparison register gets 0x1f, 0x1f, ..., 0x1f, 0x1f.
Then we would need to do a Hamming distance comparison for each byte in parallel.
PSHUFB can do this in a few operations, as in:
https://github.com/CMU-SAFARI/Shifted-Hamming-Distance/blob/master/ssse3_popcount.c
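
A sketch of that comparison step, assuming SSSE3: broadcast the received symbol to all 16 byte lanes, XOR it against the candidate hashes, and count set bits per byte with the usual PSHUFB nibble-popcount table (the same trick as the ssse3_popcount.c code linked above).

    #include <immintrin.h>
    #include <cstdint>

    // Per-byte Hamming distance between the 16 candidate hash bytes and the
    // received c-bit symbol, all lanes in parallel (SSSE3).
    static __m128i hamming_per_byte(__m128i candidates, uint8_t received)
    {
        // Broadcast the received value (e.g. 0x1f) to every byte lane.
        const __m128i ref  = _mm_set1_epi8(static_cast<char>(received));
        const __m128i diff = _mm_xor_si128(candidates, ref);

        // PSHUFB nibble popcount: table[i] = number of set bits in nibble i.
        const __m128i table = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3,
                                            1, 2, 2, 3, 2, 3, 3, 4);
        const __m128i low_mask = _mm_set1_epi8(0x0F);
        const __m128i lo = _mm_shuffle_epi8(table, _mm_and_si128(diff, low_mask));
        const __m128i hi = _mm_shuffle_epi8(table,
            _mm_and_si128(_mm_srli_epi16(diff, 4), low_mask));

        // Each byte now holds the Hamming distance (0..8) for one candidate.
        return _mm_add_epi8(lo, hi);
    }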
We would want to repeat this operation for each of the best B options for the prior 4 bits.
SIMD sorting network:
https://webdocs.cs.ualberta.ca/~amaral/papers/furtak-spaa07.pdf
This sorting network appears to be the largest cost of guess-and-check.
We take the best, say B=4, values and run the above procedure for each of them on the next 4 bits.
The sum of the Hamming distance of the prior bits and the next bits together is the overall score.
The smallest overall score path is selected as the one true solution D steps behind the current step (it crystallizes).
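
Putting the pieces together, here is a plain (non-SIMD) C++ sketch of the guess-and-check decoder, reusing the same toy mix_state placeholder as the encoder sketch near the top: every surviving path is extended by all 2^k candidate nibbles, scored by accumulated Hamming distance, and only the best B paths are kept per step. A real decoder would also delay the final decision by D steps before crystallizing a prefix; this sketch just returns the best survivor at the end.

    #include <algorithm>
    #include <bit>      // std::popcount (C++20)
    #include <cstdint>
    #include <vector>

    // Same toy placeholder hash as in the encoder sketch above.
    static uint64_t mix_state(uint64_t s, uint8_t chunk)
    {
        s ^= chunk;
        s *= 0x9E3779B97F4A7C15ULL;
        return s ^ (s >> 29);
    }

    struct Path {
        std::vector<uint8_t> nibbles; // decoded chunks so far
        uint64_t state = 0;           // running hash state
        unsigned score = 0;           // accumulated Hamming distance
    };

    // Beam-search decode: B is the fan-out (paths kept per step).
    std::vector<uint8_t> decode(const std::vector<uint8_t>& symbols,
                                unsigned c = 7, size_t B = 4)
    {
        std::vector<Path> beam(1); // start from the empty path, state = 0
        for (uint8_t received : symbols) {
            std::vector<Path> next;
            for (const Path& p : beam) {
                for (uint8_t guess = 0; guess < 16; ++guess) {
                    Path q = p;
                    q.state = mix_state(q.state, guess);
                    const uint8_t predicted =
                        static_cast<uint8_t>(q.state & ((1u << c) - 1));
                    q.score += std::popcount(static_cast<unsigned>(predicted ^ received));
                    q.nibbles.push_back(guess);
                    next.push_back(std::move(q));
                }
            }
            // Keep only the B lowest-scoring paths (the sorting-network step).
            const size_t keep = std::min(B, next.size());
            std::partial_sort(next.begin(), next.begin() + keep, next.end(),
                              [](const Path& a, const Path& b) { return a.score < b.score; });
            next.resize(keep);
            beam = std::move(next);
        }
        return beam.front().nibbles; // lowest accumulated error survives
    }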
Discussion:

This algorithm runs in linear time, so it is asymptotically faster than Reed-Solomon forward error correction codes, but in trade it is not MDS. It's possible that for packet-sized data it is actually competitive in speed for this reason.
The most exciting thing about this is that it might enable fast software FEC codes.

It requires some additional trailing data encoded at a higher rate to "cap off" the solution. It would make sense to use e.g. a 16-bit code for the last 4 bits.
Within the encoded data it would make sense to have a CRC or hash to check the validity of the decoded data. If the algorithm fails it will tend to avalanche into a large number of failures, so a slower CRC has no advantage over a faster hash.
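
For instance, a sketch of that check, assuming a simple FNV-1a hash (any fast non-cryptographic hash works, per the point above that a slow CRC buys nothing): the sender appends the hash of the payload before encoding, and the decoder recomputes it to accept or reject a candidate decode.

    #include <cstdint>
    #include <vector>

    // FNV-1a over the payload bytes; used only as an integrity check.
    static uint32_t fnv1a(const std::vector<uint8_t>& data)
    {
        uint32_t h = 0x811C9DC5u;
        for (uint8_t b : data) {
            h ^= b;
            h *= 0x01000193u;
        }
        return h;
    }

    // Sender: append the 4-byte hash so the decoder can verify success.
    std::vector<uint8_t> add_check(std::vector<uint8_t> payload)
    {
        const uint32_t h = fnv1a(payload);
        for (int i = 0; i < 4; ++i)
            payload.push_back(static_cast<uint8_t>(h >> (8 * i)));
        return payload;
    }

    // Receiver: true if the trailing hash matches the decoded payload.
    bool check_ok(const std::vector<uint8_t>& decoded)
    {
        if (decoded.size() < 4) return false;
        const std::vector<uint8_t> payload(decoded.begin(), decoded.end() - 4);
        const uint32_t h = fnv1a(payload);
        for (int i = 0; i < 4; ++i)
            if (decoded[payload.size() + i] != static_cast<uint8_t>(h >> (8 * i)))
                return false;
        return true;
    }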
This is not a systematic code, meaning the original data is not visible in the output. It could serve a dual purpose as a form of simple encryption by setting the initial hash state to a secret.
Because the algorithm brute-forces a small number of bits for each step but leverages a hash spanning multiple steps, it is much stronger than expected and can compete with more complex software for recovery performance. This trick reminds me of LDPC-Staircase, which has a simple code that builds in strength as rows increase.
Because it uses hashes and randomizes its output, the code can be adapted to different rates easily. An advantage is that the code rate can be selected to be variable between 5/4, 6/4, 7/4, 8/4 based on the channel. Fractional rates between 4/4 and 5/4 can be achieved by puncturing. Some puncturing can be used to reduce the rate below 25% overhead for the 5/4 case, so for example every other byte could be 4 bits instead of 5. All it takes is changing which mask is used.
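
A sketch of that masking idea, with a hypothetical schedule that alternates 5-bit and 4-bit symbols (4.5 coded bits per 4 data bits on average, i.e. a 9/8 rate in the notation above); the encoder and decoder just need to agree on the per-step width.

    #include <cstddef>
    #include <cstdint>

    // Hypothetical per-step symbol widths: alternating 5 and 4 bits gives
    // 4.5 coded bits per 4 data bits on average (a 9/8 rate).
    static unsigned symbol_bits(size_t step)
    {
        return (step % 2 == 0) ? 5 : 4;
    }

    // The encoder emits only symbol_bits(step) bits of the hash each step;
    // the decoder must apply the same mask schedule to its candidates.
    static uint8_t mask_symbol(uint64_t hash_state, size_t step)
    {
        const unsigned c = symbol_bits(step);
        return static_cast<uint8_t>(hash_state & ((1u << c) - 1));
    }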
It can be turned into a systematic erasure code by mapping the low k bits of each input nibble to itself and only mixing for the extra bits that are added.
For example, the original data might be passed straight through, and then FEC bits can be appended to the end of the message. This would be the preferred way to do it, since decoding can be skipped if there are no errors in the message.
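
A sketch of that systematic framing, using hypothetical helper names (fec_symbols would come from an encoder like the sketch near the top): the original data goes out unchanged and the coded symbols are appended, so an error-free receiver can skip decoding entirely.

    #include <cstdint>
    #include <vector>

    // Hypothetical systematic framing: original data first, FEC symbols after.
    // A receiver whose data passes the appended hash check can use the data
    // directly and skip decoding entirely.
    std::vector<uint8_t> build_systematic_frame(const std::vector<uint8_t>& data,
                                                const std::vector<uint8_t>& fec_symbols)
    {
        std::vector<uint8_t> frame = data;             // pass-through copy
        frame.insert(frame.end(), fec_symbols.begin(), // appended FEC
                     fec_symbols.end());
        return frame;
    }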