Last active
April 25, 2016 20:30
RFC: A more Perl6-esque "unpack"
================================

This is an idea for an "unpack" replacement. The basic reasoning behind it is
that number encodings and string encodings needn't be treated all that
differently. Instead of passing the name of a string encoding, you can pass
a native type object. When decoding things of determinable lengths, any number
of types can be given.

A variable length thing without a length indication can only be passed at the
end.

Decode according to a template:

    $blob.decode( [ ... ] )

Decode a string:

    my $s = $blob.decode("utf8")
    # actually short for: $blob.decode([ ::Inf => "utf8" ])

Decode a natively encoded numeric value:

    my $i = $blob.decode(uint16);

Decode a natively encoded numeric value, and a string:

    my ($n, $s) = $blob.decode([ num, "latin1" ]);

This doesn't work:

    my ($s, $i) = $blob.decode([ "latin1", uint16 ]);  # FAILS
    # Can't determine string length!
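
To make the ordering rule concrete, here is a rough Python sketch of the same idea using the standard `struct` module. It is only an illustration of the semantics, not the proposed Perl 6 API; the function name is made up, and it assumes `num` means a 64-bit float in native byte order. The fixed-size value is read first, and whatever is left over is the string. Put the string first and there is nothing to say where it ends.

```python
import struct

def decode_num_then_latin1(blob: bytes):
    """Read one native-endian 64-bit float, then treat the rest as Latin-1 text.

    Hypothetical helper; assumes `num` is a 64-bit float.
    """
    (n,) = struct.unpack_from("=d", blob, 0)
    # The string has no length prefix, so it can only be "everything that remains".
    s = blob[struct.calcsize("=d"):].decode("latin-1")
    return n, s

blob = struct.pack("=d", 1.5) + "café".encode("latin-1")
print(decode_num_then_latin1(blob))
```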

Force endianness for a single value:

    my $i = $blob.decode([ :big(uint32) ]);

Set default endianness for the rest of the template:

    my @i = $blob.decode([ :big, uint32, uint16, uint8 ]);
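
For comparison, Python's `struct` has a similar "default for the rest of the template" notion: a leading `>` or `<` applies to every field that follows. It has no per-value override inside one format string, so the single-value case is sketched here with `int.from_bytes`. The sample bytes are invented for the illustration.

```python
import struct

blob = bytes([0x00, 0x00, 0x01, 0x00, 0x00, 0x02, 0x03])

# Leading ">" plays the role of :big for uint32, uint16, uint8.
big = struct.unpack(">IHB", blob)

# The same bytes read as little-endian give different numbers.
little = struct.unpack("<IHB", blob)

# Endianness forced for one value only, roughly like :big(uint32).
first = int.from_bytes(blob[0:4], "big")

print(big, little, first)
```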

Decode two byte-length-prefixed blobs:

    my ($blob1, $blob2) = $blob.decode([ ::uint32 => Blob, ::uint32 => Blob ]);

or:

    my ($blob1, $blob2) = $blob.decode([ (::uint32 => Blob) xx 2 ]);

Decode any number of byte-length-prefixed blobs:

    my @blobs = $blob.decode([ ::Inf => [ ::uint32 => Blob ] ]);

Decode any number of byte-length-prefixed strings:

    my @strings = $blob.decode([ ::Inf => [ ::uint32 => "Windows-1252" ] ]);
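
What the `::Inf => [ ::uint32 => Blob ]` template would have to do can be sketched as a plain loop: read a length, take that many bytes, repeat until the input runs out. The Python below is one possible reading, not the proposed implementation. The helper names are hypothetical, and big-endian prefixes are an assumption, since the RFC does not say which endianness applies by default.

```python
import struct

def decode_prefixed(blob: bytes, prefix: str = ">I"):
    """Split blob into chunks, each preceded by its byte length (hypothetical helper)."""
    size = struct.calcsize(prefix)
    chunks, pos = [], 0
    while pos < len(blob):
        if pos + size > len(blob):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(prefix, blob, pos)
        pos += size
        if pos + length > len(blob):
            raise ValueError("truncated payload")
        chunks.append(blob[pos:pos + length])
        pos += length
    return chunks

def encode_prefixed(chunks, prefix: str = ">I"):
    """Inverse of decode_prefixed, used here only to build test data."""
    return b"".join(struct.pack(prefix, len(c)) + c for c in chunks)

data = encode_prefixed([b"abc", b"", b"hello"])
print(decode_prefixed(data))

# Length-prefixed strings are the same loop plus a string decoding step.
text_data = encode_prefixed(["€uro".encode("cp1252"), b"ok"])
strings = [c.decode("cp1252") for c in decode_prefixed(text_data)]
print(strings)
```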

A list of identically typed things, with a counter prefix (as opposed to byte length):

    my @i = $blob.decode([ :elems(uint8) => uint32 ]);

A sub-template with a typed byte length prefix:

    [ ::uint32 => [ int32, uint16, "latin1" ] ]

A list of identically typed things, with a BYTE length prefix:

    [ ::uint32 => uint32 ]
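
The difference between an element count and a byte length is easy to miss, so here is a small Python sketch of both readings. The function names are hypothetical and big-endian is assumed throughout. With a count prefix the number is how many uint32 values follow; with a byte length prefix the number has to be divided by the element size.

```python
import struct

def decode_counted(blob: bytes):
    """uint8 element count, then that many big-endian uint32 values."""
    count = blob[0]
    return list(struct.unpack_from(f">{count}I", blob, 1))

def decode_byte_length(blob: bytes):
    """uint32 byte length, then as many big-endian uint32 values as fit in it."""
    (nbytes,) = struct.unpack_from(">I", blob, 0)
    if nbytes % 4:
        raise ValueError("byte length is not a multiple of the element size")
    return list(struct.unpack_from(f">{nbytes // 4}I", blob, 4))

values = struct.pack(">II", 7, 9)
counted = bytes([2]) + values              # prefix says "2 elements"
by_length = struct.pack(">I", 8) + values  # prefix says "8 bytes"

print(decode_counted(counted), decode_byte_length(by_length))
```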

Skipping a byte with Nil (when packing (encoding), Nil becomes \0):

    [ int, int, int, Nil, int, int ]
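
Python's `struct` has a close analogue in the `x` pad byte, which is written as a zero byte when packing and skipped when unpacking. The sketch below assumes 16-bit little-endian integers purely to keep the example small; the RFC's `int` is a native integer whose size is not specified here.

```python
import struct

# Three int16 values, one skipped byte, two more int16 values.
fmt = "<hhhxhh"

packed = struct.pack(fmt, 1, 2, 3, 4, 5)
unpacked = struct.unpack(fmt, packed)

print(len(packed), packed[6], unpacked)
```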

User-defined number encoding in the mix:

    my ($command, $param) = $blob.decode([ :big, uint8, MQTT::Length => Blob ]);
    if $command == 0x30 {
        my ($topic, $message) = $param.decode([:big,
            ::uint16 => "utf8",
            Blob
        ]);
    }
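
For readers who don't know MQTT: its "remaining length" field is a variable-length integer, 7 bits per byte with the least significant group first and the high bit meaning "more bytes follow", at most 4 bytes. That is the kind of thing a user-defined `MQTT::Length` type would have to decode. The Python below is a sketch of the whole example under that reading. The function names are hypothetical, and it only handles a PUBLISH packet with first byte 0x30, where no packet identifier sits between topic and message.

```python
import struct

def decode_mqtt_length(buf: bytes, pos: int):
    """Decode MQTT's variable-length integer; return (value, next position)."""
    value, shift = 0, 0
    for _ in range(4):  # the encoding allows at most 4 bytes
        if pos >= len(buf):
            raise ValueError("truncated length")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7
    raise ValueError("length field longer than 4 bytes")

def decode_publish(packet: bytes):
    """Return (command, topic, message); topic and message are None for other commands."""
    command = packet[0]
    length, pos = decode_mqtt_length(packet, 1)
    param = packet[pos:pos + length]
    if len(param) != length:
        raise ValueError("truncated packet")
    if command != 0x30:
        return command, None, None
    (topic_len,) = struct.unpack_from(">H", param, 0)  # ::uint16 => "utf8"
    topic = param[2:2 + topic_len].decode("utf-8")
    message = param[2 + topic_len:]                     # trailing Blob
    return command, topic, message

packet = bytes([0x30, 7, 0, 3]) + b"a/b" + b"hi"
print(decode_publish(packet))
```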

Note that:

* The KEY of a pair is part of the template, but NOT of the actual data returned
  by decode. This holds true for length prefixes (key is a type object) and for
  hints like :big and :little (key is a string).
* Pairs can nest like this:

      :big(uint16) => Blob
      :elems(:big(uint16)) => uint64

* The compiler will eat pairs, thinking they're named arguments. This is why
  templates are arrays.

Things that P5's unpack does that this proposal does not cover:

* Hexadecimal, binary, or uuencoded strings. These are actually string
  encodings, and should be implemented as such. (p5 <b B h H u U>)
* Absolute position based extraction ('@' and '.' in p5's pack). Don't know if
  this is actually ever used, or how it even works.
* Pointers to strings.
* Null-terminated strings. Just have a Nil in there.

Juerd
It took me a while, but I understand this and it makes some degree of sense. Especially since, as designed, it will fit into the already existing implementation.
Here's what took me some time to understand. Using the above example:
How do you suggest handling things like headers, though? In situations where the string length is known, it seems remiss to not include them in this design. Here's a suggestion following what you have already thought up. We just extend the byte-prefix notation to include a static length:
Is there a particular reason why you think something like this is unnecessary?