Skip to content

Instantly share code, notes, and snippets.

@x-yuri
Last active December 2, 2024 04:47
Show Gist options
  • Save x-yuri/50c4a9ce309ac41fe3e81798b0076745 to your computer and use it in GitHub Desktop.
Save x-yuri/50c4a9ce309ac41fe3e81798b0076745 to your computer and use it in GitHub Desktop.
perl: pack()

perl: pack()

I'm going to ignore the j, J, f, F, d, D, p, P, u, w formats.

My machine has the x86_64 architecture (little-endian) and I'm running perl-5.38-2.

The way it's described is not necessarily matches the way it works internally. This is merely an explanation that seems to work.

$ perl -V:shortsize -V:intsize -V:longsize -V:longlongsize
shortsize='2';
intsize='4';
longsize='8';
longlongsize='8';

And a couple of words on storing strings in Perl. Basically there are 2 types of strings in Perl: binary strings (the UTF8 flag is unset) and UTF-8 strings (the UTF8 flag is set). Let me introduce a notation to describe strings here: u if the UTF8 flag is set, the character length in square brackets, optionally followed by a colon, and the bytes separated by dots. E.g. [1]:ff is a binary string, one character long, which is \xff. u[1]:c2.80 is a UTF-8 string, one character long, with bytes 0xc2 and 0x80 (U+0080). The following function converts a string to such notation:

sub {
    my $v = shift;
    my $u = utf8::is_utf8($v) ? 'u' : '';
    my $l = length $v;
    use bytes;
    sprintf("%s[%u]", $u, $l)
    . (length $v ? sprintf ":%v02x", $v : '');
}

Binary strings are sequences of bytes. And although they can contain a UTF-8 encoded representation of a string ("\xc2\x80", which is [2]:c2.80, which bytes represent U+0080), that is not necessarily so ("\xff", which is [1]:ff, which bytes are invalid UTF-8), each character is < 0x100 and corresponds to one byte.

UTF-8 strings are sequences of Unicode code points (or characters that correspond to Unicode code points). And although they can contain arbitrary data (use Encode qw(_utf8_on); my $s = "\x80"; _utf8_on($s), which is u[1]:80, which bytes are invalid UTF-8), that is not necessarily so ("\x{100}", which is u[1]:c4.80, which bytes represent U+0100), characters can be > 0xff and correspond to more than 1 byte.

Functions encode() and decode() transform UTF-8 strings into binary ones (encode 'UTF-8', "\x{100}", which is [2]:c4.80, which is "\xc4\x80"), and back again (decode 'UTF-8', "\xc4\x80", which is u[1]:c4.80, which is "\x{100}").

Some characters can be stored either as a binary ("\x80", which is [1]:80) or as a UTF-8 string ("\N{U+0080}", which is u[1]:c2.80). "\x80" in this case can be upgraded to its UTF-8 counterpart (my $s = "\x80"; utf8::upgrade($s);, which is u[1]:c2.80, which is "\N{U+0080}"), and downgraded back again (my $s = "\N{U+0080}"; utf8::downgrade($s), which is [1]:80, which is "\x80").

As long as Perl can store a string as a binary one, it will do so ("\xff", which is [1]:ff). But code points > 0xff can't be stored that way ("\x{100}", which is u[1]:c4.80, which represents U+0100).

Also if Perl concatenates a binary and a UTF-8 string (or vice versa), the binary string is upgraded first ("\x80" . "\x{100}" eq "\N{U+0080}\x{100}", which is u[2]:c2.80.c4.80, which bytes represent U+0080, U+0100).

If you want to interact with C code or network services, you generally want binary strings. Although you can store binary data in UTF-8 strings... At least generally it's a bad idea.

With that out of the way...

tl;dr

  • a a character of a string (null-padding) (pack 'a2', 'a' -> "a\x00")
  • A a character of a string (space-padding) (pack 'A2' -> 'a ')
  • Z a character of a string (null-padding, null-termination) (pack 'Z3', 'a' -> "a\x00\x00")
  • b a binary digit of a string (LSB first) (pack 'b', '1' -> "\x01")
  • B a binary digit of a string (MSB first) (pack 'B', '1' -> "\x80")
  • h a hex digits of a string (low nybble first) (pack 'h', '1' -> "\x01")
  • H a hex digits of a string (high nybble first) (pack 'H', '1' -> "\x10")
  • c a signed char (8-bit) (pack 'c', -1 -> "\xff")
  • C an unsigned char (8-bit) (pack 'C', 1 -> "\x01")
  • W a Unicode code point (pack 'W', 0x100 -> "\x{100}")
  • s a signed short (16-bit) (pack 's', -1 -> "\xff\xff")
  • S an unsigned short (16-bit) (pack 'S', 1 -> "\x01\x00" in the case of little-endian byte order)
  • l a signed long (32-bit) (pack 'l', -1 -> "\xff\xff\xff\xff")
  • L an unsigned long (32-bit) (pack 'L', 1 -> "\x01\x00\x00\x00" in the case of little-endian byte order)
  • q a signed quad (64-bit) (pack 'q', -1 -> "\xff\xff\xff\xff" . "\xff\xff\xff\xff")
  • Q an unsigned quad (64-bit) (pack 'Q', 1 -> "\x01\x00\x00\x00" . "\x00\x00\x00\x00" in the case of little-endian byte order)
  • i a signed int (native) (pack 'i', -1 -> "\xff\xff\xff\xff" in the case of intsize == 4)
  • I an unsigned int (native) (pack 'I', 1 -> "\x01\x00\x00\x00" in the case of little-endian byte order and intsize == 4)
  • n an unsigned short (16-bit, big-endian) (pack 'n', 1 -> "\x00\x01")
  • N an unsigned long (32-bit, big-endian) (pack 'N', 1 -> "\x00\x00\x00\x01")
  • v an unsigned short (16-bit, little-endian) (pack 'v', 1 -> "\x01\x00")
  • V an unsigned long (32-bit, little-endian) (pack 'V', 1 -> "\x01\x00\x00\x00")
  • U UTF-8 encoded representation of a Unicode code point (pack 'U', 0x80 -> "\N{U+0080}")
  • x packs a null, or skips characters when unpacking (unpack 'xa', 'ab' -> 'b')
  • X truncates back, or steps back when unpacking (pack 'aX', 'a' -> '')
  • @ truncates/null-fills to an absolute position given by a repeat count, or moves to an absolute position when unpacking (pack '@1' -> "\x00")
  • . truncates/null-fills to an absolute position given by an argument, or unpacks nulls (pack '.', 1 -> "\x00")

pack TEMPLATE, LIST takes LIST and packs it into a string according to TEMPLATE, which is a sequence of formats, e.g. pack 'ac', 'a', 1 returns "a\x01". There are two formats here: a and c. Each format tells pack() what to do with the next argument: a to take one character from the argument ('a'), and put it into the resulting string, c to take a 8-bit signed integer i (1), and add chr(i) to the string.

Formats differ in the values they take:

  • a, A, Z, b, B, h, H (string formats) take characters from a string (pack 'a', 'a')
  • c, C, W, s, S, l, L, q, Q, i, I, n, N, v, V, U (integer formats) take integers (pack 'c', 1)
  • x, X, @ take nothing
  • . takes an integer (pack '.', 1), but it's rather similar to the previous group in what it does

Formats can be followed by a repeat count, which tells pack() how many values a format takes:

  • string formats take values from one argument (pack 'a2', 'ab' returns 'ab'), but each next format takes values from the next argument (pack 'aa', 'a', 'b' returns 'ab')
  • for integer formats each argument is a value (pack 'c2', 1, 2 equals pack 'cc', 1, 2 equals "\x01\x02")

Some even say, that repeat count is length (string length) in the case of string formats.

unpack TEMPLATE, EXPR is a reverse operation (unpack 'ac', "a\x01" returns 'a', 1), although unpack $template, pack $template, @list doesn't always equal @list, e.g. unpack 'a2', pack 'a2', 'a' returns "a\x00", because a packs nulls if it runs out of characters (pack 'a2', 'a' returns "a\x00").

Then, a, A, Z pack/unpack strings. They take a character from the argument and copy it into the resulting string:

  • a, Z pad values (if there are not enough characters) with nulls (pack 'a2', 'a' -> "a\x00"), A with spaces (pack 'A2', 'a' -> 'a ')
  • Z is like a, but takes one character less and adds \x00 (pack 'Z2', 'a' -> "a\x00")
  • A strips trailing spaces and nulls when unpacking (unpack 'A3', "a\x00 " -> 'a'), Z strips the first null and what follows (unpack 'Z3', "a\x00b" -> 'a'), a strips nothing
  • all three unpack separate formats into separate values (unpack 'aa', 'ab' -> 'a', 'b')

b, B, h, H pack/unpack digits. They take a digit from the argument and put it into a byte of the resulting string:

  • in the case of b, B a digit is 0..1, for h, H it's 0..f
  • b, B fill the resulting string with bits, h, H with nybbles
  • b, h start with the LSB/low nybble (pack 'b', '1' -> "\x01"), B, H with the MSB/high nybble (pack 'B', '1' -> "\x80")
  • in the case of b, B every 8 digits produce a character (pack 'b9', '1111' . '1111' . '1' -> "\xff\x01")
  • in the case of h, H every 2 digits produce a character (pack 'h3', '123' -> "\x21\x03")
  • all four unpack separate formats into separate values (unpack 'bb', "\x01\x01 -> '1', '1')

c, C, s, S, l, L, q, Q, i, I, n, N, v, V pack/unpack integers. They take an integer, split it into bytes bi, and put chr(bi) into the resulting string:

  • c takes an integer in the range -128..127, C in the range 0..255, each argument produces one character in the resulting string (pack 'c', 1 -> "\x01")
  • s takes an integer in the range -0x8000..0x7fff, S in the range 0..0xffff, each argument produces 2 characters in the system's native byte order (pack 's', 1 produces "\x01\x00" in the case of a little-endian system)
  • l takes an integer in the range -0x8000_0000..0x7fff_ffff, L in the range 0..0xffff_ffff, each argument produces 4 characters in the system's native byte order (pack 'l', 1 produces "\x01\x00\x00\x00" in the case of a little-endian system)
  • q takes an integer in the range -0x8000_0000_0000_0000..0x7fff_ffff_ffff_ffff, Q in the range 0..0xffff_ffff_ffff_ffff, each argument produces 8 characters in the system's native byte order (pack 'q', 1 produces "\x01\x00\x00\x00" . "\x00\x00\x00\x00" in the case of a little-endian system)
  • for i and I the range and the number of produced characters is system-dependent, in my case they're -0x8000_0000..0x7fff_ffff/4 (l) and 0..0xffff_ffff/4 (L) respectively
  • n takes an integer in the range 0..0xffff and produces 2 characters in the big-endian byte order (pack 'n', 1 -> "\x00\x01")
  • N takes an integer in the range 0..0xffff_ffff and produces 4 characters in the big-endian byte order (pack 'N', 1 -> "\x00\x00\x00\x01")
  • v takes an integer in the range 0..0xffff and produces 2 characters in the little-endian byte order (see S in the case of a little-endian system)
  • V takes an integer in the range 0..0xffff_ffff and produces 4 characters in the little-endian byte order (see L in the case of a little-endian system)

So far I assumed that what is fed to pack()/unpack() is binary strings, and resulting string is binary as well. At this point I can no longer ignore the issue.

W packs/unpacks code points:

  • it takes an integer in the range 0..0x10ffff, each argument is converted to a character (chr()) and added to the resulting string (pack 'W', 1 -> "\x01", which is [1]:01)
  • when the value is > 0xff, it produces a UTF-8 string (pack 'W', 0x100 -> "\x{100}", which is u[1]:c4.80)

By default pack() operates in C0 (character) mode. In this mode the resulting string is considered a sequence of characters:

  • a, A, Z produce characters and they're packed as characters:

    • binary characters are added as binary characters (pack 'a', "\xff" -> "\xff", which is [1]:ff)
    • UTF-8 characters are added as UTF-8 characters (pack 'a' "\x{100}" -> "\x{100}", which is u[1]:c4.80)
  • b, B, h, H produce bytes, they're converted to characters (chr()) and added to the string (pack 'b', '1' -> "\x01", which is [1]:01).

  • c, C, s, S, l, L, q, Q, i, I, n, N, v, V, U produce integers:

    • when the range fits one byte (c, C), it's like in the previous case (pack 'c', 1 -> "\x01", which is [1]:01)
    • for bigger integers each byte becomes a character and is added to the string (pack 's', 1 -> "\x01\x00", which is [2]:01.00)
  • W produces code points, they are converted to characters (chr()), and added to the string:

    • when the code point is < 0x100, a binary character is produced (pack 'W', 0xff -> "\xff", which is [1]:ff)
    • otherwise , a UTF-8 character is produced (pack 'W', 0x100 -> "\x{100}", which is u[1]:c4.80)

In other words what is added to the resulting string is characters. Formats like W, b, c produce integers, but they're converted to characters with chr(). Formats like s and bigger consider each byte separately.

Adding UTF-8 characters to the string affects other values:

  • If the resulting string is binary, and a UTF-8 character is added, the resulting string is upgraded first. Let's consider pack 'aa', "\xff", "\x{100}". First "\xff" is added to the result, which becomes [1]:ff. Then the resulting string is upgraded, because "\x{100}" is a UTF-8 character, and becomes u[1]:c3.bf (U+00FF). Then "\x{100}" is added, producing u[2]:c3.bf.c4.80 (U+00FF, U+0100).

  • If the resulting string is UTF-8, and a binary character is added, it's first upgraded. Let's consider pack 'aa', "\x{100}", "\xff". First "\x{100}" is added to the result, which becomes u[1]:c4.80 (U+0100). Then "\xff" is upgraded and becomes u[1]:c3.bf (U+00FF). Then the upgraded character is added to the string, producing u[2]:c4.80.c3.bf (U+0100, U+00FF).

In other words:

  • pack 'Ca', 0x80, "\N{U+0080}" produces u[2]:c2.80.c2.80, that is initially C produces "\x80", but then it's upgraded to "\N{U+0080}" (u[1]:c2.80).

  • pack 'aC', "\N{U+0080}", 0x80 produces u[2]:c2.80.c2.80, that is C might have produced "\x80", but since the resulting string is UTF-8, it produces "\N{U+0080}" (u[1]:c2.80).

what if UTF-8 character is indeed a binary character with the UTF8 flag set?

The mode can be switched to U0 (UTF-8 byte) mode with U0. In this mode the resulting string is considered a sequence of bytes (a UTF-8 encoded representation of a string), and invalid UTF-8 is not accepted. E.g. pack('U0C', 0x80) produces a warning Malformed UTF-8 character: \x80 and dies. Also switching to U0 (at least once) forces the result to be a UTF-8 string (pack 'U0' -> u[0]):

  • a, A, Z produce characters, but what is added to the string is their code points, wrapped at 0xff:
    • pack 'U0a', "\x01" packs 0x01 into the string, producing u[1]:01, pack 'a', "\x01" would produce [1]:01
    • pack 'U0a', "\N{U+00C0}" ("\xc3\x80" in UTF-8) produces a warning Malformed UTF-8 character: \xc0 and dies, because the a format has packed 0xc0 (not 0xc3, 0x80) into the string, pack 'a', "\N{U+00C0}" would produce u[1]:c3.80
    • pack 'U0a', "\x{100}" (0xc4, 0x80 in UTF-8) produces a warning Character(s) in 'a' format wrapped, and packs 0x00 into the string, producing u[1]:00, pack 'a', "\x{100}" would produce u[1]:c4.80
  • b, B, h, H produce bytes, and these bytes are added to the string:
    • pack 'U0b', '1' packs 0x01 into the string, producing u[1]:01, pack 'b', '1' would produce [1]:01
    • pack 'U0B', '1' produces a warning Malformed UTF-8 character: \x80 and dies, because the B format has packed 0x80 into the string, pack 'B', '1' would produce [1]:80
  • c, C, s, S, l, L, q, Q, i, I, n, N, v, V, U produce bytes (some formats produce more than 1 byte), and these bytes are added to the string:
    • pack 'U0c', 1 packs 0x01 into the string, producing u[1]:01, pack 'c', 1 would produce [1]:01
    • pack 'U0C', 0x80 produces a warning Malformed UTF-8 character: \x80 and dies, because the C format has packed 0x80 into the string, pack 'C', 0x80 would produce [1]:80
    • pack 'U0S', 1 packs 0x01 and 0x00 (on a little-endian system) into the string, producing u[2]:01.00, pack 'S', 1 would produce [2]:01.00
  • W produces a code point, the code point is wrapped at 0xff, and the resulting byte is added to the string:
    • pack 'U0W', 1 packs 0x01 into the string, producing u[1]:01, pack 'W', 1 would produce [1]:01
    • pack 'U0W', 0x80 produces a warning Malformed UTF-8 character: \x80 and dies, because the W format has packed 0x80 into the string, pack 'W', 0x80 would produce [1]:80
    • pack 'U0W', 0x100 produces a warning Character in 'W' format wrapped and packs 0x00 into the string, producing u[1]:00, pack 'W', 0x100 would produce u[1]:c4.80 (U+0100)

In other words what is added to the resulting string is bytes. Formats like s produce more than one byte. W produces code points, but they're wrapped at 0xff and become bytes. Formats like a produce characters, but what is added to the string is their code points wrapped at 0xff.

Do note that what matters is for the resulting byte sequence (after finishing processing the template) to be valid UTF-8:

  • pack 'U0C', 1 succeeds because the resulting byte sequence (0x01) is valid UTF-8
  • pack 'U0C', 0xc2 fails because the resulting byte sequence (0xc2) is invalid UTF-8
  • pack 'U0CC', 0xc2, 0x80 succeeds because the resulting byte sequence (0xc2, 0x80) is valid UTF-8

pack() produces a UTF-8 string if:

  • U0 was enabled at least once (pack 'U0', pack 'U', 0) TODO move below
  • W was passed a code point > 0xff (pack 'W', 0x100)
  • a, A, Z was passed a UTF-8 string (pack 'a', "\N{U+0080}", or my $s = ''; utf8::upgrade($s); pack 'a', $s)

The mode can be switched back to C0 with C0. E.g. pack 'U0C0a', "\x80" -> u[1]:c2.80 (U+0080), pack 'U0a', "\x80" would die. "\x80" is upgraded before adding to the string because U0 mode made the string UTF-8.

U packs/unpacks code points:

  • In U0 mode the U format produces the UTF-8 bytes of the code point, which are packed into the resulting string. E.g. pack 'U0U', 0x80 -> u[1]:c2.80 (U+0080).
  • In C0 mode the U format still produces the UTF-8 bytes, but they are packed as separate characters (pack 'C0U', 0x80 -> [2]:c2.80).

The mode is switched to U0 implicitly if TEMPLATE starts with the U format. E.g. pack 'Ua2', 0, "\xc2\x80" -> u[2]:00.c2.80 (U+0000, U+0080). pack 'a2', "\xc2\x80" would produce [2]:c2.80.

By default pack() operates in C0 (character) mode. In this mode values are added to the resulting string as characters:

  • pack('a', 'a') == 'a'
  • pack('A', 'a') == 'a'
  • pack('Z2', 'a') == "a\x00"
  • pack('b8', '11111111') == "\xff" (a value is added as soon as there's a byte)
  • pack('B8', '11111111') == "\xff"
  • pack('h2', '11') == "\x11"
  • pack('H2', '11') == "\x11"
  • pack('c', 1) == "\x01"
  • pack('C', 1) == "\x01"
  • pack('W', 1) == "\x01"
  • pack('s', 1) == "\x01\x00" (each byte is added as a separate character)
  • pack('S', 1) == "\x01\x00"
  • pack('l', 1) == "\x01\x00\x00\x00"
  • pack('L', 1) == "\x01\x00\x00\x00"
  • pack('q', 1) == "\x01\x00\x00\x00\x00\x00\x00\x00"
  • pack('Q', 1) == "\x01\x00\x00\x00\x00\x00\x00\x00"
  • pack('i', 1) == "\x01\x00\x00\x00" (i == l in my case)
  • pack('I', 1) == "\x01\x00\x00\x00" (I == L in my case)
  • pack('n', 1) == "\x00\x01"
  • pack('N', 1) == "\x00\x00\x00\x01"
  • pack('v', 1) == "\x01\x00" (v == S in the case of a little-endian system)
  • pack('V', 1) == "\x01\x00\x00\x00" (V == L in the case of a little-endian system)

In U0 (UTF-8 byte) mode, which is turned on with, well, U0 (pack('U0...', ...)), values are added to a sequence of bytes. Before returning from pack() the sequence of bytes is typecasted or becomes the resulting string (or so it looks). As such the resulting sequence of bytes should be valid UTF-8. Also in this mode the ranges of a, A, Z and W reduced to 0..0xff:

  • pack('U0a2', "\xdf\xbf") == "\x{7ff}" ("\xdf\xbf" is the UTF-8 encoded representation of "\x{7ff}")
  • pack('U0A2', "\xdf\xbf") == "\x{7ff}"
  • pack('U0Z3', "\xdf\xbf") == "\x{7ff}\x00"
  • pack('U0b16', '1111' . '1011' . '1111' . '1101') == "\x{7ff}" (what b takes in hex is 'fbfd')
  • pack('U0B16', '1101' . '1111' . '1011' . '1111') == "\x{7ff}" (waht B takes in hex is 'dfbf')
  • pack('U0h4', 'fdfb') == "\x{7ff}"
  • pack('U0H4', 'dfbf') == "\x{7ff}"
  • pack('U0c2', 0xdf - 0x100, 0xbf - 0x100) == "\x{7ff}" (the c range is -0x80..0x7f, so we need to adjust the values)
  • pack('U0C2', 0xdf, 0xbf) == "\x{7ff}"
  • pack('U0W2', 0xdf, 0xbf) == "\x{7ff}"
  • pack('U0s', 0xbfdf - 0x10000) == "\x{7ff}"
  • pack('U0S', 0xbfdf) == "\x{7ff}"
  • pack('U0l', 0xbfdf) == "\x{7ff}\x00\x00"
  • pack('U0L', 0xbfdf) == "\x{7ff}\x00\x00"
  • pack('U0q', 0xbfdf) == "\x{7ff}\x00\x00\x00\x00\x00\x00"
  • pack('U0Q', 0xbfdf) == "\x{7ff}\x00\x00\x00\x00\x00\x00"
  • pack('U0i', 0xbfdf) == "\x{7ff}\x00\x00" (i == l in my case)
  • pack('U0I', 0xbfdf) == "\x{7ff}\x00\x00" (I == L in my case)
  • pack('U0n', 0xdfbf) == "\x{7ff}"
  • pack('U0N', 0xdfbf) == "\x00\x00\x{7ff}"
  • pack('U0v', 0xbfdf) == "\x{7ff}" (v == S in the case of a little-endian system)
  • pack('U0V', 0xbfdf) == "\x{7ff}\x00\x00" (V == L in the case of a little-endian system)
  • pack('U0U', 0x7ff) == "\x{7ff}" (U adds to the sequence of bytes the UTF-8 encoded representation of its argument)

Do note that the sequence of bytes doesn't have to be valid UTF-8 at any intermediate step (pack('U0aXac', "\x80", "\xdf", 0xbf - 0x100) == "\x{7ff}", X erases the last byte).

In addition to turning on the U0 mode explicitly, it's turned on implicitly when TEMPLATE starts with U (pack('Ua2', 0x7ff, "\xdf\xbf") == "\x{7ff}\x{7ff}"). In C0 mode U produces UTF-8 encoded representation of its argument (pack('C0U', 0x7ff) == "\xdf\xbf", pack('aU', "\x80", 0x7ff) == "\x80\xdf\xbf"). You can always switch the mode midway explicitly (pack('...U0...C0...', ...)).

Or in other words, generally W and U take a code point and produce a character (pack('W', 1) == "\x01", pack('U', 1) == "\x01"). But in U0 mode W takes UTF-8 representation and produces a character (pack('U0W2', 0xdf, 0xbf) == "\x{7ff}"), and in C0 mode U takes a code point and produces UTF-8 representation (pack('C0U', 0x7ff) == "\xdf\xbf").

x produces a null (pack('x') == "\x00").

X takes a step back, removing the characters in the process (pack('aX', 'a') == '').

@ moves the current position in the resulting string, truncating or null-filling it in the process (pack('a@0', 'a') == '', pack('@1') == "\x00"). The repeat count is an absolute position counted from the beginning of the resulting string.

. is like @ that takes its argument not from a repeat count (pack('a.', 'a', 0) == '', pack('.', 1) == "\x00").

Formats might be grouped with parenthesis. In this case the @/.'s arguments are counted from the start of the innermost group (pack('a(a@0)', 'a', 'a') == 'a').

$ docker run --rm -itv "$PWD":/host alpine:3.20
/ # apk add perl perl-test2-suite perl-utils
/ # for f in host/.*.pl host/*.pl; prove "$f" || break; done

perlpacktut
pack
unpack
Pack/Unpack Tutorial

What follows is my experiments:

use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a character of a string
is pack('a', $_),
$_,
'pack: ' . px ord, '06'
for chr int rand 0x110000;
# pads with \x00
is pack('a2', 'a'), "a\x00", 'pack: padding';
# takes characters only from one argument
is pack('a2', 'a', 'b'), "a\x00", 'pack: extra arg';
is pack('a'), "\x00", 'pack: no arg';
is pack('a2'), "\x00\x00", 'pack: no arg: a2';
is pack('a0', ''), '', 'pack: a0';
is pack('a', 'a'), 'a', 'pack: a';
is pack('a1', 'a'), 'a', 'pack: a1';
is pack('a2', 'ab'), 'ab', 'pack: a2';
is pack('a*', 'abc'), 'abc', 'pack: a*';
is pack('a3', 'a'), "a\x00\x00", 'pack: padding: a3';
is pack('a', 'ab'), 'a', 'pack: extra char';
# unpacks a character
is [unpack 'a'],
[$_],
'unpack: ' . px ord, '06'
for chr int rand 0x110000;
# unpads nothing
is [unpack 'a2a', "a\x00b"], ["a\x00", 'b'], 'unpack: unpad';
# unpacks separate characters into separate values
is [unpack 'aa', 'ab'], ['a', 'b'], 'unpack: aa';
is [unpack 'a0', ''], [''], 'unpack: a0';
is [unpack 'a', 'a'], ['a'], 'unpack: a';
is [unpack 'a1', 'a'], ['a'], 'unpack: a1';
is [unpack 'a2', 'ab'], ['ab'], 'unpack: a2';
is [unpack 'a*', 'abc'], ['abc'], 'unpack: a*';
is [unpack 'a3a', "a\x00bc"], ["a\x00b", 'c'], 'unpack: unpad: a3';
is [unpack 'a', ''], [''], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a character of a string
is pack('A', $_),
$_,
'pack: ' . px ord, '06'
for chr int rand 0x110000;
# pads with spaces
is pack('A2', 'a'), 'a ', 'pack: padding';
# takes characters only from one argument
is pack('A2', 'a', 'b'), 'a ', 'pack: extra arg';
is pack('A'), ' ', 'pack: no arg';
is pack('A2'), ' ', 'pack: no arg: A2';
is pack('A0', ''), '', 'pack: A0';
is pack('A', 'a'), 'a', 'pack: A';
is pack('A1', 'a'), 'a', 'pack: A1';
is pack('A2', 'ab'), 'ab', 'pack: A2';
is pack('A*', 'abc'), 'abc', 'pack: A*';
is pack('A3', 'a'), 'a ', 'pack: padding: A3';
is pack('A', 'ab'), 'a', 'pack: extra char';
# unpacks a character
is [unpack 'A'],
[$_],
'unpack: ' . px ord, '06'
for chr int rand 0x110000;
# unpads trailing spaces and \x00's
is [unpack 'A2A', 'a b'], ['a', 'b'], 'unpack: unpad: space';
is [unpack 'A2A', "a\x00b"], ['a', 'b'], 'unpack: unpad: \x00';
is [unpack 'A3', 'a b'], ['a b'], 'unpack: unpad: non-trailing space';
is [unpack 'A3', "a\x00b"], ["a\x00b"], 'unpack: unpad: non-trailing \x00';
# unpacks separate characters into separate values
is [unpack 'AA', 'ab'], ['a', 'b'], 'unpack: AA';
is [unpack 'A0', ''], [''], 'unpack: A0';
is [unpack 'A', 'a'], ['a'], 'unpack: A';
is [unpack 'A1', 'a'], ['a'], 'unpack: A1';
is [unpack 'A2', 'ab'], ['ab'], 'unpack: A2';
is [unpack 'A*', 'abc'], ['abc'], 'unpack: A*';
is [unpack 'A3A', 'a b'], ['a', 'b'], 'unpack: unpad: 2 x space';
is [unpack 'A3A', "a\x00\x00b"], ['a', 'b'], 'unpack: unpad: 2 x \x00';
is [unpack 'A', ''], [''], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a character of a string and adds \x00
is pack('Z2', $_),
$_ . "\x00",
'pack: ' . px ord, '06'
for chr int rand 0x110000;
# pads with \x00
is pack('Z3', 'a'), "a\x00\x00", 'pack: padding';
# takes characters only from one argument
is pack('Z3', 'a', 'b'), "a\x00\x00", 'pack: extra arg';
is pack('Z'), "\x00", 'pack: no arg';
is pack('Z2'), "\x00\x00", 'pack: no arg: Z2';
is pack('Z', ''), "\x00", 'pack: empty arg';
is pack('Z2', ''), "\x00\x00", 'pack: empty arg: Z2';
is pack('Z0', ''), '', 'pack: Z0';
is pack('Z', ''), "\x00", 'pack: Z';
is pack('Z1', ''), "\x00", 'pack: Z1';
is pack('Z2', 'a'), "a\x00", 'pack: Z2';
is pack('Z*', 'ab'), "ab\x00", 'pack: Z*';
is pack('Z4', 'a'), "a\x00\x00\x00", 'pack: padding: Z4';
is pack('Z2', 'ab'), "a\x00", 'pack: extra char';
# unpacks a character
is [unpack 'Z2', $_ . "\x00"],
[$_],
'unpack: ' . px ord, '06'
for chr int rand 0x110000;
# unpad \x00 and what follows
is [unpack 'Z2a', "a\x00b"], ['a', 'b'], 'unpack: unpad';
is [unpack 'Z3a', "a\x00bc"], ['a', 'c'], 'unpack: unpad: non-trailing \x00';
# unpacks separate characters into separate values
is [unpack 'ZZ', 'ab'], ['a', 'b'], 'unpack: ZZ';
is [unpack 'Z0', ''], [''], 'unpack: Z0';
is [unpack 'Z', 'a'], ['a'], 'unpack: Z';
is [unpack 'Z1', 'a'], ['a'], 'unpack: Z1';
is [unpack 'Z2', 'ab'], ['ab'], 'unpack: Z2';
is [unpack 'Z*', 'abc'], ['abc'], 'unpack: Z*';
is [unpack 'Z', ''], [''], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a digit into LSB
is pack('b', $_),
chr hex,
'pack: ' . $_
for '0', '1';
# packs 8 digits into a character
is pack('b*', '1111' . '1111' . '1'), "\xff\x01", 'pack: 8 digits per character';
# takes digits only from one argument
is pack('b2', '1', '1'), "\x01", 'pack: extra arg';
is pack('b'), "\x00", 'pack: no arg';
is pack('b2'), "\x00", 'pack: no arg: b2';
is pack('b', ''), "\x00", 'pack: empty arg';
is pack('b2', ''), "\x00", 'pack: empty arg: b2';
# takes chr(x) as 1 if x & 1 else as 0
is pack('b', $_),
chr(ord $_ & 1),
'pack: ' . px ord, '06'
for chr int rand 0x110000;
is pack('b0', ''), '', 'pack: b0';
is pack('b', '1'), "\x01", 'pack: b';
is pack('b1', '1'), "\x01", 'pack: b1';
is pack('b2', '11'), "\x03", 'pack: b2';
is pack('b*', '111'), "\x07", 'pack: b*';
is pack('b', '11'), "\x01", 'pack: extra digit';
# unpacks a digit from LSB
is [unpack 'b'],
[sprintf '%b', ord],
'unpack: ' . px ord, '02'
for "\x00", "\x01";
# unpacks 8 digits from a character
is [unpack 'b*', "\xff\x01"],
['1111' . '1111' . '1000' . '0000'],
'unpack: 8 digits per character';
# unpacks separate characters into separate values
is [unpack 'bb', "\x01\x01"], ['1', '1'], 'unpack: bb';
is [unpack 'b0', ''], [''], 'unpack: b0';
is [unpack 'b', "\x01"], ['1'], 'unpack: b';
is [unpack 'b1', "\x01"], ['1'], 'unpack: b1';
is [unpack 'b2', "\x03"], ['11'], 'unpack: b2';
is [unpack 'b*', "\x07"], ['1110' . '0000'], 'unpack: b*';
is [unpack 'b', ''], [''], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a digit into MSB
is pack('B', $_),
hex ? "\x80" : "\x00",
'pack: ' . $_
for '0', '1';
# packs 8 digits into a character
is pack('B*', '1111' . '1111' . '1'), "\xff\x80", 'pack: 8 digits per character';
# takes digits only from one argument
is pack('B2', '1', '1'), "\x80", 'pack: extra arg';
is pack('B'), "\x00", 'pack: no arg';
is pack('B2'), "\x00", 'pack: no arg: B2';
is pack('B', ''), "\x00", 'pack: empty arg';
is pack('B2', ''), "\x00", 'pack: empty arg: B2';
# takes chr(x) as 1 if x & 1 else as 0
is pack('B', $_),
ord $_ & 1 ? "\x80" : "\x00",
'pack: ' . px ord, '06'
for chr int rand 0x110000;
is pack('B0', ''), '', 'pack: B0';
is pack('B', '1'), "\x80", 'pack: B';
is pack('B1', '1'), "\x80", 'pack: B1';
is pack('B2', '11'), "\xc0", 'pack: B2';
is pack('B*', '111'), "\xe0", 'pack: B*';
is pack('B', '11'), "\x80", 'pack: extra digit';
# unpacks a digit from MSB
is [unpack 'B'],
[ord ? '1' : '0'],
'unpack: ' . px ord, '02'
for "\x00", "\x80";
# unpacks 8 digits from a character
is [unpack 'B*', "\xff\x80"],
['1111' . '1111' . '1000' . '0000'],
'unpack: 8 digits per character';
# unpacks separate characters into separate values
is [unpack 'BB', "\x80\x80"], ['1', '1'], 'unpack: BB';
is [unpack 'B0', ''], [''], 'unpack: B0';
is [unpack 'B', "\x80"], ['1'], 'unpack: B';
is [unpack 'B1', "\x80"], ['1'], 'unpack: B1';
is [unpack 'B2', "\xc0"], ['11'], 'unpack: B2';
is [unpack 'B*', "\xe0"], ['1110' . '0000'], 'unpack: B*';
is [unpack 'B', ''], [''], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a digit into the low nybble
is pack('h', $_),
chr hex,
'pack: 0x' . $_
for '0'..'9', 'a'..'f';
# packs 2 digits into a character
is pack('h*', '123'), "\x21\x03", 'pack: 2 digits per character';
# takes digits only from one argument
is pack('h2', '1', '2'), "\x01", 'pack: extra arg';
is pack('h'), "\x00", 'pack: no arg';
is pack('h2'), "\x00", 'pack: no arg: h2';
is pack('h', ''), "\x00", 'pack: empty arg';
is pack('h2', ''), "\x00", 'pack: empty arg: h2';
# takes chr(x) as x % 16, but there are 2 special cases
is pack('h', $_),
chr(
ord 'a' <= ord $_ <= ord 'z' ? (ord($_) - ord('a') + 10) % 16
: ord 'A' <= ord $_ <= ord 'Z' ? (ord($_) - ord('A') + 10) % 16
: ord($_) % 16),
'pack: '. px ord, '06'
for chr int rand 0x110000;
is pack('h0', ''), '', 'pack: h0';
is pack('h', '1'), "\x01", 'pack: h';
is pack('h1', '1'), "\x01", 'pack: h1';
is pack('h2', '12'), "\x21", 'pack: h2';
is pack('h*', '123'), "\x21\x03", 'pack: h*';
is pack('h', '12'), "\x01", 'pack: extra digit';
# unpacks a digit from the low nybble
is [unpack 'h'],
[sprintf '%x', ord],
'unpack: ' . px ord, '02'
for map chr, 0..15;
# unpacks 2 digits from a character
is [unpack 'h*', "\x21\x03"],
['1230'],
'unpack: 2 digits per character';
# unpacks separate characters into separate values
is [unpack 'hh', "\x01\x02"], ['1', '2'], 'unpack: hh';
is [unpack 'h0', ''], [''], 'unpack: h0';
is [unpack 'h', "\x01"], ['1'], 'unpack: h';
is [unpack 'h1', "\x01"], ['1'], 'unpack: h1';
is [unpack 'h2', "\x21"], ['12'], 'unpack: h2';
is [unpack 'h*', "\x21\x03"], ['1230'], 'unpack: h*';
is [unpack 'h', ''], [''], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a digit into the high nybble
is pack('H', $_),
chr hex($_) << 4,
'pack: 0x' . $_
for '0'..'9', 'a'..'f';
# packs 2 digits into a character
is pack('H*', '123'), "\x12\x30", 'pack: 2 digits per character';
# takes digits only from one argument
is pack('H2', '1', '2'), "\x10", 'pack: extra arg';
is pack('H'), "\x00", 'pack: no arg';
is pack('H2'), "\x00", 'pack: no arg: H2';
is pack('H', ''), "\x00", 'pack: empty arg';
is pack('H2', ''), "\x00", 'pack: empty arg: H2';
# takes chr(x) as x % 16, but there are 2 special cases
is pack('H', $_),
chr((
ord 'a' <= ord $_ <= ord 'z' ? (ord($_) - ord('a') + 10) % 16
: ord 'A' <= ord $_ <= ord 'Z' ? (ord($_) - ord('A') + 10) % 16
: ord($_) % 16
) << 4),
'pack: '. px ord, '06'
for chr int rand 0x110000;
is pack('H0', ''), '', 'pack: H0';
is pack('H', '1'), "\x10", 'pack: H';
is pack('H1', '1'), "\x10", 'pack: H1';
is pack('H2', '12'), "\x12", 'pack: H2';
is pack('H*', '123'), "\x12\x30", 'pack: H*';
is pack('H', '12'), "\x10", 'pack: extra digit';
# unpacks a digit from the high nybble
is [unpack 'H'],
[sprintf '%x', ord >> 4],
'unpack: ' . px ord, '02'
for map chr $_ << 4, 0..15;
# unpacks 2 digits from a character
is [unpack 'H*', "\x12\x30"],
['1230'],
'unpack: 2 digits per character';
# unpacks separate characters into separate values
is [unpack 'HH', "\x10\x20"], ['1', '2'], 'unpack: HH';
is [unpack 'H0', ''], [''], 'unpack: H0';
is [unpack 'H', "\x10"], ['1'], 'unpack: H';
is [unpack 'H1', "\x10"], ['1'], 'unpack: H1';
is [unpack 'H2', "\x12"], ['12'], 'unpack: H2';
is [unpack 'H*', "\x12\x30"], ['1230'], 'unpack: H*';
is [unpack 'H', ''], [''], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a signed char
is pack('c', $_),
chr($_ & 0xff),
'pack: ' . px $_, '02'
for -0x80 + int rand 0x80;
is pack('c', $_),
chr,
'pack: ' . px $_, '02'
for int rand 0x80;
# one argument, one character
is pack('c2', 1, 2), "\x01\x02", 'pack: c2';
is pack('c'), "\x00", 'pack: no arg';
is pack('c2'), "\x00\x00", 'pack: no arg: c2';
is pack('c0'), '', 'pack: c0';
is pack('c', 1), "\x01", 'pack: c';
is pack('c1', 1), "\x01", 'pack: c1';
is pack('c*', 1, 2, 3), "\x01\x02\x03", 'pack: c*';
is pack('c', 1, 2), "\x01", 'pack: extra arg';
# unpacks a signed char
is [unpack 'c'],
[ord($_) - 0x100],
'unpack: ' . px ord, '02'
for chr((-0x80 + int rand 0x80) & 0xff);
is [unpack 'c'],
[ord],
'unpack: ' . px ord, '02'
for chr int rand 0x80;
# unpacks separate characters into separate values
is [unpack 'cc', "\x01\x02"], [1, 2], 'unpack: cc';
is [unpack 'c0', ''], [], 'unpack: c0';
is [unpack 'c', "\x01"], [1], 'unpack: c';
is [unpack 'c1', "\x01"], [1], 'unpack: c1';
is [unpack 'c2', "\x01\x02"], [1, 2], 'unpack: c2';
is [unpack 'c*', "\x01\x02\x03"], [1, 2, 3], 'unpack: c*';
is [unpack 'c', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs an unsigned char
is pack('C', $_),
chr,
'pack: ' . px $_, '02'
for int rand 0x100;
# one argument, one character
is pack('C2', 1, 2), "\x01\x02", 'pack: C2';
is pack('C'), "\x00", 'pack: no arg';
is pack('C2'), "\x00\x00", 'pack: no arg: C2';
is pack('C0'), '', 'pack: C0';
is pack('C', 1), "\x01", 'pack: C';
is pack('C1', 1), "\x01", 'pack: C1';
is pack('C*', 1, 2, 3), "\x01\x02\x03", 'pack: C*';
is pack('C', 1, 2), "\x01", 'pack: extra arg';
is pack('c', -1), pack('C', 0xff), 'pack: c vs C';
# unpacks an unsigned char
is [unpack 'C'],
[ord],
'unpack: ' . px ord, '02'
for chr int rand 0x100;
# unpacks separate characters into separate values
is [unpack 'CC', "\x01\x02"], [1, 2], 'unpack: CC';
is [unpack 'C0', ''], [], 'unpack: C0';
is [unpack 'C', "\x01"], [1], 'unpack: C';
is [unpack 'C1', "\x01"], [1], 'unpack: C1';
is [unpack 'C2', "\x01\x02"], [1, 2], 'unpack: C2';
is [unpack 'C*', "\x01\x02\x03"], [1, 2, 3], 'unpack: C*';
is [unpack 'C', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a code point
is pack('W', $_),
chr,
'pack: ' . px $_, '06'
for int rand 0x110000;
# one argument, one character
is pack('W2', 1, 2), "\x01\x02", 'pack: W2';
is pack('W'), "\x00", 'pack: no arg';
is pack('W2'), "\x00\x00", 'pack: no arg: W2';
is pack('W0'), '', 'pack: W0';
is pack('W', 1), "\x01", 'pack: W';
is pack('W1', 1), "\x01", 'pack: W1';
is pack('W*', 1, 2, 3), "\x01\x02\x03", 'pack: W*';
is pack('W', 1, 2), "\x01", 'pack: extra arg';
# unpacks a code point
is [unpack 'W'],
[ord],
'unpack: ' . px ord, '06'
for chr int rand 0x110000;
# unpacks separate characters into separate values
is [unpack 'WW', "\x01\x02"], [1, 2], 'unpack: WW';
is [unpack 'W0', ''], [], 'unpack: W0';
is [unpack 'W', "\x01"], [1], 'unpack: W';
is [unpack 'W1', "\x01"], [1], 'unpack: W1';
is [unpack 'W2', "\x01\x02"], [1, 2], 'unpack: W2';
is [unpack 'W*', "\x01\x02\x03"], [1, 2, 3], 'unpack: W*';
is [unpack 'W', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a signed short
is pack('s', $_),
chr($_ & 0xff)
. chr($_ >> 8 & 0xff),
'pack: ' . px $_, '04'
for -0x8000 + int rand 0x8000;
is pack('s', $_),
chr($_ & 0xff)
. chr($_ >> 8 ),
'pack: ' . px $_, '04'
for int rand 0x8000;
# one argument, 2 characters
is pack('s2', 1, 2), "\x01\x00\x02\x00", 'pack: s2';
is pack('s'), "\x00\x00", 'pack: no arg';
is pack('s2'), "\x00\x00\x00\x00", 'pack: no arg: s2';
is pack('s0'), '', 'pack: s0';
is pack('s', 1), "\x01\x00", 'pack: s';
is pack('s1', 1), "\x01\x00", 'pack: s1';
is pack('s*', 1, 2, 3), "\x01\x00\x02\x00"
. "\x03\x00", 'pack: s*';
is pack('s', 1, 2), "\x01\x00", 'pack: extra arg';
# unpacks a signed short
is [unpack 's'],
[sub {
my $s = shift;
(ord(substr $s, 1, 1) << 8)
+ ord(substr $s, 0, 1)
- 0x10000
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i & 0xff)
. chr($i >> 8 & 0xff);
}->(-0x8000 + int rand 0x8000);
is [unpack 's'],
[sub {
my $s = shift;
(ord(substr $s, 1, 1) << 8)
+ ord(substr $s, 0, 1)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i & 0xff)
. chr($i >> 8 );
}->(int rand 0x8000);
# unpacks each 2 characters into a separate value
is [unpack 'ss', "\x01\x00\x02\x00"], [1, 2], 'unpack: ss';
is [unpack 's0', ''], [], 'unpack: s0';
is [unpack 's', "\x01\x00"], [1], 'unpack: s';
is [unpack 's1', "\x01\x00"], [1], 'unpack: s1';
is [unpack 's2', "\x01\x00\x02\x00"], [1, 2], 'unpack: s2';
is [unpack 's*', "\x01\x00\x02\x00"
. "\x03\x00"], [1, 2, 3], 'unpack: s*';
is [unpack 's', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs an unsigned short
is pack('S', $_),
chr($_ & 0xff)
. chr($_ >> 8 ),
'pack: ' . px $_, '04'
for int rand 0x10000;
# one argument, 2 characters
is pack('S2', 1, 2), "\x01\x00\x02\x00", 'pack: S2';
is pack('S'), "\x00\x00", 'pack: no arg';
is pack('S2'), "\x00\x00\x00\x00", 'pack: no arg: S2';
is pack('S0'), '', 'pack: S0';
is pack('S', 1), "\x01\x00", 'pack: S';
is pack('S1', 1), "\x01\x00", 'pack: S1';
is pack('S*', 1, 2, 3), "\x01\x00\x02\x00"
. "\x03\x00", 'pack: S*';
is pack('S', 1, 2), "\x01\x00", 'pack: extra arg';
is pack('s', -1), pack('S', 0xffff), 'pack: s vs S';
# unpacks an unsigned short
is [unpack 'S'],
[sub {
my $s = shift;
(ord(substr $s, 1, 1) << 8)
+ ord(substr $s, 0, 1)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i & 0xff)
. chr($i >> 8 );
}->(int rand 0x10000);
# unpacks each 2 characters into a separate value
is [unpack 'SS', "\x01\x00\x02\x00"], [1, 2], 'unpack: SS';
is [unpack 'S0', ''], [], 'unpack: S0';
is [unpack 'S', "\x01\x00"], [1], 'unpack: S';
is [unpack 'S1', "\x01\x00"], [1], 'unpack: S1';
is [unpack 'S2', "\x01\x00\x02\x00"], [1, 2], 'unpack: S2';
is [unpack 'S*', "\x01\x00\x02\x00"
. "\x03\x00"], [1, 2, 3], 'unpack: S*';
is [unpack 'S', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a signed long
is pack('l', $_),
chr($_ & 0xff)
. chr($_ >> 8 & 0xff)
. chr($_ >> 16 & 0xff)
. chr($_ >> 24 & 0xff),
'pack: ' . px $_, '08'
for -0x8000_0000 + int rand 0x8000_0000;
is pack('l', $_),
chr($_ & 0xff)
. chr($_ >> 8 & 0xff)
. chr($_ >> 16 & 0xff)
. chr($_ >> 24 ),
'pack: ' . px $_, '08'
for int rand 0x8000_0000;
# one argument, 4 characters
is pack('l2', 1, 2), "\x01\x00\x00\x00"
. "\x02\x00\x00\x00", 'pack: l2';
is pack('l'), "\x00\x00\x00\x00", 'pack: no arg';
is pack('l2'), "\x00\x00\x00\x00"
. "\x00\x00\x00\x00", 'pack: no arg: l2';
is pack('l0'), '', 'pack: l0';
is pack('l', 1), "\x01\x00\x00\x00", 'pack: l';
is pack('l1', 1), "\x01\x00\x00\x00", 'pack: l1';
is pack('l*', 1, 2, 3), "\x01\x00\x00\x00"
. "\x02\x00\x00\x00"
. "\x03\x00\x00\x00", 'pack: l*';
is pack('l', 1, 2), "\x01\x00\x00\x00", 'pack: extra arg';
# unpacks a signed long
is [unpack 'l'],
[sub {
my $s = shift;
(ord(substr $s, 3, 1) << 24)
+ (ord(substr $s, 2, 1) << 16)
+ (ord(substr $s, 1, 1) << 8)
+ ord(substr $s, 0, 1)
- (1 << 32)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i & 0xff)
. chr($i >> 8 & 0xff)
. chr($i >> 16 & 0xff)
. chr($i >> 24 & 0xff);
}->(-0x8000_0000 + int rand 0x8000_0000);
is [unpack 'l'],
[sub {
my $s = shift;
(ord(substr $s, 3, 1) << 24)
+ (ord(substr $s, 2, 1) << 16)
+ (ord(substr $s, 1, 1) << 8)
+ ord(substr $s, 0, 1)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i & 0xff)
. chr($i >> 8 & 0xff)
. chr($i >> 16 & 0xff)
. chr($i >> 24 );
}->(int rand 0x8000_0000);
# unpacks each 4 characters into a separate value
is [unpack 'll', "\x01\x00\x00\x00"
. "\x02\x00\x00\x00"], [1, 2], 'unpack: ll';
is [unpack 'l0', ''], [], 'unpack: l0';
is [unpack 'l', "\x01\x00\x00\x00"], [1], 'unpack: l';
is [unpack 'l1', "\x01\x00\x00\x00"], [1], 'unpack: l1';
is [unpack 'l2', "\x01\x00\x00\x00"
. "\x02\x00\x00\x00"], [1, 2], 'unpack: l2';
is [unpack 'l*', "\x01\x00\x00\x00"
. "\x02\x00\x00\x00"
. "\x03\x00\x00\x00"], [1, 2, 3], 'unpack: l*';
is [unpack 'l', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs an unsigned long
is pack('L', $_),
chr($_ & 0xff)
. chr($_ >> 8 & 0xff)
. chr($_ >> 16 & 0xff)
. chr($_ >> 24 ),
'pack: ' . px $_, '08'
for int rand 1 << 32;
# one argument, 4 characters
is pack('L2', 1, 2), "\x01\x00\x00\x00"
. "\x02\x00\x00\x00", 'pack: L2';
is pack('L'), "\x00\x00\x00\x00", 'pack: no arg';
is pack('L2'), "\x00\x00\x00\x00"
. "\x00\x00\x00\x00", 'pack: no arg: L2';
is pack('L0'), '', 'pack: L0';
is pack('L', 1), "\x01\x00\x00\x00", 'pack: L';
is pack('L1', 1), "\x01\x00\x00\x00", 'pack: L1';
is pack('L*', 1, 2, 3), "\x01\x00\x00\x00"
. "\x02\x00\x00\x00"
. "\x03\x00\x00\x00", 'pack: L*';
is pack('L', 1, 2), "\x01\x00\x00\x00", 'pack: extra arg';
is pack('l', -1), pack('L', 0xffff_ffff), 'pack: l vs L';
# unpacks an unsigned long
is [unpack 'L'],
[sub {
my $s = shift;
(ord(substr $s, 3, 1) << 24)
+ (ord(substr $s, 2, 1) << 16)
+ (ord(substr $s, 1, 1) << 8)
+ ord(substr $s, 0, 1)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i & 0xff)
. chr($i >> 8 & 0xff)
. chr($i >> 16 & 0xff)
. chr($i >> 24 );
}->(int rand 1 << 32);
# unpacks each 4 characters into a separate value
is [unpack 'LL', "\x01\x00\x00\x00"
. "\x02\x00\x00\x00"], [1, 2], 'unpack: LL';
is [unpack 'L0', ''], [], 'unpack: L0';
is [unpack 'L', "\x01\x00\x00\x00"], [1], 'unpack: L';
is [unpack 'L1', "\x01\x00\x00\x00"], [1], 'unpack: L1';
is [unpack 'L2', "\x01\x00\x00\x00"
. "\x02\x00\x00\x00"], [1, 2], 'unpack: L2';
is [unpack 'L*', "\x01\x00\x00\x00"
. "\x02\x00\x00\x00"
. "\x03\x00\x00\x00"], [1, 2, 3], 'unpack: L*';
is [unpack 'L', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a signed quad
is pack('q', $_),
chr($_ & 0xff)
. chr($_ >> 8 & 0xff)
. chr($_ >> 16 & 0xff)
. chr($_ >> 24 & 0xff)
. chr($_ >> 32 & 0xff)
. chr($_ >> 40 & 0xff)
. chr($_ >> 48 & 0xff)
. chr($_ >> 56 ),
'pack: ' . px $_, '016'
for -(1 << 63) + int rand 1 << 63;
is pack('q', $_),
chr($_ & 0xff)
. chr($_ >> 8 & 0xff)
. chr($_ >> 16 & 0xff)
. chr($_ >> 24 & 0xff)
. chr($_ >> 32 & 0xff)
. chr($_ >> 40 & 0xff)
. chr($_ >> 48 & 0xff)
. chr($_ >> 56 ),
'pack: ' . px $_, '016'
for int rand 1 << 63;
# one argument, 8 characters
is pack('q2', 1, 2), "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: q2';
is pack('q'), "\x00\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: no arg';
is pack('q2'), "\x00\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x00\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: no arg: q2';
is pack('q0'), '', 'pack: q0';
is pack('q', 1), "\x01\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: q';
is pack('q1', 1), "\x01\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: q1';
is pack('q*', 1, 2, 3), "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x03\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: q*';
is pack('q', 1, 2), "\x01\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: extra arg';
# unpacks a signed quad
is [unpack 'q'],
[sub {
my $s = shift;
((
(ord(substr $s, 7, 1) << 56)
+ (ord(substr $s, 6, 1) << 48)
+ (ord(substr $s, 5, 1) << 40)
+ (ord(substr $s, 4, 1) << 32)
+ (ord(substr $s, 3, 1) << 24)
+ (ord(substr $s, 2, 1) << 16)
+ (ord(substr $s, 1, 1) << 8)
+ ord(substr $s, 0, 1)
) & ~(1 << 63)) - (1 << 63)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i & 0xff)
. chr($i >> 8 & 0xff)
. chr($i >> 16 & 0xff)
. chr($i >> 24 & 0xff)
. chr($i >> 32 & 0xff)
. chr($i >> 40 & 0xff)
. chr($i >> 48 & 0xff)
. chr($i >> 56 );
}->(-(1 << 63) + int rand 1 << 63);
is [unpack 'q'],
[sub {
my $s = shift;
(ord(substr $s, 7, 1) << 56)
+ (ord(substr $s, 6, 1) << 48)
+ (ord(substr $s, 5, 1) << 40)
+ (ord(substr $s, 4, 1) << 32)
+ (ord(substr $s, 3, 1) << 24)
+ (ord(substr $s, 2, 1) << 16)
+ (ord(substr $s, 1, 1) << 8)
+ ord(substr $s, 0, 1)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i & 0xff)
. chr($i >> 8 & 0xff)
. chr($i >> 16 & 0xff)
. chr($i >> 24 & 0xff)
. chr($i >> 32 & 0xff)
. chr($i >> 40 & 0xff)
. chr($i >> 48 & 0xff)
. chr($i >> 56 );
}->(int rand 1 << 63);
# unpacks each 8 characters into a separate value
is [unpack 'qq', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00"], [1, 2], 'unpack: qq';
is [unpack 'q0', ''], [], 'unpack: q0';
is [unpack 'q', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"], [1], 'unpack: q';
is [unpack 'q1', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"], [1], 'unpack: q1';
is [unpack 'q2', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00"], [1, 2], 'unpack: q2';
is [unpack 'q*', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x03\x00\x00\x00" . "\x00\x00\x00\x00"], [1, 2, 3], 'unpack: q*';
is [unpack 'q', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs an unsigned quad
is pack('Q', $_),
chr($_ & 0xff)
. chr($_ >> 8 & 0xff)
. chr($_ >> 16 & 0xff)
. chr($_ >> 24 & 0xff)
. chr($_ >> 32 & 0xff)
. chr($_ >> 40 & 0xff)
. chr($_ >> 48 & 0xff)
. chr($_ >> 56 ),
'pack: ' . px $_, '016'
for int rand 1 << 64;
# one argument, 8 characters
is pack('Q2', 1, 2), "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: Q2';
is pack('Q'), "\x00\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: no arg';
is pack('Q2'), "\x00\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x00\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: no arg: Q2';
is pack('Q0'), '', 'pack: Q0';
is pack('Q', 1), "\x01\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: Q';
is pack('Q1', 1), "\x01\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: Q1';
is pack('Q*', 1, 2, 3), "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x03\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: Q*';
is pack('Q', 1, 2), "\x01\x00\x00\x00" . "\x00\x00\x00\x00", 'pack: extra arg';
is pack('q', -1), pack('Q', ~0), 'pack: q vs Q';
# unpacks an unsigned quad
is [unpack 'Q'],
[sub {
my $s = shift;
(ord(substr $s, 7, 1) << 56)
+ (ord(substr $s, 6, 1) << 48)
+ (ord(substr $s, 5, 1) << 40)
+ (ord(substr $s, 4, 1) << 32)
+ (ord(substr $s, 3, 1) << 24)
+ (ord(substr $s, 2, 1) << 16)
+ (ord(substr $s, 1, 1) << 8)
+ ord(substr $s, 0, 1)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i & 0xff)
. chr($i >> 8 & 0xff)
. chr($i >> 16 & 0xff)
. chr($i >> 24 & 0xff)
. chr($i >> 32 & 0xff)
. chr($i >> 40 & 0xff)
. chr($i >> 48 & 0xff)
. chr($i >> 56 );
}->(int rand 1 << 64);
# unpacks each 8 characters into a separate value
is [unpack 'QQ', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00"], [1, 2], 'unpack: QQ';
is [unpack 'Q0', ''], [], 'unpack: Q0';
is [unpack 'Q', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"], [1], 'unpack: Q';
is [unpack 'Q1', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"], [1], 'unpack: Q1';
is [unpack 'Q2', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00"], [1, 2], 'unpack: Q2';
is [unpack 'Q*', "\x01\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x02\x00\x00\x00" . "\x00\x00\x00\x00"
. "\x03\x00\x00\x00" . "\x00\x00\x00\x00"], [1, 2, 3], 'unpack: Q*';
is [unpack 'Q', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
# i == l
done_testing;
use Test2::V0;
# I == L
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs an unsigned short (big-endian)
is pack('n', $_),
chr($_ >> 8 )
. chr($_ & 0xff),
'pack: ' . px $_, '04'
for int rand 0x10000;
# one argument, 2 characters
is pack('n2', 1, 2), "\x00\x01\x00\x02", 'pack: n2';
is pack('n'), "\x00\x00", 'pack: no arg';
is pack('n2'), "\x00\x00\x00\x00", 'pack: no arg: n2';
is pack('n0'), '', 'pack: n0';
is pack('n', 1), "\x00\x01", 'pack: n';
is pack('n1', 1), "\x00\x01", 'pack: n1';
is pack('n*', 1, 2, 3), "\x00\x01\x00\x02"
. "\x00\x03", 'pack: n*';
is pack('n', 1, 2), "\x00\x01", 'pack: extra arg';
# unpacks an unsigned short
is [unpack 'n'],
[sub {
my $s = shift;
(ord(substr $s, 0, 1) << 8)
+ ord(substr $s, 1, 1)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i >> 8 )
. chr($i & 0xff);
}->(int rand 0x10000);
# unpacks each 2 characters into a separate value
is [unpack 'nn', "\x00\x01\x00\x02"], [1, 2], 'unpack: nn';
is [unpack 'n0', ''], [], 'unpack: n0';
is [unpack 'n', "\x00\x01"], [1], 'unpack: n';
is [unpack 'n1', "\x00\x01"], [1], 'unpack: n1';
is [unpack 'n2', "\x00\x01\x00\x02"], [1, 2], 'unpack: n2';
is [unpack 'n*', "\x00\x01\x00\x02"
. "\x00\x03"], [1, 2, 3], 'unpack: n*';
is [unpack 'n', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs an unsigned long (big-endian)
is pack('N', $_),
chr($_ >> 24 )
. chr($_ >> 16 & 0xff)
. chr($_ >> 8 & 0xff)
. chr($_ & 0xff),
'pack: ' . px $_, '08'
for int rand 1 << 32;
# one argument, 4 characters
is pack('N2', 1, 2), "\x00\x00\x00\x01"
. "\x00\x00\x00\x02", 'pack: N2';
is pack('N'), "\x00\x00\x00\x00", 'pack: no arg';
is pack('N2'), "\x00\x00\x00\x00"
. "\x00\x00\x00\x00", 'pack: no arg: N2';
is pack('N0'), '', 'pack: N0';
is pack('N', 1), "\x00\x00\x00\x01", 'pack: N';
is pack('N1', 1), "\x00\x00\x00\x01", 'pack: N1';
is pack('N*', 1, 2, 3), "\x00\x00\x00\x01"
. "\x00\x00\x00\x02"
. "\x00\x00\x00\x03", 'pack: N*';
is pack('N', 1, 2), "\x00\x00\x00\x01", 'pack: extra arg';
# unpacks an unsigned long (big-endian)
is [unpack 'N'],
[sub {
my $s = shift;
(ord(substr $s, 0, 1) << 24)
+ (ord(substr $s, 1, 1) << 16)
+ (ord(substr $s, 2, 1) << 8)
+ ord(substr $s, 3, 1)
}->($_)],
'unpack: ' . join ' ', map px(ord, '02'), split ''
for sub {
my $i = shift;
chr($i >> 24 )
. chr($i >> 16 & 0xff)
. chr($i >> 8 & 0xff)
. chr($i & 0xff);
}->(int rand 1 << 32);
# unpacks each 4 characters into a separate value
is [unpack 'NN', "\x00\x00\x00\x01"
. "\x00\x00\x00\x02"], [1, 2], 'unpack: NN';
is [unpack 'N0', ''], [], 'unpack: N0';
is [unpack 'N', "\x00\x00\x00\x01"], [1], 'unpack: N';
is [unpack 'N1', "\x00\x00\x00\x01"], [1], 'unpack: N1';
is [unpack 'N2', "\x00\x00\x00\x01"
. "\x00\x00\x00\x02"], [1, 2], 'unpack: N2';
is [unpack 'N*', "\x00\x00\x00\x01"
. "\x00\x00\x00\x02"
. "\x00\x00\x00\x03"], [1, 2, 3], 'unpack: N*';
is [unpack 'N', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
# v == S
done_testing;
use Test2::V0;
# V == L
done_testing;
use Test2::V0;
use Encode;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# packs a code point
is pack('U', $_),
chr,
'pack: ' . px $_, '06'
for int rand 0x110000;
# one argument, one character
is pack('U2', 1, 2), "\x01\x02", 'pack: U2';
is pack('U'), "\x00", 'pack: no arg';
is pack('U2'), "\x00\x00", 'pack: no arg: U2';
is pack('U0'), '', 'pack: U0';
is pack('U', 1), "\x01", 'pack: U';
is pack('U1', 1), "\x01", 'pack: U1';
is pack('U*', 1, 2, 3), "\x01\x02\x03", 'pack: U*';
is pack('U', 1, 2), "\x01", 'pack: extra arg';
# unpacks a code point
is [unpack 'U'],
[ord],
'unpack: ' . px ord, '06'
for chr int rand 0x110000;
# unpacks separate characters into separate values
is [unpack 'UU', "\x01\x02"], [1, 2], 'unpack: UU';
is [unpack 'U0', ''], [], 'unpack: U0';
is [unpack 'U', "\x01"], [1], 'unpack: U';
is [unpack 'U1', "\x01"], [1], 'unpack: U1';
is [unpack 'U2', "\x01\x02"], [1, 2], 'unpack: U2';
is [unpack 'U*', "\x01\x02\x03"], [1, 2, 3], 'unpack: U*';
is [unpack 'U', ''], [], 'unpack: not enough characters';
done_testing;
use Test2::V0;
# packs \x00
is pack('x'), "\x00", 'pack: x';
is pack('x0'), '', 'pack: x0';
is pack('x1'), "\x00", 'pack: x1';
is pack('x2'), "\x00\x00", 'pack: x2';
is pack('x*'), '', 'pack: x*'; # x* makes no sense
# skips a character
is [unpack 'xa', 'ab'], ['b'], 'unpack: x';
is dies { unpack 'x', '' },
match qr('x' outside of string),
'unpack: outside of string';
is [unpack 'x0a', 'a'], ['a'], 'unpack: x0';
is [unpack 'x1a', 'ab'], ['b'], 'unpack: x1';
is [unpack 'x2a', 'abc'], ['c'], 'unpack: x2';
is [unpack 'x*a', 'abcd'], [''], 'unpack: x*';
done_testing;
use Test2::V0;
# truncates back
is pack('aX', 'a'), '', 'pack: X';
# X* surprisingly does nothing
is pack('a3X*', 'abc'), 'abc', 'pack: X*';
like dies { pack 'X' },
qr('X' outside of string),
'pack: outside of string';
is pack('X0'), '', 'pack: X0';
is pack('aX1', 'a'), '', 'pack: X1';
is pack('a2X2', 'ab'), '', 'pack: X2';
# steps back
is [unpack 'aXa', 'a'], ['a', 'a'], 'unpack: X';
is dies { unpack 'X', '' },
match qr('X' outside of string),
'unpack: outside of string';
is [unpack 'X0a', 'a'], ['a'], 'unpack: X0';
is [unpack 'aX1a', 'a'], ['a', 'a'], 'unpack: X1';
is [unpack 'a2X2a', 'ab'], ['ab', 'a'], 'unpack: X2';
is [unpack 'a3X*a', 'abc'], ['abc', 'a'], 'unpack: X*';
done_testing;
use Test2::V0;
# truncates to an absolute position
is pack('a@0', 'a'), '', 'pack: truncate';
# null-fills to an absolute position
is pack('@1'), "\x00", 'pack: @1';
# the position is relative to the innermost (
is pack('a(a@0)', 'a', 'b'), 'a', 'pack: ()';
is pack('@0'), '', 'pack: @0';
is pack('@'), "\x00", 'pack: @';
is pack('@2'), "\x00\x00", 'pack: @2';
is pack('@*'), '', 'pack: @*'; # @* makes no sense
# moves back to an absolute position
is [unpack 'a@0a', 'a'], ['a', 'a'], 'unpack: move back';
# moves forward to an absolute position
is [unpack '@1a', 'ab'], ['b'], 'unpack: @1';
# the position is relative to the innermost (
is [unpack 'a(a@0a)', 'ab'], ['a', 'b', 'b'], 'unpack: ()';
is dies { unpack '@', '' },
match qr('@' outside of string),
'unpack: outside of string';
is [unpack '@0a', 'a'], ['a'], 'unpack: @0';
is [unpack '@a', 'ab'], ['b'], 'unpack: @';
is [unpack '@2a', 'abc'], ['c'], 'unpack: @2';
is [unpack '@*a', 'abcd'], [''], 'unpack: @*';
done_testing;
use Test2::V0;
# truncates to an absolute position
is pack('a.', 'a', 0), '', 'pack: truncate';
# null-fills to an absolute position
is pack('.', 1), "\x00", 'pack: .(1)';
# the position is relative to the innermost (
is pack('a(a.)', 'a', 'b', 0), 'a', 'pack: ()';
is pack('.', 0), '', 'pack: .(0)';
is pack('.', 2), "\x00\x00", 'pack: .(2)';
is pack('.'), '', 'pack: no arg ()';
is pack('a.', 'a'), '', 'pack: no arg (a)';
is pack('a2.', 'ab'), '', 'pack: no arg (ab)';
is pack('a.0', 'a', 0), 'a', 'pack: .0';
is pack('a.', 'a', 0), '', 'pack: .';
is pack('a.1', 'a', 0), '', 'pack: .1';
is pack('a.2', 'a', 0), '', 'pack: .2';
is pack('a.*', 'a', 0), '', 'pack: .*';
# unpacks 0
is [unpack '.', ''], [0], 'unpack: .';
is [unpack '.0', ''], [0], 'unpack: .0';
is [unpack '.1', ''], [0], 'unpack: .1';
is [unpack '.2', ''], [0], 'unpack: .2';
is [unpack '.*', ''], [0], 'unpack: .*';
done_testing;
use Test2::V0;
# sub px {
# my ($n, $f, $l) = @_;
# $f //= '';
# $l //= 'x';
# $n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
# }
# sub pb {
# px @_, 'b';
# }
sub resilient_warnings(&) {
my $code = shift;
warnings { my $r = dies { $code->() } };
}
sub resilient_warning(&) {
my $code = shift;
warning { my $r = dies { $code->() } };
}
# sub reverse_bits {
# oct '0b' . join '', reverse split '', sprintf '%08b', shift;
# }
#
# sub reverse_nybbles {
# my $b = shift;
# ($b << 4 & 0xff) + ($b >> 4);
# }
# 110xxxyy 10yyzzzz
# 11000010 10000000 = c2 80
# 000 1000 0000 = 0x80
# 11011111 10111111 = df bf
# 111 1111 1111 = 0x7ff
#
# 1110wwww 10xxxxyy 10yyzzzz
# 11100000 10100000 10000000 = e0 a0 80
# 0000 1000 0000 0000 = 0x800
# 11101111 10111111 10111111 = ef bf bf
# 1111 1111 1111 1111 = 0xffff
#
# 11110uvv 10vvwwww 10xxxxyy 10yyzzzz
# 11110010 10000000 10000000 10000000 = f2 80 80 80
# 0 1000 0000 0000 0000 0000 = 0x10000
# 11110100 10001111 10111111 10111111 = f4 8f bf bf
# 1 0000 1111 1111 1111 1111 = 0x10ffff
# a
is resilient_warning { pack 'U0a', "\x80" },
match qr(Malformed UTF-8 character: \\x80),
'U0a: pack: \x80';
is pack('U0a2', "\xdf\xbf"),
"\x{7ff}",
'U0a: pack: \xdf\xbf';
is [unpack 'U0a2', "\x{7ff}"],
["\xdf\xbf"],
'U0a: unpack: \x{7ff}';
# A
is resilient_warning { pack 'U0A', "\x80" },
match qr(Malformed UTF-8 character: \\x80),
'U0A: pack: \x80';
is pack('U0A2', "\xdf\xbf"),
"\x{7ff}",
'U0A: pack: \xdf\xbf';
is [unpack 'U0A2', "\x{7ff}"],
["\xdf\xbf"],
'U0A: unpack: \x{7ff}';
# Z
is resilient_warning { pack 'U0Z2', "\x80" },
match qr(Malformed UTF-8 character: \\x80),
'U0Z: pack: \x80';
is pack('U0Z3', "\xdf\xbf"),
"\x{7ff}\x00",
'U0Z: pack: \xdf\xbf';
is [unpack 'U0Z3', "\x{7ff}\x00"],
["\xdf\xbf"],
'U0Z: unpack: \x{7ff}\x00';
# b
is resilient_warning { pack 'U0b8', '0000' . '0001' },
match qr(Malformed UTF-8 character: \\x80),
'U0b: pack: 0000_0001';
is pack('U0b16', '1111' . '1011' . '1111' . '1101'), # fbfd
"\x{7ff}",
'U0b: pack: 1111_1011_1111_1101';
is [unpack 'U0b16', "\x{7ff}"],
['1111' . '1011' . '1111' . '1101'],
'U0b: unpack: \x{7ff}';
# B
is resilient_warning { pack 'U0B8', '1000' . '0000' },
match qr(Malformed UTF-8 character: \\x80),
'U0B: pack: 1000_0000';
is pack('U0B16', '1101' . '1111' . '1011' . '1111'), # dfbf
"\x{7ff}",
'U0B: pack: 1101_1111_1011_1111';
is [unpack 'U0B16', "\x{7ff}"],
['1101' . '1111' . '1011' . '1111'],
'U0B: unpack: \x{7ff}';
# h
is resilient_warning { pack 'U0h2', '08' },
match qr(Malformed UTF-8 character: \\x80),
'U0h: pack: 08';
is pack('U0h4', 'fdfb'),
"\x{7ff}",
'U0h: pack: fdfb';
is [unpack 'U0h4', "\x{7ff}"],
['fdfb'],
'U0h: unpack: \x{7ff}';
# H
is resilient_warning { pack 'U0H2', '80' },
match qr(Malformed UTF-8 character: \\x80),
'U0H: pack: 80';
is pack('U0H4', 'dfbf'),
"\x{7ff}",
'U0H: pack: dfbf';
is [unpack 'U0H4', "\x{7ff}"],
['dfbf'],
'U0H: unpack: \x{7ff}';
# c
is resilient_warning { pack 'U0c', 0x80 - 0x100 },
match qr(Malformed UTF-8 character: \\x80),
'U0c: pack: 0x80 - 0x100';
is pack('U0c2', 0xdf - 0x100, 0xbf - 0x100),
"\x{7ff}",
'U0c: pack: 0xdf - 0x100, 0xbf - 0x100';
is [unpack 'U0c2', "\x{7ff}"],
[0xdf - 0x100, 0xbf - 0x100],
'U0c: unpack: \x{7ff}';
# C
is resilient_warning { pack 'U0C', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0C: pack: 0x80';
is pack('U0C2', 0xdf, 0xbf),
"\x{7ff}",
'U0C: pack: 0xdf, 0xbf';
is [unpack 'U0C2', "\x{7ff}"],
[0xdf, 0xbf],
'U0C: unpack: \x{7ff}';
# W
is resilient_warning { pack 'U0W', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0W: pack: 0x80';
is pack('U0W2', 0xdf, 0xbf),
"\x{7ff}",
'U0W: pack: 0xdf, 0xbf';
is [unpack 'U0W2', "\x{7ff}"],
[0xdf, 0xbf],
'U0W: unpack: \x{7ff}';
# s
is resilient_warning { pack 'U0s', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0s: pack: 0x80';
is pack('U0s', 0xbfdf - 0x10000),
"\x{7ff}",
'U0s: pack: 0xbfdf - 0x10000';
is [unpack 'U0s', "\x{7ff}"],
[0xbfdf - 0x10000],
'U0s: unpack: \x{7ff}';
# S
is resilient_warning { pack 'U0S', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0S: pack: 0x80';
is pack('U0S', 0xbfdf),
"\x{7ff}",
'U0S: pack: 0xbfdf';
is [unpack 'U0S', "\x{7ff}"],
[0xbfdf],
'U0S: unpack: \x{7ff}';
# l
is resilient_warning { pack 'U0l', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0l: pack: 0x80';
is pack('U0l', 0xbfdf),
"\x{7ff}\x00\x00",
'U0l: pack: 0xbfdf';
is [unpack 'U0l', "\x{7ff}\x00\x00"],
[0xbfdf],
'U0l: unpack: \x{7ff}\x00\x00';
# L
is resilient_warning { pack 'U0L', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0L: pack: 0x80';
is pack('U0L', 0xbfdf),
"\x{7ff}\x00\x00",
'U0L: pack: 0xbfdf';
is [unpack 'U0L', "\x{7ff}\x00\x00"],
[0xbfdf],
'U0L: unpack: \x{7ff}\x00\x00';
# q
is resilient_warning { pack 'U0q', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0q: pack: 0x80';
is pack('U0q', 0xbfdf),
"\x{7ff}\x00\x00" . "\x00\x00\x00\x00",
'U0q: pack: 0xbfdf';
is [unpack 'U0q', "\x{7ff}\x00\x00" . "\x00\x00\x00\x00"],
[0xbfdf],
'U0q: unpack: \x{7ff}\x00\x00 \x00\x00\x00\x00';
# Q
is resilient_warning { pack 'U0Q', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0Q: pack: 0x80';
is pack('U0Q', 0xbfdf),
"\x{7ff}\x00\x00" . "\x00\x00\x00\x00",
'U0Q: pack: 0xbfdf';
is [unpack 'U0Q', "\x{7ff}\x00\x00" . "\x00\x00\x00\x00"],
[0xbfdf],
'U0Q: unpack: \x{7ff}\x00\x00 \x00\x00\x00\x00';
# i == l
# I == L
# n
is resilient_warning { pack 'U0n', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0n: pack: 0x80';
is pack('U0n', 0xdfbf),
"\x{7ff}",
'U0n: pack: 0xdfbf';
is [unpack 'U0n', "\x{7ff}"],
[0xdfbf],
'U0n: unpack: \x{7ff}';
# N
is resilient_warning { pack 'U0N', 0x80 },
match qr(Malformed UTF-8 character: \\x80),
'U0N: pack: 0x80';
is pack('U0N', 0xdfbf),
"\x00\x00\x{7ff}",
'U0N: pack: 0xdfbf';
is [unpack 'U0N', "\x00\x00\x{7ff}"],
[0xdfbf],
'U0N: unpack: \x00\x00\x{7ff}';
# v == S
# V == L
# U
is pack('U', 0x80), "\x80", 'U: pack: 0x80';
is [unpack 'U', "\x80"], [0x80], 'U: unpack: \x80';
# x
is [unpack 'U0xa', "\x{7ff}"], ["\xbf"], 'U0xa: unpack: \x{7ff}';
# X
is resilient_warning { pack 'U0a2X', "\xdf\xbf" },
match qr(Malformed UTF-8 character: \\xdf),
'U0a2X: pack: \xdf\xbf';
is [unpack 'U0a2Xa', "\x{7ff}"],
["\xdf\xbf", "\xbf"],
'U0a2Xa: unpack: \x{7ff}';
# @
is resilient_warning { pack 'U0a2@1', "\xdf\xbf" },
match qr(Malformed UTF-8 character: \\xdf),
'U0a2@1: pack: \xdf\xbf';
is [unpack 'U0a2@1a', "\x{7ff}"],
["\xdf\xbf", "\xbf"],
'U0a2@1a: unpack: \x{7ff}';
# .
is resilient_warning { pack 'U0a2.', "\xdf\xbf", 1 },
match qr(Malformed UTF-8 character: \\xdf),
'U0a2.: pack: \xdf\xbf, 1';
# the resulting string must be valid utf8
is pack('U0aXac', "\x80", "\xdf", 0xbf - 0x100), "\x{7ff}", 'U0: what matters is the end state';
# U0 mode is selected implicitly when TEMPLATE starts with U
is pack('Ua2', 0x7ff, "\xdf\xbf"), "\x{7ff}\x{7ff}", 'implicit U0 mode';
is pack('aU', "\x80", 0x7ff), "\x80\xdf\xbf", 'implicit C0 mode';
done_testing;
use Test2::V0;
use Encode;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
# 110xxxyy 10yyzzzz
# 11000010 10000000 = c2 80
# 000 1000 0000 = 0x80
# U
is pack('C0U', 0x80), "\xc2\x80", 'C0U: 0x80';
is [unpack 'C0U', "\xc2\x80"], [0x80], 'C0U: unpack: \xc2\x80';
# C0U encodes some code points as is, which encode() encodes as 0xfffd (replacement character):
# * surrogates (0xd800..0xdfff)
# * 0xfdd0..0xfdef
# * ($_ << 16) + 0xfffe..($_ << 16) + 0xffff for 0..16
is pack('C0U', $_),
encode('UTF-8', chr),
'pack: C0U: ' . px $_, '06'
for sub {
my $n = shift;
return $n if $n < 0xd800;
$n += 0xe000 - 0xd800;
return $n if $n < 0xfdd0;
$n += 0xfdf0 - 0xfdd0;
return $n if $n < 0xfffe;
$n += 0x10000 - 0xfffe;
for my $p (0..16) {
return $n if $n < ($p << 16) + 0xfffe;
$n += ($p << 16) - (($p << 16) + 0xfffe);
}
}->(int rand 0x110000 - (0xe000 - 0xd800 + 0xfdf0 - 0xfdd0 + (0x10000 - 0xfffe) * 17));
done_testing;
use Test2::V0;
sub px {
my ($n, $f, $l) = @_;
$f //= '';
$l //= 'x';
$n < 0 ? sprintf("-0${l}%${f}${l}", -$n) : sprintf("0${l}%${f}${l}", $n);
}
sub resilient_warnings(&) {
my $code = shift;
warnings { my $r = dies { $code->() } };
}
sub resilient_warning(&) {
my $code = shift;
warning { my $r = dies { $code->() } };
}
# U0a wraps values not in the range 0..0xff
is pack('U0a', $_),
$_,
'U0a: wrap: ' . px ord, '02'
for "\x00";
is resilient_warning { pack 'U0a', $_ },
match qr(Malformed UTF-8 character: \\xff),
'U0a: wrap: ' . px ord, '02'
for "\xff";
is warning { pack 'U0a', $_ },
match qr(Character\(s\) in 'a' format wrapped),
'U0a: wrap: ' . px ord, '02'
for "\x{100}";
# U0A wraps values not in the range 0..0xff
is pack('U0A', $_),
$_,
'U0A: wrap: ' . px ord, '02'
for "\x00";
is resilient_warning { pack 'U0A', $_ },
match qr(Malformed UTF-8 character: \\xff),
'U0A: wrap: ' . px ord, '02'
for "\xff";
is warning { pack 'U0A', $_ },
match qr(Character\(s\) in 'A' format wrapped),
'U0A: wrap: ' . px ord, '02'
for "\x{100}";
# U0Z wraps values not in the range 0..0xff
is pack('U0Z2', $_),
$_ . "\x00",
'U0Z: wrap: ' . px ord, '02'
for "\x00";
is resilient_warnings { pack 'U0Z2', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\x00),
match qr(Malformed UTF-8 character: \\xff\\x00),
],
'U0Z: wrap: ' . px ord, '02'
for "\xff";
is warning { pack 'U0Z2', $_ },
match qr(Character\(s\) in 'Z' format wrapped),
'U0Z: wrap: ' . px ord, '02'
for "\x{100}";
# b (unpack) wraps characters > 0xff
is warning { unpack 'b', "\x{100}" },
match qr(Character in 'b' format wrapped),
'b: unpack: wrap: 0x100';
# B (unpack) wraps characters > 0xff
is warning { unpack 'B', "\x{100}" },
match qr(Character in 'B' format wrapped),
'B: unpack: wrap: 0x100';
# h (unpack) wraps characters > 0xff
is warning { unpack 'h', "\x{100}" },
match qr(Character in 'h' format wrapped),
'h: unpack: wrap: 0x100';
# H (unpack) wraps characters > 0xff
is warning { unpack 'H', "\x{100}" },
match qr(Character in 'H' format wrapped),
'H: unpack: wrap: 0x100';
# c wraps values not in the range -0x80..0x7f
is warning { pack 'c', $_ },
match qr(Character in 'c' format wrapped),
'c: wrap: ' . px $_, '02'
for -0x81;
is pack('c', $_), "\x80", 'c: wrap: ' . px $_, '02' for -0x80;
is pack('c', $_), "\x7f", 'c: wrap: ' . px $_, '02' for 0x7f;
is warning { pack 'c', $_ },
match qr(Character in 'c' format wrapped),
'c: wrap: ' . px $_, '02'
for 0x80;
# U0c wraps values not in the range -0x80..0x7f
is warning { pack 'U0c', $_ },
match qr(Character in 'c' format wrapped),
'U0c: wrap: ' . px $_, '02'
for -0x81;
is resilient_warning { pack 'U0c', $_ },
match qr(Malformed UTF-8 character: \\x80),
'U0c: wrap: ' . px $_, '02'
for -0x80;
is pack('U0c', $_), "\x7f", 'U0c: wrap: ' . px $_, '02' for 0x7f;
is resilient_warnings { pack 'U0c', $_ }, [
match qr(Character in 'c' format wrapped),
match qr(Malformed UTF-8 character: \\x80),
],
'U0c: wrap: ' . px $_, '02'
for 0x80;
# C wraps values not in the range 0..0xff
is warning { pack 'C', $_ },
match qr(Character in 'C' format wrapped),
'C: wrap: ' . px $_, '02'
for -0x01;
is pack('C', $_), "\x00", 'C: wrap: ' . px $_, '02' for 0x00;
is pack('C', $_), "\xff", 'C: wrap: ' . px $_, '02' for 0xff;
is warning { pack 'C', $_ },
match qr(Character in 'C' format wrapped),
'C: wrap: ' . px $_, '02'
for 0x100;
# U0C wraps values not in the range 0..0xff
is resilient_warnings { pack 'U0C', $_ }, [
match qr(Character in 'C' format wrapped),
match qr(Malformed UTF-8 character: \\xff),
],
'U0C: wrap: ' . px $_, '02'
for -0x01;
is pack('U0C', $_), "\x00", 'U0C: wrap: ' . px $_, '02' for 0x00;
is resilient_warning { pack 'U0C', $_ },
match qr(Malformed UTF-8 character: \\xff),
'U0C: wrap: ' . px $_, '02'
for 0xff;
is warning { pack 'U0C', $_ },
match qr(Character in 'C' format wrapped),
'U0C: wrap: ' . px $_, '02'
for 0x100;
# W doesn't produce wrap messages
is dies { pack 'W', $_ },
match qr(Use of code point 0xFFFFFFFFFFFFFFFF is not allowed; the permissible max is 0x7FFFFFFFFFFFFFFF),
'W: wrap: ' . px $_, '06'
for -0x000001;
is pack('W', $_), "\x00", 'W: wrap: ' . px $_, '06' for 0x000000;
is pack('W', $_), "\x{10ffff}", 'W: wrap: ' . px $_, '06' for 0x10ffff;
is pack('W', $_), "\x{110000}", 'W: wrap: ' . px $_, '06' for 0x110000;
# U0W wraps values not in the range 0..0xff
is resilient_warnings { pack 'U0W', $_ }, [
match qr(Character in 'W' format wrapped),
match qr(Malformed UTF-8 character: \\xff),
],
'U0W: wrap: ' . px $_, '02'
for -0x01;
is pack('U0W', $_), "\x00", 'U0W: wrap: ' . px $_, '02' for 0x00;
is resilient_warnings { pack 'U0W', $_ }, [
match qr(Malformed UTF-8 character: \\xff),
],
'U0W: wrap: ' . px $_, '02'
for 0xff;
is warning { pack 'U0W', $_ },
match qr(Character in 'W' format wrapped),
'U0W: wrap: ' . px $_, '02'
for 0x100;
# s doesn't produce wrap messages
is pack('s', $_), "\xff\x7f", 's: wrap: ' . px $_, '04' for -0x8001;
is pack('s', $_), "\x00\x80", 's: wrap: ' . px $_, '04' for -0x8000;
is pack('s', $_), "\xff\x7f", 's: wrap: ' . px $_, '04' for 0x7fff;
is pack('s', $_), "\x00\x80", 's: wrap: ' . px $_, '04' for 0x8000;
# U0s doesn't produce wrap messages
is resilient_warnings { pack 'U0s', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\x7f),
match qr(Malformed UTF-8 character: \\xff\\x7f),
],
'U0s: wrap: ' . px $_, '04'
for -0x8001;
is resilient_warning { pack 'U0s', $_ },
match qr(Malformed UTF-8 character: \\x80),
'U0s: wrap: ' . px $_, '04'
for -0x8000;
is resilient_warnings { pack 'U0s', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\x7f),
match qr(Malformed UTF-8 character: \\xff\\x7f),
],
'U0s: wrap: ' . px $_, '04'
for 0x7fff;
is resilient_warning { pack 'U0s', $_ },
match qr(Malformed UTF-8 character: \\x80),
'U0s: wrap: ' . px $_, '04'
for 0x8000;
# S doesn't produce wrap messages
is pack('S', $_), "\xff\xff", 'S: wrap: ' . px $_, '04' for -0x0001;
is pack('S', $_), "\x00\x00", 'S: wrap: ' . px $_, '04' for 0x0000;
is pack('S', $_), "\xff\xff", 'S: wrap: ' . px $_, '04' for 0xffff;
is pack('S', $_), "\x00\x00", 'S: wrap: ' . px $_, '04' for 0x10000;
# U0S doesn't produce wrap messages
is resilient_warnings { pack 'U0S', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff),
],
'U0S: wrap: ' . px $_, '04'
for -0x0001;
is pack('U0S', $_), "\x00\x00", 'U0S: wrap: ' . px $_, '04' for 0x0000;
is resilient_warnings { pack 'U0S', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff),
],
'U0S: wrap: ' . px $_, '04'
for 0xffff;
is pack('U0S', $_), "\x00\x00", 'U0S: wrap: ' . px $_, '04' for 0x10000;
# l doesn't produce wrap messages
is pack('l', $_), "\xff\xff\xff\x7f", 'l: wrap: ' . px $_, '08' for -0x8000_0001;
is pack('l', $_), "\x00\x00\x00\x80", 'l: wrap: ' . px $_, '08' for -0x8000_0000;
is pack('l', $_), "\xff\xff\xff\x7f", 'l: wrap: ' . px $_, '08' for 0x7fff_ffff;
is pack('l', $_), "\x00\x00\x00\x80", 'l: wrap: ' . px $_, '08' for 0x8000_0000;
# U0l doesn't produce wrap messages
is resilient_warnings { pack 'U0l', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\x7f),
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\x7f),
],
'U0l: wrap: ' . px $_, '08'
for -0x8000_0001;
is resilient_warning { pack 'U0l', $_ },
match qr(Malformed UTF-8 character: \\x80),
'U0l: wrap: ' . px $_, '08'
for -0x8000_0000;
is resilient_warnings { pack 'U0l', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\x7f),
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\x7f),
],
'U0l: wrap: ' . px $_, '08'
for 0x7fff_ffff;
is resilient_warning { pack 'U0l', $_ },
match qr(Malformed UTF-8 character: \\x80),
'U0l: wrap: ' . px $_, '08'
for 0x8000_0000;
# L doesn't produce wrap messages
is pack('L', $_), "\xff\xff\xff\xff", 'L: wrap: ' . px $_, '08' for -0x0000_0001;
is pack('L', $_), "\x00\x00\x00\x00", 'L: wrap: ' . px $_, '08' for 0x0000_0000;
is pack('L', $_), "\xff\xff\xff\xff", 'L: wrap: ' . px $_, '08' for 0xffff_ffff;
is pack('L', $_), "\x00\x00\x00\x00", 'L: wrap: ' . px $_, '08' for 1 << 32;
# U0L doesn't produce wrap messages
is resilient_warnings { pack 'U0L', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff),
],
'U0L: wrap: ' . px $_, '08'
for -0x0000_0001;
is pack('U0L', $_), "\x00\x00\x00\x00", 'U0L: wrap: ' . px $_, '08' for 0x0000_0000;
is resilient_warnings { pack 'U0L', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff),
],
'U0L: wrap: ' . px $_, '08'
for 0xffff_ffff;
is pack('U0L', $_), "\x00\x00\x00\x00", 'U0L: wrap: ' . px $_, '08' for 1 << 32;
# q doesn't produce wrap messages
is pack('q', $_),
"\x00\x00\x00\x00\x00\x00\x00\x80",
'q: wrap: ' . px $_, '016'
for -(1 << 63); # -0x8000_0000_0000_0000
is pack('q', $_),
"\xff\xff\xff\xff\xff\xff\xff\x7f",
'q: wrap: ' . px $_, '016'
for (1 << 63) - 1; # 0x7fff_ffff_ffff_ffff
is pack('q', $_),
"\x00\x00\x00\x00\x00\x00\x00\x80",
'q: wrap: ' . px $_, '016'
for 1 << 63; # 0x8000_0000_0000_0000
# U0q doesn't produce wrap messages
is resilient_warning { pack 'U0q', $_ },
match qr(Malformed UTF-8 character: \\x80),
'U0q: wrap: ' . px $_, '016'
for -(1 << 63); # -0x8000_0000_0000_0000
is resilient_warnings { pack 'U0q', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x7f),
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x7f),
],
'U0q: wrap: ' . px $_, '016'
for (1 << 63) - 1; # 0x7fff_ffff_ffff_ffff
is resilient_warning { pack 'U0q', $_ },
match qr(Malformed UTF-8 character: \\x80),
'U0q: wrap: ' . px $_, '016'
for 1 << 63; # 0x8000_0000_0000_0000
# Q doesn't produce wrap messages
is pack('Q', $_),
"\xff\xff\xff\xff\xff\xff\xff\xff",
'Q: wrap: ' . px $_, '016'
for -1; # -0x0000_0000_0000_0001
is pack('Q', $_),
"\x00\x00\x00\x00\x00\x00\x00\x00",
'Q: wrap: ' . px $_, '016'
for 0; # 0x0000_0000_0000_0000
is pack('Q', $_),
"\xff\xff\xff\xff\xff\xff\xff\xff",
'Q: wrap: ' . px $_, '016'
for ~0; # 0xffff_ffff_ffff_ffff
# U0Q doesn't produce wrap messages
is resilient_warnings { pack 'U0Q', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff),
],
'Q: wrap: ' . px $_, '016'
for -1;
is pack('U0Q', $_),
"\x00\x00\x00\x00\x00\x00\x00\x00",
'Q: wrap: ' . px $_, '016'
for 0;
is resilient_warnings { pack 'U0Q', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff),
],
'Q: wrap: ' . px $_, '016'
for ~0;
# i == l
# I == L
# n doesn't produce wrap messages
is pack('n', $_), "\xff\xff", 'n: wrap: ' . px $_, '04' for -0x0001;
is pack('n', $_), "\x00\x00", 'n: wrap: ' . px $_, '04' for 0x0000;
is pack('n', $_), "\xff\xff", 'n: wrap: ' . px $_, '04' for 0xffff;
is pack('n', $_), "\x00\x00", 'n: wrap: ' . px $_, '04' for 0x10000;
# U0n doesn't produce wrap messages
is resilient_warnings { pack 'U0n', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff),
],
'U0n: wrap: ' . px $_, '04'
for -0x0001;
is pack('U0n', $_), "\x00\x00", 'U0n: wrap: ' . px $_, '04' for 0x0000;
is resilient_warnings { pack 'U0n', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff),
],
'U0n: wrap: ' . px $_, '04'
for 0xffff;
is pack('U0n', $_), "\x00\x00", 'U0n: wrap: ' . px $_, '04' for 0x10000;
# N doesn't produce wrap messages
is pack('N', $_), "\xff\xff\xff\xff", 'N: wrap: ' . px $_, '08' for -0x0000_0001;
is pack('N', $_), "\x00\x00\x00\x00", 'N: wrap: ' . px $_, '08' for 0x0000_0000;
is pack('N', $_), "\xff\xff\xff\xff", 'N: wrap: ' . px $_, '08' for 0xffff_ffff;
is pack('N', $_), "\x00\x00\x00\x00", 'N: wrap: ' . px $_, '08' for 1 << 32;
# U0N doesn't produce wrap messages
is resilient_warnings { pack 'U0N', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff),
],
'U0N: wrap: ' . px $_, '08'
for -0x0000_0001;
is pack('U0N', $_), "\x00\x00\x00\x00", 'U0N: wrap: ' . px $_, '08' for 0x0000_0000;
is resilient_warnings { pack 'U0N', $_ }, [
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff),
match qr(Malformed UTF-8 character: \\xff\\xff\\xff\\xff),
],
'U0N: wrap: ' . px $_, '08'
for 0xffff_ffff;
is pack('U0N', $_), "\x00\x00\x00\x00", 'U0N: wrap: ' . px $_, '08' for 1 << 32;
# v == S
# V == L
# U doesn't produce wrap messages
is dies { pack 'U', $_ },
match qr(Use of code point 0xFFFFFFFFFFFFFFFF is not allowed; the permissible max is 0x7FFFFFFFFFFFFFFF),
'U: wrap: ' . px $_, '06'
for -1;
is pack('U', $_), "\x00", 'U: wrap: ' . px $_, '06' for 0x000000;
is pack('U', $_), "\x{10ffff}", 'U: wrap: ' . px $_, '06' for 0x10ffff;
is pack('U', $_), "\x{110000}", 'U: wrap: ' . px $_, '06' for 0x110000;
# C0U doesn't produce wrap messages
is dies { pack 'C0U', $_ },
match qr(Use of code point 0xFFFFFFFFFFFFFFFF is not allowed; the permissible max is 0x7FFFFFFFFFFFFFFF),
'C0U: wrap: ' . px $_, '06'
for -1;
is pack('C0U', $_), "\x00", 'C0U: wrap: ' . px $_, '06' for 0x000000;
is pack('C0U', $_), "\xf4\x8f\xbf\xbf", 'C0U: wrap: ' . px $_, '06' for 0x10ffff;
is pack('C0U', $_), "\xf4\x90\x80\x80", 'C0U: wrap: ' . px $_, '06' for 0x110000;
done_testing;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment