I'm going to ignore the j, J, f, F, d, D, p, P, u, w formats.
My machine has the x86_64 architecture (little-endian) and I'm running perl-5.38-2.
The way it's described here doesn't necessarily match the way it works internally; this is merely an explanation that seems to work.
$ perl -V:shortsize -V:intsize -V:longsize -V:longlongsize
shortsize='2';
intsize='4';
longsize='8';
longlongsize='8';
And a couple of words on storing strings in Perl. Basically there are 2 types of strings in Perl: binary strings (the UTF8 flag is unset) and UTF-8 strings (the UTF8 flag is set). Let me introduce a notation to describe strings here: u if the UTF8 flag is set, the character length in square brackets, optionally followed by a colon and the bytes separated by dots. E.g. [1]:ff is a binary string, one character long, which is \xff. u[1]:c2.80 is a UTF-8 string, one character long, with bytes 0xc2 and 0x80 (U+0080). The following function converts a string to such notation:
# Convert a string to the notation above (the name notation() is mine,
# so the later snippets can call it).
sub notation {
    my $v = shift;
    my $u = utf8::is_utf8($v) ? 'u' : '';   # 'u' prefix if the UTF8 flag is set
    my $l = length $v;                      # length in characters
    use bytes;                              # from here on, look at the underlying bytes
    sprintf("%s[%u]", $u, $l)
        . (length $v ? sprintf(":%v02x", $v) : '');
}
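A quick check of the helper (a sketch; the comments show the notation it prints):
print notation("\xff"), "\n";       # [1]:ff     - binary, one character, one byte
print notation("\x{100}"), "\n";    # u[1]:c4.80 - UTF-8, one character, two bytes
print notation(''), "\n";           # [0]        - empty binary string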
Binary strings are sequences of bytes. And although they can contain a UTF-8 encoded representation of a string ("\xc2\x80", which is [2]:c2.80, whose bytes represent U+0080), that is not necessarily so ("\xff", which is [1]:ff, whose bytes are invalid UTF-8); each character is < 0x100 and corresponds to one byte.
UTF-8 strings are sequences of Unicode code points (or characters that correspond to Unicode code points). And although they can contain arbitrary data (use Encode qw(_utf8_on); my $s = "\x80"; _utf8_on($s), which is u[1]:80, whose bytes are invalid UTF-8), that is not necessarily so ("\x{100}", which is u[1]:c4.80, whose bytes represent U+0100); characters can be > 0xff and correspond to more than 1 byte.
Functions encode() and decode() transform UTF-8 strings into binary ones (encode 'UTF-8', "\x{100}", which is [2]:c4.80, which is "\xc4\x80"), and back again (decode 'UTF-8', "\xc4\x80", which is u[1]:c4.80, which is "\x{100}").
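A sketch of the round trip, using the core Encode module and the notation() helper from above:
use Encode qw(encode decode);

my $chars = "\x{100}";                 # u[1]:c4.80
my $bytes = encode 'UTF-8', $chars;    # [2]:c4.80, i.e. "\xc4\x80"
my $back  = decode 'UTF-8', $bytes;    # u[1]:c4.80, i.e. "\x{100}"
print notation($_), "\n" for $chars, $bytes, $back;
print $back eq $chars ? "round trip ok\n" : "mismatch\n";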
Some characters can be stored either as a binary ("\x80", which is [1]:80) or as a UTF-8 string ("\N{U+0080}", which is u[1]:c2.80). "\x80" in this case can be upgraded to its UTF-8 counterpart (my $s = "\x80"; utf8::upgrade($s), which is u[1]:c2.80, which is "\N{U+0080}"), and downgraded back again (my $s = "\N{U+0080}"; utf8::downgrade($s), which is [1]:80, which is "\x80").
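A small sketch; upgrading/downgrading changes only the internal representation, not the value that eq sees:
my $s = "\x80";                      # [1]:80
utf8::upgrade($s);                   # u[1]:c2.80
print $s eq "\x80" ? "still eq \"\\x80\"\n" : "?\n";
utf8::downgrade($s);                 # [1]:80 again
print utf8::is_utf8($s) ? "u\n" : "binary again\n";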
As long as Perl can store a string as a binary one, it will do so ("\xff", which is [1]:ff). But code points > 0xff can't be stored that way ("\x{100}", which is u[1]:c4.80, which represents U+0100).
Also, if Perl concatenates a binary and a UTF-8 string (or vice versa), the binary string is upgraded first ("\x80" . "\x{100}" eq "\N{U+0080}\x{100}", which is u[2]:c2.80.c4.80, whose bytes represent U+0080, U+0100).
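The same concatenation, checked explicitly (a sketch):
my $cat = "\x80" . "\x{100}";
print notation($cat), "\n";                                       # u[2]:c2.80.c4.80
print $cat eq "\N{U+0080}\x{100}" ? "characters preserved\n" : "?\n";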
If you want to interact with C code or network services, you generally want binary strings. You can store binary data in UTF-8 strings, but generally it's a bad idea.
With that out of the way...
tl;dr
- a: a character of a string (null-padding) (pack 'a2', 'a' -> "a\x00")
- A: a character of a string (space-padding) (pack 'A2', 'a' -> 'a ')
- Z: a character of a string (null-padding, null-termination) (pack 'Z3', 'a' -> "a\x00\x00")
- b: a binary digit of a string (LSB first) (pack 'b', '1' -> "\x01")
- B: a binary digit of a string (MSB first) (pack 'B', '1' -> "\x80")
- h: a hex digit of a string (low nybble first) (pack 'h', '1' -> "\x01")
- H: a hex digit of a string (high nybble first) (pack 'H', '1' -> "\x10")
- c: a signed char (8-bit) (pack 'c', -1 -> "\xff")
- C: an unsigned char (8-bit) (pack 'C', 1 -> "\x01")
- W: a Unicode code point (pack 'W', 0x100 -> "\x{100}")
- s: a signed short (16-bit) (pack 's', -1 -> "\xff\xff")
- S: an unsigned short (16-bit) (pack 'S', 1 -> "\x01\x00" in the case of little-endian byte order)
- l: a signed long (32-bit) (pack 'l', -1 -> "\xff\xff\xff\xff")
- L: an unsigned long (32-bit) (pack 'L', 1 -> "\x01\x00\x00\x00" in the case of little-endian byte order)
- q: a signed quad (64-bit) (pack 'q', -1 -> "\xff\xff\xff\xff" . "\xff\xff\xff\xff")
- Q: an unsigned quad (64-bit) (pack 'Q', 1 -> "\x01\x00\x00\x00" . "\x00\x00\x00\x00" in the case of little-endian byte order)
- i: a signed int (native) (pack 'i', -1 -> "\xff\xff\xff\xff" in the case of intsize == 4)
- I: an unsigned int (native) (pack 'I', 1 -> "\x01\x00\x00\x00" in the case of little-endian byte order and intsize == 4)
- n: an unsigned short (16-bit, big-endian) (pack 'n', 1 -> "\x00\x01")
- N: an unsigned long (32-bit, big-endian) (pack 'N', 1 -> "\x00\x00\x00\x01")
- v: an unsigned short (16-bit, little-endian) (pack 'v', 1 -> "\x01\x00")
- V: an unsigned long (32-bit, little-endian) (pack 'V', 1 -> "\x01\x00\x00\x00")
- U: the UTF-8 encoded representation of a Unicode code point (pack 'U', 0x80 -> "\N{U+0080}")
- x: packs a null, or skips characters when unpacking (unpack 'xa', 'ab' -> 'b')
- X: truncates back, or steps back when unpacking (pack 'aX', 'a' -> '')
- @: truncates/null-fills to an absolute position given by the repeat count, or moves to an absolute position when unpacking (pack '@1' -> "\x00")
- .: truncates/null-fills to an absolute position given by an argument, or unpacks nulls (pack '.', 1 -> "\x00")
pack TEMPLATE, LIST takes LIST and packs it into a string according to TEMPLATE, which is a sequence of formats, e.g. pack 'ac', 'a', 1 returns "a\x01". There are two formats here: a and c. Each format tells pack() what to do with the next argument: a takes one character from the argument ('a') and puts it into the resulting string, c takes an 8-bit signed integer i (1) and adds chr(i) to the string.
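A minimal check of that example (%vd prints the ordinals of the characters, dot-separated):
my $packed = pack 'ac', 'a', 1;
printf "%vd\n", $packed;                      # 97.1
print $packed eq "a\x01" ? "as expected\n" : "?\n";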
Formats differ in the values they take:
- a, A, Z, b, B, h, H (string formats) take characters from a string (pack 'a', 'a')
- c, C, W, s, S, l, L, q, Q, i, I, n, N, v, V, U (integer formats) take integers (pack 'c', 1)
- x, X, @ take nothing
- . takes an integer (pack '.', 1), but it's rather similar to the previous group in what it does
Formats can be followed by a repeat count, which tells pack() how many values a format takes:
- string formats take values from one argument (pack 'a2', 'ab' returns 'ab'), but each next format takes values from the next argument (pack 'aa', 'a', 'b' returns 'ab')
- for integer formats each argument is a value (pack 'c2', 1, 2 equals pack 'cc', 1, 2 equals "\x01\x02")
Some even say that the repeat count is a length (the string length) in the case of string formats.
unpack TEMPLATE, EXPR is the reverse operation (unpack 'ac', "a\x01" returns 'a', 1), although unpack $template, pack $template, @list doesn't always equal @list, e.g. unpack 'a2', pack 'a2', 'a' returns "a\x00", because a packs nulls if it runs out of characters (pack 'a2', 'a' returns "a\x00").
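A sketch of both points: a clean round trip, and one that gains a null:
my @out = unpack 'ac', pack 'ac', 'a', 1;
print "@out\n";                               # a 1
my ($padded) = unpack 'a2', pack 'a2', 'a';
printf "%vd\n", $padded;                      # 97.0, i.e. "a\x00"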
Then, a, A, Z pack/unpack strings. They take a character from the argument and copy it into the resulting string:
- a, Z pad values (if there are not enough characters) with nulls (pack 'a2', 'a' -> "a\x00"), A with spaces (pack 'A2', 'a' -> 'a ')
- Z is like a, but takes one character less and adds \x00 (pack 'Z2', 'a' -> "a\x00")
- A strips trailing spaces and nulls when unpacking (unpack 'A3', "a\x00 " -> 'a'), Z strips the first null and what follows (unpack 'Z3', "a\x00b" -> 'a'), a strips nothing
- all three unpack separate formats into separate values (unpack 'aa', 'ab' -> 'a', 'b')
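The differences above in one runnable sketch:
printf "%vd\n", pack 'a2', 'a';          # 97.0  ("a\x00")
printf "%vd\n", pack 'A2', 'a';          # 97.32 ('a ')
printf "%vd\n", pack 'Z2', 'a';          # 97.0  (one character less, plus the terminating null)
print unpack('A3', "a\x00 "), "|\n";     # a|    (trailing spaces and nulls stripped)
print unpack('Z3', "a\x00b"), "|\n";     # a|    (the first null and everything after it stripped)
print unpack('a3', "a\x00 "), "|\n";     # a, a null, a space, then | (nothing stripped)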
b, B, h, H pack/unpack digits. They take a digit from the argument and put it into a byte of the resulting string:
- in the case of b, B a digit is 0..1, for h, H it's 0..f
- b, B fill the resulting string with bits, h, H with nybbles
- b, h start with the LSB/low nybble (pack 'b', '1' -> "\x01"), B, H with the MSB/high nybble (pack 'B', '1' -> "\x80")
- in the case of b, B every 8 digits produce a character (pack 'b9', '1111' . '1111' . '1' -> "\xff\x01")
- in the case of h, H every 2 digits produce a character (pack 'h3', '123' -> "\x21\x03")
- all four unpack separate formats into separate values (unpack 'bb', "\x01\x01" -> '1', '1')
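A few of these spelled out (a sketch):
printf "%vd\n", pack 'b8', '10000000';   # 1   (LSB first: only bit 0 is set)
printf "%vd\n", pack 'B8', '10000000';   # 128 (MSB first: only bit 7 is set)
printf "%vd\n", pack 'h2', '21';         # 18  (low nybble first: 0x12)
printf "%vd\n", pack 'H2', '21';         # 33  (high nybble first: 0x21)
print join(',', unpack 'bb', "\x01\x01"), "\n";   # 1,1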
c, C, s, S, l, L, q, Q, i, I, n, N, v, V pack/unpack integers. They take an integer, split it into bytes bi, and put chr(bi) into the resulting string:
- c takes an integer in the range -128..127, C in the range 0..255; each argument produces one character in the resulting string (pack 'c', 1 -> "\x01")
- s takes an integer in the range -0x8000..0x7fff, S in the range 0..0xffff; each argument produces 2 characters in the system's native byte order (pack 's', 1 produces "\x01\x00" in the case of a little-endian system)
- l takes an integer in the range -0x8000_0000..0x7fff_ffff, L in the range 0..0xffff_ffff; each argument produces 4 characters in the system's native byte order (pack 'l', 1 produces "\x01\x00\x00\x00" in the case of a little-endian system)
- q takes an integer in the range -0x8000_0000_0000_0000..0x7fff_ffff_ffff_ffff, Q in the range 0..0xffff_ffff_ffff_ffff; each argument produces 8 characters in the system's native byte order (pack 'q', 1 produces "\x01\x00\x00\x00" . "\x00\x00\x00\x00" in the case of a little-endian system)
- for i and I the range and the number of produced characters are system-dependent; in my case they're -0x8000_0000..0x7fff_ffff / 4 (like l) and 0..0xffff_ffff / 4 (like L) respectively
- n takes an integer in the range 0..0xffff and produces 2 characters in the big-endian byte order (pack 'n', 1 -> "\x00\x01")
- N takes an integer in the range 0..0xffff_ffff and produces 4 characters in the big-endian byte order (pack 'N', 1 -> "\x00\x00\x00\x01")
- v takes an integer in the range 0..0xffff and produces 2 characters in the little-endian byte order (see S in the case of a little-endian system)
- V takes an integer in the range 0..0xffff_ffff and produces 4 characters in the little-endian byte order (see L in the case of a little-endian system)
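A sketch of the integer formats; the comments assume a little-endian machine like mine:
printf "%vd\n", pack 'c', -1;            # 255     ("\xff")
printf "%vd\n", pack 's', 1;             # 1.0     (native byte order, little-endian here)
printf "%vd\n", pack 'l', 1;             # 1.0.0.0
printf "%vd\n", pack 'n', 1;             # 0.1     (big-endian regardless of the machine)
printf "%vd\n", pack 'v', 1;             # 1.0     (little-endian regardless of the machine)
print unpack('s', "\xff\xff"), "\n";     # -1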
So far I've assumed that what is fed to pack()/unpack() is binary strings, and that the resulting string is binary as well. At this point I can no longer ignore the issue.
W packs/unpacks code points:
- it takes an integer in the range 0..0x10ffff; each argument is converted to a character (chr()) and added to the resulting string (pack 'W', 1 -> "\x01", which is [1]:01)
- when the value is > 0xff, it produces a UTF-8 string (pack 'W', 0x100 -> "\x{100}", which is u[1]:c4.80)
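A sketch showing when W turns the UTF8 flag on:
my $w1 = pack 'W', 0xff;                         # [1]:ff
my $w2 = pack 'W', 0x100;                        # u[1]:c4.80
print utf8::is_utf8($w1) ? "u\n" : "binary\n";   # binary
print utf8::is_utf8($w2) ? "u\n" : "binary\n";   # u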
By default pack() operates in C0 (character) mode. In this mode the resulting string is considered a sequence of characters:
- a, A, Z produce characters and they're packed as characters:
  - binary characters are added as binary characters (pack 'a', "\xff" -> "\xff", which is [1]:ff)
  - UTF-8 characters are added as UTF-8 characters (pack 'a', "\x{100}" -> "\x{100}", which is u[1]:c4.80)
- b, B, h, H produce bytes; they're converted to characters (chr()) and added to the string (pack 'b', '1' -> "\x01", which is [1]:01)
- c, C, s, S, l, L, q, Q, i, I, n, N, v, V, U produce integers; each integer is turned into bytes, and each byte is converted to a character (chr()) and added to the string (on my little-endian machine pack 's', 1 -> "\x01\x00", which is [2]:01.00)
- W produces code points; they are converted to characters (chr()) and added to the string:
  - when the code point is < 0x100, a binary character is produced (pack 'W', 0xff -> "\xff", which is [1]:ff)
  - otherwise, a UTF-8 character is produced (pack 'W', 0x100 -> "\x{100}", which is u[1]:c4.80)
In other words, what is added to the resulting string is characters. Formats like W, b, c produce integers, but they're converted to characters with chr(). Formats like s and bigger consider each byte separately.
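For example (a sketch; everything here stays binary because every character is < 0x100):
printf "%vd\n", pack 'b8', '11111111';                       # 255 (one byte -> chr(255))
printf "%vd\n", pack 's', 0x100;                             # 0.1 (two bytes, each added as a character; little-endian here)
print utf8::is_utf8(pack 's', 0x100) ? "u\n" : "binary\n";   # binary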
Adding UTF-8 characters to the string affects other values:
- If the resulting string is binary, and a UTF-8 character is added, the resulting string is upgraded first. Let's consider pack 'aa', "\xff", "\x{100}". First "\xff" is added to the result, which becomes [1]:ff. Then the resulting string is upgraded, because "\x{100}" is a UTF-8 character, and becomes u[1]:c3.bf (U+00FF). Then "\x{100}" is added, producing u[2]:c3.bf.c4.80 (U+00FF, U+0100).
- If the resulting string is UTF-8, and a binary character is added, the character is upgraded first. Let's consider pack 'aa', "\x{100}", "\xff". First "\x{100}" is added to the result, which becomes u[1]:c4.80 (U+0100). Then "\xff" is upgraded and becomes u[1]:c3.bf (U+00FF). Then the upgraded character is added to the string, producing u[2]:c4.80.c3.bf (U+0100, U+00FF).
In other words:
- pack 'Ca', 0x80, "\N{U+0080}" produces u[2]:c2.80.c2.80, that is, initially C produces "\x80", but then it's upgraded to "\N{U+0080}" (u[1]:c2.80)
- pack 'aC', "\N{U+0080}", 0x80 produces u[2]:c2.80.c2.80, that is, C might have produced "\x80", but since the resulting string is UTF-8, it produces "\N{U+0080}" (u[1]:c2.80)
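Checking these with the notation() helper from the beginning (a sketch):
print notation(pack 'aa', "\xff", "\x{100}"), "\n";    # u[2]:c3.bf.c4.80
print notation(pack 'Ca', 0x80, "\N{U+0080}"), "\n";   # u[2]:c2.80.c2.80
print notation(pack 'aC', "\N{U+0080}", 0x80), "\n";   # u[2]:c2.80.c2.80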
What if a UTF-8 character is in fact a binary character with the UTF8 flag set?
The mode can be switched to U0 (UTF-8 byte) mode with U0. In this mode the resulting string is considered a sequence of bytes (a UTF-8 encoded representation of a string), and invalid UTF-8 is not accepted. E.g. pack('U0C', 0x80) produces a warning Malformed UTF-8 character: \x80 and dies. Also switching to U0 (at least once) forces the result to be a UTF-8 string (pack 'U0' -> u[0]):
- a, A, Z produce characters, but what is added to the string is their code points, wrapped at 0xff:
  - pack 'U0a', "\x01" packs 0x01 into the string, producing u[1]:01; pack 'a', "\x01" would produce [1]:01
  - pack 'U0a', "\N{U+00C0}" ("\xc3\x80" in UTF-8) produces a warning Malformed UTF-8 character: \xc0 and dies, because the a format has packed 0xc0 (not 0xc3, 0x80) into the string; pack 'a', "\N{U+00C0}" would produce u[1]:c3.80
  - pack 'U0a', "\x{100}" (0xc4, 0x80 in UTF-8) produces a warning Character(s) in 'a' format wrapped and packs 0x00 into the string, producing u[1]:00; pack 'a', "\x{100}" would produce u[1]:c4.80
- b, B, h, H produce bytes, and these bytes are added to the string:
  - pack 'U0b', '1' packs 0x01 into the string, producing u[1]:01; pack 'b', '1' would produce [1]:01
  - pack 'U0B', '1' produces a warning Malformed UTF-8 character: \x80 and dies, because the B format has packed 0x80 into the string; pack 'B', '1' would produce [1]:80
- c, C, s, S, l, L, q, Q, i, I, n, N, v, V, U produce bytes (some formats produce more than 1 byte), and these bytes are added to the string:
  - pack 'U0c', 1 packs 0x01 into the string, producing u[1]:01; pack 'c', 1 would produce [1]:01
  - pack 'U0C', 0x80 produces a warning Malformed UTF-8 character: \x80 and dies, because the C format has packed 0x80 into the string; pack 'C', 0x80 would produce [1]:80
  - pack 'U0S', 1 packs 0x01 and 0x00 (on a little-endian system) into the string, producing u[2]:01.00; pack 'S', 1 would produce [2]:01.00
- W produces a code point, the code point is wrapped at 0xff, and the resulting byte is added to the string:
  - pack 'U0W', 1 packs 0x01 into the string, producing u[1]:01; pack 'W', 1 would produce [1]:01
  - pack 'U0W', 0x80 produces a warning Malformed UTF-8 character: \x80 and dies, because the W format has packed 0x80 into the string; pack 'W', 0x80 would produce [1]:80
  - pack 'U0W', 0x100 produces a warning Character in 'W' format wrapped and packs 0x00 into the string, producing u[1]:00; pack 'W', 0x100 would produce u[1]:c4.80 (U+0100)
In other words, what is added to the resulting string is bytes. Formats like s produce more than one byte. W produces code points, but they're wrapped at 0xff and become bytes. Formats like a produce characters, but what is added to the string is their code points wrapped at 0xff.
Do note that what matters is for the resulting byte sequence (after finishing processing the template) to be valid UTF-8:
- pack 'U0C', 1 succeeds because the resulting byte sequence (0x01) is valid UTF-8
- pack 'U0C', 0xc2 fails because the resulting byte sequence (0xc2) is invalid UTF-8
- pack 'U0CC', 0xc2, 0x80 succeeds because the resulting byte sequence (0xc2, 0x80) is valid UTF-8
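A sketch of the three cases, with the die caught by eval (the second one is expected to die with Malformed UTF-8 character):
print eval { pack 'U0C', 1;           1 } ? "lives\n" : "dies\n";   # lives
print eval { pack 'U0C', 0xc2;        1 } ? "lives\n" : "dies\n";   # dies
print eval { pack 'U0CC', 0xc2, 0x80; 1 } ? "lives\n" : "dies\n";   # lives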
pack() produces a UTF-8 string if:
- U0 was enabled at least once (pack 'U0', or pack 'U', 0; the latter case is covered below)
- W was passed a code point > 0xff (pack 'W', 0x100)
- a, A, or Z was passed a UTF-8 string (pack 'a', "\N{U+0080}", or my $s = ''; utf8::upgrade($s); pack 'a', $s)
The mode can be switched back to C0 with C0. E.g. pack 'U0C0a', "\x80" -> u[1]:c2.80 (U+0080), while pack 'U0a', "\x80" would die. "\x80" is upgraded before being added to the string because U0 mode made the string UTF-8.
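The same, expressed with the notation() helper and eval (a sketch):
print notation(pack 'U0C0a', "\x80"), "\n";                    # u[1]:c2.80
print eval { pack 'U0a', "\x80"; 1 } ? "lives\n" : "dies\n";   # dies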
U packs/unpacks code points:
- In U0 mode the U format produces the UTF-8 bytes of the code point, which are packed into the resulting string. E.g. pack 'U0U', 0x80 -> u[1]:c2.80 (U+0080).
- In C0 mode the U format still produces the UTF-8 bytes, but they are packed as separate characters (pack 'C0U', 0x80 -> [2]:c2.80).
The mode is switched to U0 implicitly if TEMPLATE starts with the U format. E.g. pack 'Ua2', 0, "\xc2\x80" -> u[2]:00.c2.80 (U+0000, U+0080). pack 'a2', "\xc2\x80" would produce [2]:c2.80.
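The four combinations side by side (a sketch using the notation() helper):
print notation(pack 'U0U', 0x80), "\n";            # u[1]:c2.80    (U produces UTF-8 bytes, added as bytes)
print notation(pack 'C0U', 0x80), "\n";            # [2]:c2.80     (same bytes, added as characters)
print notation(pack 'Ua2', 0, "\xc2\x80"), "\n";   # u[2]:00.c2.80 (implicit U0 because of the leading U)
print notation(pack 'a2', "\xc2\x80"), "\n";       # [2]:c2.80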
By default pack() operates in C0 (character) mode. In this mode values are added to the resulting string as characters:
- pack('a', 'a') == 'a'
- pack('A', 'a') == 'a'
- pack('Z2', 'a') == "a\x00"
- pack('b8', '11111111') == "\xff" (a value is added as soon as there's a byte)
- pack('B8', '11111111') == "\xff"
- pack('h2', '11') == "\x11"
- pack('H2', '11') == "\x11"
- pack('c', 1) == "\x01"
- pack('C', 1) == "\x01"
- pack('W', 1) == "\x01"
- pack('s', 1) == "\x01\x00" (each byte is added as a separate character)
- pack('S', 1) == "\x01\x00"
- pack('l', 1) == "\x01\x00\x00\x00"
- pack('L', 1) == "\x01\x00\x00\x00"
- pack('q', 1) == "\x01\x00\x00\x00\x00\x00\x00\x00"
- pack('Q', 1) == "\x01\x00\x00\x00\x00\x00\x00\x00"
- pack('i', 1) == "\x01\x00\x00\x00" (i == l in my case)
- pack('I', 1) == "\x01\x00\x00\x00" (I == L in my case)
- pack('n', 1) == "\x00\x01"
- pack('N', 1) == "\x00\x00\x00\x01"
- pack('v', 1) == "\x01\x00" (v == S in the case of a little-endian system)
- pack('V', 1) == "\x01\x00\x00\x00" (V == L in the case of a little-endian system)
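These claims are easy to turn into a test for the prove setup shown at the end; a sketch, assuming a little-endian machine and the Test2 suite:
use Test2::V0;

is pack('c', 1),           "\x01",     'c packs chr(1)';
is pack('s', 1),           "\x01\x00", 's uses the native byte order (little-endian here)';
is pack('n', 1),           "\x00\x01", 'n is big-endian';
is pack('b8', '11111111'), "\xff",     'b fills a byte LSB first';

done_testing;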
In U0 (UTF-8 byte) mode, which is turned on with, well, U0 (pack('U0...', ...)), values are added to a sequence of bytes. Before returning from pack() that sequence of bytes is, in effect, typecast into the resulting string (or so it looks). As such the resulting sequence of bytes should be valid UTF-8. Also in this mode the ranges of a, A, Z and W are reduced to 0..0xff:
- pack('U0a2', "\xdf\xbf") == "\x{7ff}" ("\xdf\xbf" is the UTF-8 encoded representation of "\x{7ff}")
- pack('U0A2', "\xdf\xbf") == "\x{7ff}"
- pack('U0Z3', "\xdf\xbf") == "\x{7ff}\x00"
- pack('U0b16', '1111' . '1011' . '1111' . '1101') == "\x{7ff}" (what b takes, in hex, is 'fbfd')
- pack('U0B16', '1101' . '1111' . '1011' . '1111') == "\x{7ff}" (what B takes, in hex, is 'dfbf')
- pack('U0h4', 'fdfb') == "\x{7ff}"
- pack('U0H4', 'dfbf') == "\x{7ff}"
- pack('U0c2', 0xdf - 0x100, 0xbf - 0x100) == "\x{7ff}" (the c range is -0x80..0x7f, so we need to adjust the values)
- pack('U0C2', 0xdf, 0xbf) == "\x{7ff}"
- pack('U0W2', 0xdf, 0xbf) == "\x{7ff}"
- pack('U0s', 0xbfdf - 0x10000) == "\x{7ff}"
- pack('U0S', 0xbfdf) == "\x{7ff}"
- pack('U0l', 0xbfdf) == "\x{7ff}\x00\x00"
- pack('U0L', 0xbfdf) == "\x{7ff}\x00\x00"
- pack('U0q', 0xbfdf) == "\x{7ff}\x00\x00\x00\x00\x00\x00"
- pack('U0Q', 0xbfdf) == "\x{7ff}\x00\x00\x00\x00\x00\x00"
- pack('U0i', 0xbfdf) == "\x{7ff}\x00\x00" (i == l in my case)
- pack('U0I', 0xbfdf) == "\x{7ff}\x00\x00" (I == L in my case)
- pack('U0n', 0xdfbf) == "\x{7ff}"
- pack('U0N', 0xdfbf) == "\x00\x00\x{7ff}"
- pack('U0v', 0xbfdf) == "\x{7ff}" (v == S in the case of a little-endian system)
- pack('U0V', 0xbfdf) == "\x{7ff}\x00\x00" (V == L in the case of a little-endian system)
- pack('U0U', 0x7ff) == "\x{7ff}" (U adds to the sequence of bytes the UTF-8 encoded representation of its argument)
Do note that the sequence of bytes doesn't have to be valid UTF-8 at any intermediate step (pack('U0aXac', "\x80", "\xdf", 0xbf - 0x100) == "\x{7ff}", X erases the last byte).
In addition to turning on the U0 mode explicitly, it's turned on implicitly when TEMPLATE starts with U (pack('Ua2', 0x7ff, "\xdf\xbf") == "\x{7ff}\x{7ff}"). In C0 mode U produces the UTF-8 encoded representation of its argument (pack('C0U', 0x7ff) == "\xdf\xbf", pack('aU', "\x80", 0x7ff) == "\x80\xdf\xbf"). You can always switch the mode midway explicitly (pack('...U0...C0...', ...)).
Or in other words: generally W and U take a code point and produce a character (pack('W', 1) == "\x01", pack('U', 1) == "\x01"). But in U0 mode W takes a UTF-8 representation and produces a character (pack('U0W2', 0xdf, 0xbf) == "\x{7ff}"), and in C0 mode U takes a code point and produces a UTF-8 representation (pack('C0U', 0x7ff) == "\xdf\xbf").
x produces a null (pack('x') == "\x00").
X takes a step back, removing the characters in the process (pack('aX', 'a') == '').
@ moves the current position in the resulting string, truncating or null-filling it in the process (pack('a@0', 'a') == '', pack('@1') == "\x00"). The repeat count is an absolute position counted from the beginning of the resulting string.
. is like @, but takes the position from an argument instead of the repeat count (pack('a.', 'a', 0) == '', pack('.', 1) == "\x00").
Formats might be grouped with parentheses. In this case the positions for @ and . are counted from the start of the innermost group (pack('a(a@0)', 'a', 'a') == 'a').
$ docker run --rm -itv "$PWD":/host alpine:3.20
/ # apk add perl perl-test2-suite perl-utils
/ # for f in host/.*.pl host/*.pl; do prove "$f" || break; done
See also: perlpacktut (the Pack/Unpack Tutorial), pack, unpack.
What follows are my experiments: