A Perl string is a logical sequence of characters.
Encoding a Perl string produces a sequence of octets. Decoding a sequence of octets produces a Perl string.
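utf8::is_utf8 tells you whether the string is a sequence of logical characters (true) or a sequence of octets (false). Strictly, it reports the string's internal UTF8 flag. You do not need to say "use utf8" before calling this function; that pragma only declares that the source file itself is written in UTF-8.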
Raw filehandles (like the default STDOUT) expect sequences of octets, and will warn ("Wide character in print") if you give them a character string containing code points above 0xFF.
use Encode;
my $string = "\x{0CA0}_\x{0CA0}"; # the look of disapproval
# utf8::is_utf8($string) is true
my $octets = encode('utf8', $string);
# utf8::is_utf8($octets) is false
my $utf8 = decode('utf8', $octets);
# utf8::is_utf8($utf8) is true
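The filehandle warning described above can be sketched as follows. This is a minimal illustration, assuming Perl 5.8 or later with the core Encode and File::Spec modules. It prints to the null device so nothing is displayed, and the variable names ($raw, @warnings, $wide_warnings) are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Spec;

my $string = "\x{0CA0}_\x{0CA0}";    # characters, not octets

# Collect warnings in an array instead of letting them reach STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# A filehandle with no encoding layer expects octets.
open(my $raw, '>', File::Spec->devnull)
    or die "cannot open null device: $!";

print {$raw} $string;                     # warns: Wide character in print
print {$raw} encode('UTF-8', $string);    # already octets: no warning

close($raw) or die "close failed: $!";

# Count how many of the captured warnings were about wide characters.
my $wide_warnings = grep { /Wide character/ } @warnings;
```

Only the first print should add an entry to @warnings. An alternative to encoding by hand is to attach an encoding layer to the handle, for example binmode($raw, ':encoding(UTF-8)'), after which the handle accepts character strings.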
If we have raw octets in one encoding, we can turn them into a Perl string with decode, then encode that string into a sequence of octets in a different encoding:
use Encode;
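my $greek = raw_greek_octets();            # placeholder for any source of ISO-8859-7 octets
my $perly = decode('ISO-8859-7', $greek);  # octets -> Perl string
my $printme = encode('utf8', $perly);      # Perl string -> UTF-8 octets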
print $printme, "\n";
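To make the transcoding step concrete, here is a self-contained sketch that swaps the raw_greek_octets() placeholder for a hard-coded, hypothetical input: the three octets that spell "αβγ" in ISO-8859-7. It uses the strict 'UTF-8' encoding name where the text above uses 'utf8'; for this input both should give the same octets.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: alpha, beta, gamma as ISO-8859-7 octets.
my $greek = "\xE1\xE2\xE3";

my $perly   = decode('ISO-8859-7', $greek);    # three characters
my $printme = encode('UTF-8', $perly);         # six octets

# Each Greek letter takes one octet in ISO-8859-7 and two in UTF-8.
printf "%d characters, %d octets\n", length($perly), length($printme);
# prints "3 characters, 6 octets"
```

The length of the decoded string counts characters, while the length of the encoded result counts octets, which is why the two numbers differ.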