Steffen Winkler [email protected]
I was born in 1960.
I've been programming Perl since late 2000, first privately and then professionally.
Currently I work for SIEMENS AG in Erlangen, primarily in the area of web programming.
I have been attending the German Perl Workshop since 2003.
Why use Locale::TextDomain when so many frameworks on CPAN use Locale::Maketext?
Following my presentation on DBD::PO in Frankfurt/Main there was a lively discussion, both in Frankfurt and at Erlangen-PM.
There are 2 internationalization frameworks on CPAN, Locale::TextDomain (Perl interface to Uniform Message Translation) and Locale::Maketext (framework for localization).
What are the differences?
What are the limitations?
From source to multilingual application in 2 ways.
No matter which internationalization framework from CPAN you use, you have to live with limitations. A good choice greatly reduces them.
print 'You can log out here.';
printf 'He lives in %s, %s.', $town, $address;
printf '%d people live here.', $people;
printf 'These are %d books.', $books;
printf 'He has %s houses in %s, %s.', $houses, $town, $address;
printf '%s books are in %s shelves.', $books, $shelves;
PO is an abbreviation for "portable object".
GNU gettext PO files can be used to make programs multilingual.
Along with the original text and its translation the file contains various comments and flags.
MO files are the binary version of PO files.
Here we use the basic module Locale::Maketext together with a module which reads gettext PO/MO files. It is called Locale::Maketext::Lexicon::Gettext. Locale::Maketext::Simple exports the function "loc".
[_n] where n = 1, 2, ... is the general notation for placeholders. Within [] a function name can be used as a prefix followed by its parameters. They are separated by ",". "quant", abbreviated "*", is the name of the function for plural processing.
print loc('You can log out here.');
print loc(
'He lives in [_1], [_2].',
$town,
$address,
);
print loc(
'[quant,_1,person lives,people live] here.',
$people,
);
I have no idea how to write the following phrase with "quant". "quant" produces the value followed by a unit, but here the plural form starts before the value. The problem is that "quant" prepends the value and a space itself, so "_1" and the following space have to be omitted from the plural forms.
print loc(
'[myplural,_1,It is _1 book,These are _1 books].',
# ???????? ^^^^^ ??? ^^^^^^^^^ ???
$books,
);
print loc(
'He has [quant,_1,house,houses] in [_2], [_3].',
$houses,
$town,
$address,
);
print loc(
'[quant,_1,book is,books are] in [*,_2,shelf,shelves].',
$books,
$shelves,
);
Locale::TextDomain is part of the libintl-perl distribution. There are several exported functions. Function names follow a simple scheme.
x for a placeholder,
n for plural and
p for context.
The order of parameters, when present:
Context,
singular,
plural,
number for plural selection and
finally a hash with placeholder data.
Not all combinations of n, p and x are implemented. If you use x even when there is no placeholder and adhere to alphabetical order, then __x, __nx, __px and __npx are the possibilities left.
__('msgid')
__x(
'msgid',
name1 => $value1, name2 => $value2, ...
)
__n('msgid', 'msgid_plural', $count)
__nx(
'msgid', 'msgid_plural', $count,
name1 => $value1, name2 => $value2, ...
)
__xn(
'msgid', 'msgid_plural',
$count, name1 => $value1, name2 => $value2, ...
)
__p('context', 'msgid')
__px(
'context', 'msgid',
name1 => $value1, name2 => $value2, ...
)
__np('context', 'msgid', 'msgid_plural', $count)
__npx(
'context', 'msgid', 'msgid_plural', $count,
name1 => $value1, name2 => $value2, ...
)
print __('You can log out here.');
print __x(
'He lives in {town}, {address}.',
town => $town,
address => $address,
);
print __nx(
'{num} person lives here.',
'{num} people live here.',
$people,
num => $people,
);
print __nx(
'It is {num} book.',
'These are {num} books.',
$books,
num => $books,
);
print __nx(
'He has {num} house in {town}, {address}.',
'He has {num} houses in {town}, {address}.',
$houses,
num => $houses,
town => $town,
address => $address,
);
print
__nx(
'{num} book is',
'{num} books are',
$books,
num => $books,
),
__nx(
' in {num} shelf.',
' in {num} shelves.',
$shelves,
num => $shelves,
);
Locale::Maketext has numbered parameters. If there are many, this may be confusing. All the translator can tell is that something is being included, but not what.
[_1] is a [_2] in [_3].
Locale::Maketext can handle multiple plural forms in a text phrase.
[quant,_1,book is,books are] in [*,_2,shelf,shelves].
The text in plural forms (quant) is not automatically translatable because it's contained in a kind of "or" block.
Within this "or" block, placeholders such as _1 are no longer present. Thus it is impossible to represent plural forms which start before the number.
[myplural,_1,It is _1 book,These are _1 books].
Of course this "myplural" function does not exist.
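Nothing stops a project from adding such a method to its own language handle class, though. The following is a minimal sketch, assuming the hypothetical packages My::L10N and My::L10N::en and a hand-written "myplural" method; none of these names come from Locale::Maketext itself. The method hardcodes the English rule (singular only for 1), so it does not remove the limitation to 2 plural forms.

```perl
use strict;
use warnings;

package My::L10N;
use parent 'Locale::Maketext';

# Hypothetical method: pick a complete phrase, then fill in "_1" ourselves.
# It hardcodes the two-form rule "singular only for 1".
sub myplural {
    my ( $self, $num, $singular, $plural ) = @_;
    my $text = $num == 1 ? $singular : $plural;
    $text =~ s/_1/$num/g;
    return $text;
}

package My::L10N::en;
use parent -norequire, 'My::L10N';
our %Lexicon = ( _AUTO => 1 );    # use the key itself as the text

package main;

my $lh = My::L10N->get_handle('en')
    or die 'No language handle';

for my $books ( 1, 3 ) {
    my $text = $lh->maketext(
        '[myplural,_1,It is _1 book,These are _1 books].',
        $books,
    );
    print "$text\n";
}
```

Inside the bracket group only a parameter that is exactly "_1" is replaced by Locale::Maketext, which is why the method has to substitute the number into the chosen phrase itself.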
***
Locale::TextDomain has named parameters, which are easier to translate because the translator can understand the meaning of the sentence in spite of the placeholders.
{name} is a {locality} in {country}.
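To make the idea of named placeholders concrete, here is a minimal sketch in plain Perl. The helper "expand_named" and the sample values are made up for illustration; this is not the code Locale::TextDomain uses, only an approximation of what happens to the {name} markers after the phrase has been translated.

```perl
use strict;
use warnings;

# Hypothetical helper: replace {name} with the value given for "name".
# Unknown placeholders are left untouched so that mistakes stay visible.
sub expand_named {
    my ( $text, %value_of ) = @_;
    $text =~ s/\{(\w+)\}/exists $value_of{$1} ? $value_of{$1} : "{$1}"/ge;
    return $text;
}

# Sample values, chosen only for this illustration.
my $sentence = expand_named(
    '{name} is a {locality} in {country}.',
    name     => 'Erlangen',
    locality => 'town',
    country  => 'Germany',
);
print "$sentence\n";
```

Because the values are looked up by name, a translator can move the placeholders around within the sentence without the programmer having to change the call.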
A text phrase containing several plural forms needs to be divided, which makes it not automatically translatable.
Locale::Maketext:
singular
singular + plural
singular + plural + zero
Locale::TextDomain:
2 in the source language
arbitrarily many in the target language
The header of each PO/MO file contains an entry called "Plural-Forms". It is a formula written in C syntax, with one exception: "OR" is allowed in place of "||". The formula differs from language to language, so each PO/MO file carries the one for its target language. Locale::Maketext ignores this entry.
German/English:
"Plural-Forms: nplurals=2; plural=n != 1;\n"
Russian:
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
An example from the Russian language:
0 books -> книг (Plural 2)
1 book -> книга (Singular)
2 .. 4 books -> книги (Plural 1)
5 .. 20 books -> книг (Plural 2)
21 books -> книга (Singular)
22 .. 24 books -> книги (Plural 1)
25 .. 30 books -> книг (Plural 2)
...
100 books -> книг (Plural 2)
101 books -> книга (Singular)
102 .. 104 books -> книги (Plural 1)
105 .. 120 books -> книг (Plural 2)
121 books -> книга (Singular)
122 .. 124 books -> книги (Plural 1)
125 .. 130 books -> книг (Plural 2)
...
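The two Plural-Forms formulas above can be checked by hand. The following is a minimal sketch in plain Perl that evaluates them for a few counts. The names "plural_index_germanic", "plural_index_russian" and "@forms" are made up for illustration and are not part of any gettext library.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# plural=n != 1 (German/English): index 0 is singular, index 1 is plural.
sub plural_index_germanic {
    my ($n) = @_;
    return $n != 1 ? 1 : 0;
}

# The Russian formula from the header, transcribed to Perl.
sub plural_index_russian {
    my ($n) = @_;
    return 0 if $n % 10 == 1 && $n % 100 != 11;
    return 1
        if $n % 10 >= 2
        && $n % 10 <= 4
        && ( $n % 100 < 10 || $n % 100 >= 20 );
    return 2;
}

# Hypothetical lookup table: msgstr[0], msgstr[1], msgstr[2] for "book".
my @forms = ( 'книга', 'книги', 'книг' );

for my $n ( 0, 1, 2, 5, 11, 21, 101, 102, 111, 122 ) {
    my $index = plural_index_russian($n);
    print "$n -> $forms[$index] (msgstr[$index])\n";
}
```

The index returned by the formula selects msgstr[0], msgstr[1] or msgstr[2] in the PO file, which is why the number of plural forms can vary per target language without any change to the source code.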
There are also 3 plural forms in e.g. Czech, Lithuanian, Polish, Romanian and Slovak. There are 4 plural forms in e.g. Slovenian and Celtic. So in the EU we can get by with 4 plural forms. Arabic has 6 plural forms.
Because Locale::Maketext ignores "Plural-Forms" in PO/MO files, it can only support languages with 2 plural forms, i.e. singular and plural, like we are familiar with in German and English. There is a function "quant" which essentially corresponds to "quant2" (singular + 1st plural) assuming we ignore the zero form. It is quite possible to imagine functions "quant3" to "quant6" for Locale::Maketext. But then the programmer would need to already know which text phrases need 2, 3, 4, 5 or 6 plural forms. Because he does not know, he would have to always use "quant6". That's a whole lot of typing.
The positions of individual words can differ between languages, e.g. in one language it is
I have 2 books.
and in another
2 books I have.
If that is so, then with Locale::Maketext you have to write complete sentences as the plural forms. The English-native programmer cannot know that. The conflict is thus only discovered during translation.
If you want to avoid the conflict, you always write entire sentences.
But even that doesn't always work, because Locale::Maketext always expects "quant" to be followed by "_1" and then implicitly adds a space and then the text.
Yet what's needed is:
[myplural,_1,It is _1 book.,These are _1 books.]
But that would be nothing other than Locale::TextDomain.
Because commas are used as separators, the enumerated texts themselves must not contain commas.
Is there any simple quoting mechanism such as in Text::CSV? I know of none.
I need 1 book, computer or notebook to do this.
Here's a dirty workaround using ";".
I need [*,_1,book; computer or notebook,books; computers or notebooks] to do this.
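Locale::Maketext's bracket notation uses the tilde as its escape character, and inside a bracket group "~," appears to produce a literal comma. The following is a minimal sketch under that assumption, using the hypothetical packages My::Comma::L10N and My::Comma::L10N::en. Whether the escape survives the %quant(...) notation of Locale::Maketext::Lexicon::Gettext is not checked here.

```perl
use strict;
use warnings;

package My::Comma::L10N;
use parent 'Locale::Maketext';

package My::Comma::L10N::en;
use parent -norequire, 'My::Comma::L10N';
our %Lexicon = ( _AUTO => 1 );    # use the key itself as the text

package main;

my $lh = My::Comma::L10N->get_handle('en')
    or die 'No language handle';

# "~," is assumed to stand for a literal comma inside the bracket group.
my $key = 'I need [*,_1,book~, computer or notebook,'
    . 'books~, computers or notebooks] to do this.';

for my $count ( 1, 2 ) {
    my $text = $lh->maketext( $key, $count );
    print "$text\n";
}
```

If this holds in the version of Locale::Maketext in use, the ";" workaround is not needed for plain bracket notation.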
Due to string concatenation using spaces, line breaks may occur between value and unit.
Depending on line length you get
I have
1 book.
or
I have 1
book.
With Locale::TextDomain you can write:
I have {num}\N{NO-BREAK SPACE}book.
I have {num}\N{NO-BREAK SPACE}books.
Locale::Maketext has the space hardcoded.
# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."
msgid "You can log out here."
msgstr "Sie können sich hier abmelden."
msgid "He lives in %1, %2."
msgstr "Er wohnt in %1, %2."
msgid "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,Mensch wohnt,Menschen wohnen) hier."
# a bad workaround (no singular before placeholder)
msgid "These are %quant(%1,book,books)."
msgstr "Das sind %quant(%1,Buch,Bücher)."
msgid "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,Buch ist,Bücher sind) in %quant(%2,Regal,Regalen)."
# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."
msgid "You can log out here."
msgstr "Sie können sich hier abmelden."
msgid "He lives in {town}, {address}."
msgstr "Er wohnt in {town}, {address}."
msgid "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0] "{num} Mensch wohnt hier."
msgstr[1] "{num} Menschen wohnen hier."
msgid "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0] "Es ist {num} Buch."
msgstr[1] "Es sind {num} Bücher."
msgid "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0] "Er hat {num} Haus in {town}, {address}."
msgstr[1] "Er hat {num} Häuser in {town}, {address}."
msgid "{num} book is"
msgid_plural "{num} books are"
msgstr[0] "{num} Buch ist"
msgstr[1] "{num} Bücher sind"
msgid " in {num} shelf."
msgid_plural " in {num} shelves."
msgstr[0] " in {num} Regal."
msgstr[1] " in {num} Regalen."
# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."
msgid "You can log out here."
msgstr "Выход из системы."
# The town name should be inflected here:
# Москва -> в Москве
# Киев -> в Киеве
# Мытищи -> в Мытищах (irregular)
msgid "He lives in %1, %2."
msgstr "Он живет в %1, %2."
# This is not correctly translatable.
# The plural form for number 2 to 4 (человека живут) is not storable.
msgid "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,человек живет,человек живут) здесь."
# This is not correctly translatable.
# The plural form for number 2 to 4 (дома) is not storable.
msgid "He has %quant(%1,house,houses) in %2, %3."
msgstr "У него %quant(%1,дом,домов) в %2, %3."
# This is not correctly translatable.
# The plural form for number 2 to 4 (книги) is not storable.
msgid "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,книга,книг) на %quant(%2,полке,полках)."
# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."
msgid "You can log out here."
msgstr "Выход из системы."
# The town name should be inflected here:
# Москва -> в Москве
# Киев -> в Киеве
# Мытищи -> в Мытищах (irregular)
msgid "He lives in {town}, {address}."
msgstr "Он живет в {town}, {address}."
msgid "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0] "{num} человек живет здесь."
msgstr[1] "{num} человека живут здесь."
msgstr[2] "{num} человек живут здесь."
msgid "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0] "Это {num} книга."
msgstr[1] "Это {num} книги."
msgstr[2] "Это {num} книг."
msgid "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0] "У него {num} дом в {town}, {address}."
msgstr[1] "У него {num} дома в {town}, {address}."
msgstr[2] "У него {num} домов в {town}, {address}."
# Translate this phrase together with the next one.
msgid "{num} book is"
msgid_plural "{num} books are"
msgstr[0] "{num} книга"
msgstr[1] "{num} книги"
msgstr[2] "{num} книг"
# Translate this phrase together with the previous one.
msgid " in {num} shelf."
msgid_plural " in {num} shelves."
msgstr[0] " на {num} полке."
msgstr[1] " на {num} полках."
msgstr[2] " на {num} полках."
Berlin -> Берлин
in Berlin -> в Берлине
If you want this, you need to also translate placeholder values and only then insert them.
That's doable, but it makes it impossible to automatically translate the phrase in which it is to be inserted. Moreover, that one is then also hard to translate manually because again to some extent the context is lost.
You can only tinker here.
Inflection of nouns:
masculine singular -> Arzt
feminine singular -> Ärztin
masculine plural -> Ärzte
feminine plural -> Ärztinnen
Inflection of verbs:
Masha went to school. -> Маша пошла в школу.
Petya went to school. -> Петя пошёл в школу.
msgid "design"
msgstr "Design"
msgctxt "automobile"
msgid "design"
msgstr "Konstruktion"
msgctxt "verb"
msgid "design"
msgstr "zeichnen"
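The three entries above differ only in msgctxt. With Locale::TextDomain the context is passed as the first argument, e.g. __p('automobile', 'design'). The following is a minimal sketch in plain Perl of how such a lookup can work. The hash "%lexicon" and the function "translate" are made up for illustration; the only fact taken from gettext is that a context entry is stored under the key "msgctxt\004msgid".

```perl
use strict;
use warnings;

# Hypothetical in-memory lexicon mirroring the three PO entries above.
# gettext stores a context entry under "msgctxt\004msgid".
my %lexicon = (
    'design'               => 'Design',
    "automobile\004design" => 'Konstruktion',
    "verb\004design"       => 'zeichnen',
);

# Hypothetical lookup; returns the msgid itself when nothing is found.
sub translate {
    my ( $msgid, $context ) = @_;
    my $key = defined $context ? "$context\004$msgid" : $msgid;
    return exists $lexicon{$key} ? $lexicon{$key} : $msgid;
}

print translate('design'), "\n";
print translate( 'design', 'automobile' ), "\n";
print translate( 'design', 'verb' ), "\n";
```

The same msgid thus yields three different translations, depending on the context the programmer supplies.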
Sean M. Burke writes:
Since I wrote this article in 1998, I now see that the gettext docs are now trying more to come to terms with plurality. Whether useful conclusions have come from it is another question altogether. -- SMB, May 2001
Many years later, a jack of all trades still does not exist.
In the current case known to me the translation agency uses the software "SDL Trados". Like other similar software it is based on a "translation memory". This works very well for static documents.
Such software seems less suited to the dynamic aspects of software localization that plural forms and context introduce. It assumes a 1:1 relation between source and translation. Therefore one has to expect that the relatively small portion needing context or plural forms cannot be handled well with the aid of software.
In the current case the POT file had to be converted into XML and the target language had to be filled from the source language. This seemed like it would be part of a translation agency’s services.
Recommendation: Have a translation done with a smaller test file. The test should contain all the typical constructs. Repeat per-language, because subcontractors may be involved.
GNU gettext
wikipedia http://en.wikipedia.org/wiki/Gettext
gettext homepage http://www.gnu.org/software/gettext/gettext.html
Singular, Plural, Dual, Trial, Quadral
wikipedia - dual http://en.wikipedia.org/wiki/Dual_%28grammatical_number%29
wikipedia - all forms http://en.wikipedia.org/wiki/Sursurunga_language
sourceforge - which language - which plural form http://translate.sourceforge.net/wiki/l10n/pluralforms
CPAN module Locale::Maketext
CPAN module Locale::Maketext::Simple
obsolete article by Sean M. Burke about software localization
CPAN module Locale::TextDomain
Thanks for the support, the many ideas, examples and corrections.
Nikolai Prokoschenko http://rassie.org/
Nikolai Prokoschenko - On the state of i18n in Perl http://rassie.org/archives/247