Selecting an Internationalization Framework (GPW10)


Author

Steffen Winkler

Bio

I've existed since 1960.

I've been programming Perl since late 2000, first privately and then professionally.

Currently I work for SIEMENS AG in Erlangen, primarily in the area of web programming.

I have been attending the German Perlworkshop since 2003.

Abstract

Why use Locale::TextDomain when so many frameworks on CPAN use Locale::Maketext?

Following my presentation on DBD::PO in Frankfurt/Main there was a lively discussion, both in Frankfurt as well as at Erlangen-PM.

There are 2 internationalization frameworks on CPAN, Locale::TextDomain (Perl interface to Uniform Message Translation) and Locale::Maketext (framework for localization).

What are the differences?

What are the limitations?

What I want to talk about today

From source to multilingual application in 2 ways.

Whichever internationalization framework from CPAN you use, you have to live with limitations. A good choice greatly reduces them.

It begins with the application's source code

print  'You can log out here.';
printf 'He lives in %s, %s.', $town, $address;
printf '%d people live here.', $people;
printf 'These are %d books.', $books;
printf 'He has %s houses in %s, %s.', $houses, $town, $address;
printf '%s books are in %s shelves.', $books, $shelves;

PO files - what are they?

PO is an abbreviation for "portable object".

GNU gettext PO files can be used to make programs multilingual.

Along with the original text and its translation the file contains various comments and flags.

MO files are the binary version of PO files.

Rewriting to Locale::Maketext::Simple

Here we use the basic module Locale::Maketext together with a module which reads gettext PO/MO files. It is called Locale::Maketext::Lexicon::Gettext. Locale::Maketext::Simple exports the function "loc".

[_n] where n = 1, 2, ...

is the general notation for placeholders. Within [] a function name can be used as a prefix followed by its parameters. They are separated by ",". quant, abbreviated *, is the name of the function for plural processing.

print loc('You can log out here.');

print loc(
    'He lives in [_1], [_2].',
    $town,
    $address,
);

print loc(
    '[quant,_1,person lives,people live] here.',
    $people,
);

I have no idea how to write the following phrase with "quant". With "quant" you write something along the lines of value followed by unit. But here the plural form starts before the value. The problem is that "quant" requires omitting "_1" in the plural forms and also omitting the following space.

print loc(  
    '[myplural,_1,It is _1 book,These are _1 books].',
    # ????????    ^^^^^ ???     ^^^^^^^^^ ???
    $books, 
);

print loc(
    'He has [quant,_1,house,houses] in [_2], [_3].',
    $houses,
    $town,
    $address,
);

print loc(
    '[quant,_1,book is,books are] in [*,_2,shelf,shelves].',
    $books,
    $shelves,
);
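For readers who want to try the bracket notation without any PO file, here is a minimal sketch that uses Locale::Maketext directly instead of going through Locale::Maketext::Simple's "loc". The class names MyApp::L10N and MyApp::L10N::en are hypothetical, and the "_AUTO" lexicon is only a shortcut so that the untranslated phrase is compiled as it stands.

```perl
use strict;
use warnings;

# Hypothetical project class; the language classes live below it.
package MyApp::L10N;
use parent 'Locale::Maketext';

# English language class. _AUTO lets unknown keys stand for themselves,
# so this demonstration needs no PO/MO file.
package MyApp::L10N::en;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = ( _AUTO => 1 );

package main;

my $lh = MyApp::L10N->get_handle('en')
    or die 'No language handle available';

# quant prints the number, a space, and then the matching form.
for my $people ( 1, 3 ) {
    print $lh->maketext(
        '[quant,_1,person lives,people live] here.',
        $people,
    ), "\n";
}
```

The number and the following space come from "quant" itself, which is why the forms inside the brackets can only contain what follows the number.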

Rewriting to Locale::TextDomain

Locale::TextDomain is part of the libintl-perl distribution. There are several exported functions. Function names follow a simple scheme.

x for a placeholder,
n for plural and
p for context.

The order of parameters, when present:

Context,
singular,
plural,
number for plural selection and
finally a hash with placeholder data.

Not all combinations of n, p and x are implemented. If you use x even when there are no placeholders and stick to alphabetical order, then __x, __nx, __px and __npx are the ones that remain.

__('msgid')
__x(
    'msgid',
    name1 => $value1, name2 => $value2, ...
)
__n('msgid', 'msgid_plural', $count)
__nx(
    'msgid', 'msgid_plural', $count,
    name1 => $value1, name2 => $value2, ...
)
__xn(
    'msgid', 'msgid_plural',
    $count, name1 => $value1, name2 => $value2, ...
)
__p('context', 'msgid')
__px(
    'context', 'msgid',
    name1 => $value1, name2 => $value2, ...
)
__np('context', 'msgid', 'msgid_plural', $count)
__npx(
    'context', 'msgid', 'msgid_plural', $count,
    name1 => $value1, name2 => $value2, ...
)

print __('You can log out here.');

print __x(
    'He lives in {town}, {address}.',
    town    => $town,
    address => $address,
);

print __nx(
    '{num} person lives here.',
    '{num} people live here.',
    $people,
    num => $people,
);


print __nx(
    'It is {num} book.',
    'These are {num} books.',
    $books,
    num => $books,
);

print __nx(
    'He has {num} house in {town}, {address}.',
    'He has {num} houses in {town}, {address}.',
    $houses,
    num     => $houses,
    town    => $town,
    address => $address,
);

print
    __nx(
        '{num} book is',
        '{num} books are',
        $books,
        num => $books,
    ),
    __nx(
        ' in {num} shelf.',
        ' in {num} shelves.',
        $shelves,
        num => $shelves,
    );
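To make the idea behind the "n" and "x" functions tangible without installing libintl-perl, the following sketch imitates them in plain Perl. The helpers expand_placeholders and nx_sketch are hypothetical and are not the library's implementation; in particular the plural choice is hardcoded to the English/German rule (n != 1), whereas the real functions take it from the PO/MO file.

```perl
use strict;
use warnings;

# Hypothetical helper: replace {name} with the matching value and leave
# unknown placeholders untouched.
sub expand_placeholders {
    my ( $text, %value_of ) = @_;
    $text =~ s/\{(\w+)\}/exists $value_of{$1} ? $value_of{$1} : "{$1}"/ge;
    return $text;
}

# Hypothetical helper: choose singular or plural by the rule n != 1,
# then expand the named placeholders.
sub nx_sketch {
    my ( $singular, $plural, $count, %value_of ) = @_;
    my $text = $count != 1 ? $plural : $singular;
    return expand_placeholders( $text, %value_of );
}

for my $books ( 1, 2 ) {
    print nx_sketch(
        'It is {num} book.',
        'These are {num} books.',
        $books,
        num => $books,
    ), "\n";
}
```

Because each plural form is a complete sentence with its own placeholder, the number may appear anywhere in it, or not at all.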

What do you see at first glance?

Locale::Maketext has numbered parameters. If there are many, this may be confusing. All the translator can tell is that something is being included, but not what.

[_1] is a [_2] in [_3].

Locale::Maketext can handle multiple plural forms in a text phrase.

[quant,_1,book is,books are] in [*,_2,shelf,shelves].

The text in plural forms (quant) is not automatically translatable because it's contained in a kind of "or" block.

Within this "or" block, placeholders such as _1 are no longer present. Thus it is impossible to represent plural forms which start before the number.

[myplural,_1,It is _1 book,These are _1 books].

Of course this "myplural" function does not exist.

***

Locale::TextDomain has named parameters, which are easier to translate because the translator can understand the meaning of the sentence in spite of the placeholders.

{name} is a {locality} in {country}.

A text phrase containing several plural forms needs to be divided which makes it not automatically translatable.

Things you won't spot immediately

Number of plural forms

Locale::Maketext:

singular
singular + plural
singular + plural + zero

Locale::TextDomain:

2 in the source language
arbitrarily many in the target language

The header of each PO/MO file contains an entry called "Plural-Forms". It is a calculation rule written as a C expression, with one exception: "OR" is allowed in place of "||". The rule differs between PO/MO files depending on the language. Locale::Maketext ignores this entry.

German/English:

"Plural-Forms: nplurals=2; plural=n != 1;\n"

Russian:

"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

An example from the Russian language:

0          books -> книг  (Plural 2)
1          book  -> книга (Singular)
2 .. 4     books -> книги (Plural 1)
5 .. 20    books -> книг  (Plural 2)
21         books -> книга (Singular)
22 .. 24   books -> книги (Plural 1)
25 .. 30   books -> книг  (Plural 2)
...
100        books -> книг  (Plural 2)
101        books -> книга (Singular)
102 .. 104 books -> книги (Plural 1)
105 .. 120 books -> книг  (Plural 2)
121        books -> книга (Singular)
122 .. 124 books -> книги (Plural 1)
125 .. 130 books -> книг  (Plural 2)
...
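The Russian formula is easier to check when it can be run. The sketch below transcribes the "plural=" expression from the header above into Perl; the function name plural_index_ru is hypothetical. The returned index is the number in msgstr[...], so 0 is the singular, 1 is "Plural 1" and 2 is "Plural 2".

```perl
use strict;
use warnings;

# Perl transcription of the Russian Plural-Forms expression.
sub plural_index_ru {
    my ($n) = @_;
    return
          $n % 10 == 1 && $n % 100 != 11 ? 0
        : $n % 10 >= 2 && $n % 10 <= 4
            && ( $n % 100 < 10 || $n % 100 >= 20 ) ? 1
        :                                            2;
}

printf "%3d -> msgstr[%d]\n", $_, plural_index_ru($_)
    for 0, 1, 2, 5, 11, 21, 22, 25, 101, 112, 122;
```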

There are also 3 plural forms in e.g. Czech, Lithuanian, Polish, Romanian and Slovak. There are 4 plural forms in e.g. Slovenian and Celtic. So in the EU we can get by with 4 plural forms. Arabic has 6 plural forms.

Because Locale::Maketext ignores "Plural-Forms" in PO/MO files, it can only support languages with 2 plural forms, i.e. singular and plural, like we are familiar with in German and English. There is a function "quant" which essentially corresponds to "quant2" (singular + 1st plural) assuming we ignore the zero form. It is quite possible to imagine functions "quant3" to "quant6" for Locale::Maketext. But then the programmer would need to already know which text phrases need 2, 3, 4, 5 or 6 plural forms. Because he does not know, he would have to always use "quant6". That's a whole lot of typing.
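What such an imagined "quant3" could look like is sketched below. It relies on the fact that bracket notation calls a method of the language handle by name. Everything else is an assumption for illustration: the method quant3 and the classes MyApp::L10N and MyApp::L10N::ru are hypothetical, and the Russian rule is hardcoded in Perl because nothing is read from the PO header.

```perl
use strict;
use warnings;
use utf8;

package MyApp::L10N;    # hypothetical project class
use parent 'Locale::Maketext';

# Hypothetical three-form counterpart to quant, used in bracket notation
# as [quant3,_1,form0,form1,form2].
sub quant3 {
    my ( $lh, $num, @forms ) = @_;
    my $index
        = $num % 10 == 1 && $num % 100 != 11 ? 0
        : $num % 10 >= 2 && $num % 10 <= 4
            && ( $num % 100 < 10 || $num % 100 >= 20 ) ? 1
        :                                                2;
    return $lh->numf($num) . ' ' . $forms[$index];
}

package MyApp::L10N::ru;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = ( _AUTO => 1 );

package main;

binmode STDOUT, ':encoding(UTF-8)';

my $lh = MyApp::L10N->get_handle('ru')
    or die 'No language handle available';

print $lh->maketext( '[quant3,_1,книга,книги,книг]', $_ ), "\n"
    for 1, 3, 7, 21;
```

The sketch also shows the drawback described above: the number of forms is fixed in the source text, so the programmer has to decide on it before any translator has seen the phrase.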

Position of words in a sentence in different languages

The position of individual words can differ between languages, i.e. in one language it is

I have 2 books.

and in another

2 books I have.

If that is so, then with Locale::Maketext you have to write complete sentences as the plural forms. The English-native programmer cannot know that. The conflict is thus only discovered during translation.

If you want to avoid the conflict, you always write entire sentences.

But even that doesn't always work, because Locale::Maketext always expects "quant" to be followed by "_1" and then implicitly adds a space and then the text.

Yet what's needed is:

[myplural,_1,It is _1 book.,These are _1 books.]

But that would be nothing other than Locale::TextDomain.

Comma in plural forms, or the "join and never can split" trap

Because commas are used as separators, the enumerated texts themselves must not contain commas.

Is there any simple quoting mechanism such as in Text::CSV? I know of none.

I need 1 book, computer or notebook to do this.

Here's a dirty workaround using ";".

I need [*,_1,book; computer or notebook,books; computers or notebooks] to do this.
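The trap can be reproduced with a few lines of Perl. The sketch below uses Locale::Maketext directly with an "_AUTO" lexicon; the class names are hypothetical. It also tries "~," which, as far as Locale::Maketext's bracket notation goes, stands for a literal comma inside a group. Treat that escape as something to verify against the installed version, and note that the sketch says nothing about the gettext-style "%quant(...)" syntax read from PO files.

```perl
use strict;
use warnings;

package MyApp::L10N;    # hypothetical project class
use parent 'Locale::Maketext';

package MyApp::L10N::en;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = ( _AUTO => 1 );

package main;

my $lh = MyApp::L10N->get_handle('en')
    or die 'No language handle available';

# The commas inside the forms act as separators, so quant receives four
# forms instead of two and picks the wrong one for the plural.
my $broken = $lh->maketext(
    'I need [*,_1,book, computer or notebook,books, computers or notebooks] to do this.',
    2,
);

# Assumption to verify: "~," inside a bracket group is a literal comma.
my $escaped = $lh->maketext(
    'I need [*,_1,book~, computer or notebook,books~, computers or notebooks] to do this.',
    2,
);

print "$broken\n$escaped\n";
```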

Value and unit may get wrapped

Due to string concatenation using spaces, line breaks may occur between value and unit.

Depending on line length you get

I have
1 book.

or

I have 1
book.

With Locale::TextDomain you can write:

I have {num}\N{NO-BREAK SPACE}book.
I have {num}\N{NO-BREAK SPACE}books.

Locale::Maketext has the space hardcoded.
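A small sketch of the Locale::TextDomain side of this point: the template itself carries U+00A0, so whatever fills in the number cannot introduce a breakable space. The placeholder is replaced here with a plain substitution, which is only a stand-in for the library's expansion.

```perl
use strict;
use warnings;
use charnames ':full';

my $template = "I have {num}\N{NO-BREAK SPACE}books.";

# Stand-in for the placeholder expansion.
( my $text = $template ) =~ s/\{num\}/2/;

# The character right after the number is U+00A0, not U+0020.
printf "U+%04X follows the number\n", ord substr( $text, 8, 1 );
```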

Excerpt from a PO file for Locale::Maketext

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."

msgid  "You can log out here."
msgstr "Sie können sich hier abmelden."

msgid  "He lives in %1, %2."
msgstr "Er wohnt in %1, %2."

msgid  "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,Mensch wohnt,Menschen wohnen) hier."

# a bad workaround (no singular before placeholder)
msgid  "These are %quant(%1,book,books)."
msgstr "Das sind %quant(%1,Buch,Bücher)."

msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,Buch ist,Bücher sind) in %quant(%2,Regal,Regalen)."

Excerpt from a PO file for Locale::TextDomain

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."

msgid        "You can log out here."
msgstr       "Sie können sich hier abmelden."

msgid        "He lives in {town}, {address}."
msgstr       "Er wohnt in {town}, {address}."

msgid        "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0]    "{num} Mensch wohnt hier."
msgstr[1]    "{num} Menschen wohnen hier."

msgid        "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0]    "Es ist {num} Buch."
msgstr[1]    "Es sind {num} Bücher."

msgid        "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0]    "Er hat {num} Haus in {town}, {address}."
msgstr[1]    "Er hat {num} Häuser in {town}, {address}."

msgid        "{num} book is"
msgid_plural "{num} books are"
msgstr[0]    "{num} Buch ist"
msgstr[1]    "{num} Bücher sind"

msgid        " in {num} shelf."
msgid_plural " in {num} shelves."
msgstr[0]    " in {num} Regal."
msgstr[1]    " in {num} Regalen."

PO file for English/Russian translation

for Locale::Maketext

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."

msgid  "You can log out here."
msgstr "Выход из системы."

# The town name should be inflected here:
# Москва -> в Москве
# Киев   -> в Киеве
# Мытищи -> в Мытищах (irregular)
msgid  "He lives in %1, %2."
msgstr "Он живет в %1, %2."

# This is not correctly translatable.
# The plural form for numbers 2 to 4 (человека живут) cannot be stored.
msgid  "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,человек живет,человек живут) здесь."

# This is not correctly translatable.
# The plural form for numbers 2 to 4 (дома) cannot be stored.
msgid  "He has %quant(%1,house,houses) in %2, %3."
msgstr "У него %quant(%1,дом,домов) в %2, %3."

# This is not correctly translatable.
# The plural form for numbers 2 to 4 (книги) cannot be stored.
msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,книга,книг) на %quant(%2,полке,полках)."

for Locale::TextDomain

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."

msgid        "You can log out here."
msgstr       "Выход из системы."

# The town name should be inflected here:
# Москва -> в Москве
# Киев   -> в Киеве
# Мытищи -> в Мытищах (irregular)
msgid        "He lives in {town}, {address}."
msgstr       "Он живет в {town}, {address}."

msgid        "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0]    "{num} человек живет здесь."
msgstr[1]    "{num} человека живут здесь."
msgstr[2]    "{num} человек живут здесь."

msgid        "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0]    "Это {num} книга."
msgstr[1]    "Это {num} книги."
msgstr[2]    "Это {num} книг."

msgid        "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0]    "У него {num} дом в {town}, {address}."
msgstr[1]    "У него {num} дома в {town}, {address}."
msgstr[2]    "У него {num} домов в {town}, {address}."

# Translate this phrase together with the next one.
msgid        "{num} book is"
msgid_plural "{num} books are"
msgstr[0]    "{num} книга"
msgstr[1]    "{num} книги"
msgstr[2]    "{num} книг"

# Translate this phrase together with the previous one.
msgid        " in {num} shelf."
msgid_plural " in {num} shelves."
msgstr[0]    " на {num} полке."
msgstr[1]    " на {num} полках."
msgstr[2]    " на {num} полках."

Inflecting "in {town}"

Berlin    -> Берлин
in Berlin -> в Берлине

If you want this, you need to also translate placeholder values and only then insert them.

That's doable, but it makes it impossible to automatically translate the phrase into which the value is to be inserted. Moreover, that phrase is then also hard to translate manually, because some of the context is lost again.

You can only tinker here.

neuter/masculine/feminine singular/plural

Inflection of nouns:

masculine singular -> Arzt
feminine  singular -> Ärztin
masculine plural   -> Ärzte
feminine  plural   -> Ärztinnen

Inflection of verbs:

Masha went to school. -> Маша пошла в школу.
Petya went to school. -> Петя пошёл в школу.

Context

msgid   "design"
msgstr  "Design"

msgctxt "automobile"
msgid   "design"
msgstr  "Konstruktion"

msgctxt "verb"
msgid   "design"
msgstr  "zeichnen"
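How a context keeps identical msgids apart can be sketched with a plain hash. gettext MO files join context and msgid with an EOT character ("\x04"), and the same convention is used for the keys here. The catalog and the function p_sketch are hypothetical stand-ins; with Locale::TextDomain the corresponding calls would be the "p" functions such as __p('verb', 'design').

```perl
use strict;
use warnings;

# Hypothetical in-memory catalog, keyed as "context\x04msgid".
my %catalog = (
    'design'                 => 'Design',
    "automobile\x{04}design" => 'Konstruktion',
    "verb\x{04}design"       => 'zeichnen',
);

# Hypothetical stand-in for a context-aware lookup.
# Falls back to the msgid when no translation is stored.
sub p_sketch {
    my ( $context, $msgid ) = @_;
    my $key = defined $context ? "$context\x{04}$msgid" : $msgid;
    return exists $catalog{$key} ? $catalog{$key} : $msgid;
}

print p_sketch( undef,        'design' ), "\n";
print p_sketch( 'automobile', 'design' ), "\n";
print p_sketch( 'verb',       'design' ), "\n";
```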

Locale::Maketext::TPJ13 - Article by Sean M. Burke about software localization

He writes:

Since I wrote this article in 1998, I now see that the gettext docs are now trying more to come to terms with plurality. Whether useful conclusions have come from it is another question altogether. -- SMB, May 2001


Many more years have passed since then, yet a jack-of-all-trades solution still does not exist.

Software for translation agencies

In the current case known to me the translation agency uses the software "SDL Trados". Like other similar software it is based on a "translation memory". This works very well for static documents.

For the dynamism that plural forms and context bring to software localization, such software seems less suited. It assumes a 1:1 relation in translations. Therefore one has to expect that the relatively small portion needing context or plural forms cannot be handled well with software support.

In the current case the POT file had to be converted into XML and the target language had to be filled from the source language. This seemed like it would be part of a translation agency's services.

Recommendation: Have a translation done with a smaller test file. The test should contain all the typical constructs. Repeat per-language, because subcontractors may be involved.

Bibliography

GNU gettext

Wikipedia http://en.wikipedia.org/wiki/Gettext

gettext homepage http://www.gnu.org/software/gettext/gettext.html

Singular, Plural, Dual, Trial, Quadral

Wikipedia - dual http://en.wikipedia.org/wiki/Dual_%28grammatical_number%29

Wikipedia - all forms http://en.wikipedia.org/wiki/Sursurunga_language

SourceForge - which language - which plural form http://translate.sourceforge.net/wiki/l10n/pluralforms

CPAN module Locale::Maketext

CPAN http://search.cpan.org/dist/Locale-Maketext/

CPAN module Locale::Maketext::Simple

CPAN http://search.cpan.org/dist/Locale-Maketext-Simple/

Obsolete article by Sean M. Burke about software localization

CPAN http://search.cpan.org/perldoc?Locale::Maketext::TPJ13

CPAN module Locale::TextDomain

CPAN http://search.cpan.org/dist/libintl-perl/

Thanks for the support, the many ideas, examples and corrections.

Nikolai Prokoschenko http://rassie.org/

Nikolai Prokoschenko - On the state of i18n in Perl http://rassie.org/archives/247

Raw

I18N_STEFFENW.pod

Internationalisierungs-Framework auswählen

Autor

Steffen Winkler [email protected]

Bio

Seit 1960 gibt es mich.

Ich programmiere Perl seit Ende 2000, erst privat und dann auch beruflich.

Zur Zeit bin ich bei der SIEMENS AG in Erlangen beschäftigt. Dort arbeite ich vorwiegend im Bereich der Webprogrammierung.

Den Deutschen Perlworkshop besuche ich seit 2003.

Abstract

Warum Locale::TextDomain, obwohl viele Frameworks im CPAN Locale::Maketext benutzen?

Im Anschluss an meinen Vortrag DBD::PO in Frankfurt/Main gab es eine rege Diskussion, sowohl in Frankfurt als auch bei Erlangen-PM.

Es gibt im CPAN 2 Internationalisierungs-Frameworks, Locale::TextDomain (Perl Interface to Uniform Message Translation) und Locale::Maketext (framework for localization).

Was sind die Unterschiede?

Wo sind die Grenzen?

Über was ich heute sprechen möchte.

Vom Quelltext bis zur mehrsprachigen Anwendung auf 2 Wegen.

Egal welches Internationalisierungs-Framework man vom CPAN benutzt, man muss mit Einschränkungen leben. Bei guter Wahl sind diese sehr gering.

Am Anfang ist der Quelltext der Anwendung.

print  'You can log out here.';
printf 'He lives in %s, %s.', $town, $address;
printf '%d people live here.', $people;
printf 'These are %d books.', $books;
printf 'He has %s houses in %s, %s.', $houses, $town, $address;
printf '%s books are in %s shelves.', $books, shelves;

PO-Files - Was ist das?

PO ist die Abkürzung für "portable object".

GNU gettext PO-Files kann man benutzen, um Programme mehrsprachig zu machen.

Im File stehen neben dem Originaltext und der Übersetzung verschiedene Kommentare und Flags.

MO-Files sind die Binärvariante von PO-Files.

auf Locale::Maketext::Simple umschreiben

Verwendet wird dabei das Basismodul Locale::Maketext und ein Modul, welches gettext PO/MO-Files einliest. Das ist Locale::Maketext::Lexicon::Gettext. Locale::Maketext::Simple exportiert die Funktion "loc".

[_n] mit n = 1, 2, ...

ist die generelle Schreibweise für Platzhalter. In den [] kann ein Funktionsname vorangestellt werden, nachgestellt die Parameter. Das Trennzeichen ist das ",". "quant", kurz "*", ist der Funktionsname für Pluralverarbeitung.

print loc('You can log out here.');

print loc(
    'He lives in [_1], [_2].',
    $town,
    $address,
);

print loc(
    '[quant,_1,person lives,people live] here.',
    $people,
);

Ich habe keine Idee, wie man nachfolgende Phrase mit "quant" schreiben soll. Mit "quant" schreibt man so etwas wie Wert und nachfolgender Maßeinheit. Hier beginnt die Pluralform aber schon vor dem Wert. Das Problem ist, "quant" verlangt das Weglassen von "_1" in den Pluralformen und auch das Weglassen des darauf folgenden Leerzeichens.

print loc(  
    '[myplural,_1,It is _1 book,These are _1 books].',
    # ????????    ^^^^^ ???     ^^^^^^^^^ ???
    $books, 
);

print loc(
    'He has [quant,_1,house,houses] in [_2], [_3].',
    $houses,
    $town,
    $address,
);

print loc(
    '[quant,_1,book is,books are] in [*,_2,shelf,shelves].',
    $books,
    $shelves,
);

auf Locale::TextDomain umschreiben

Locale::TextDomain gehört zur Distribution libintl-perl. Es gibt mehrere exportierte Funktionen. Der Funktionsname ist einfach gebaut.

x steht für Platzhalter,
n für Pluralform und
p für Kontext.

Die Parameterreihenfolge ist, wenn vorhanden:

Kontext,
Singular,
Plural,
Anzahl für Pluralauswahl und
dann der Hash mit den Platzhalterdaten.

Nicht alle Varianten aus n, p und x sind implementiert. Wenn man x auch ohne Platzhalter benutzt und sich an die alphabetische Reihenfolge hält, bleiben __x, __nx, __px und __npx übrig.

__('msgid')
__x(
    'msgid',
    name1 => $value1, name2 => $value2, ...
)
__n('msgid', 'msgid_plural', $count)
__nx(
    'msgid', 'msgid_plural', $count,
    name1 => $value1, name2 => $value2, ...
)
__xn(
    'msgid', 'msgid_plural',
    $count, name1 => $value1, name2 => $value2, ...
)
__p('context', 'msgid')
__px(
    'context', 'msgid',
    name1 => $value1, name2 => $value2, ...
)
__np('context', 'msgid', 'msgid_plural', $count)
__npx(
    'context', 'msgid', 'msgid_plural', $count,
    name1 => $value1, name2 => $value2, ...
)

print __('You can log out here.');

print __x(
    'He lives in {town}, {address}.',
    town    => $town,
    address => $address,
);

print __nx(
    '{num} person lives here.',
    '{num} people live here.',
    $people,
    num => $people,
);


print __nx(
    'It is {num} book.',
    'These are {num} books.',
    $books,
    num => $books,
);

print __nx(
    'He has {num} house in {town}, {address}.',
    'He has {num} houses in {town}, {address}.',
    $houses,
    num     => $houses,
    town    => $town,
    address => $address,
);

print
    __nx(
        '{num} book is',
        '{num} books are',
        $books,
        num => $books,
    ),
    __nx(
        ' in {num} shelf.',
        ' in {num} shelves.',
        $shelves,
        num => $shelves,
    );

Was sieht man auf den ersten Blick?

Locale::Maketext hat durchnummerierte Parameter. Werden es viele, kann man sie verwechseln. Der Übersetzer, weiß nur, dass etwas eingefügt wird aber nicht was.

[_1] is a [_2] in [_3].

Locale::Maketext kann mit mehreren Pluralformen in einer Textphrase umgehen.

[quant,_1,book is,books are] in [*,_2,shelf,shelves].

Der Text bei Pluralformen (quant) ist nicht mehr automatisch übersetzbar, weil eine Art "oder"-Block enthalten ist.

In diesem "oder"-Block ist der Platzhalter wie z.B. _1 nicht mehr enthalten. Damit sind Pluralformen nicht darstellbar, welche bereits vor der Zahl beginnen.

[myplural,_1,It is _1 book,These are _1 books].

Die Funktion "myplural" gibt es natürlich nicht.

***

Locale::TextDomain hat benannte Parameter, welche sich besser übersetzen lassen, weil der Übersetzer den Sinn des Satzes trotz Platzhalter immer noch verstehen kann.

{name} is a {locality} in {country}.

Bei mehreren Pluralformen in einer Textphrase muss diese zerlegt werden, was nicht mehr automatisch übersetzbar ist.

Was man nicht gleich erkennt.

Anzahl der Pluralformen

Locale Maketext:

Singular
Singular + Plural
Singular + Plural + Zero

Locale::Textdomain:

2 in der Quellsprache
beliebig viele in der Zielsprache

Im Header jedes PO-/MO-Files steht "Plural-Forms". Das ist die Berechnungsvorschrift als C-Code mit einer Ausnahme, "OR" anstatt von "||" ist erlaubt. Diese ist sprachabhängig unterschiedlich in den einzelnen PO-/MO-Files gespeichert. Locale::Maketext ignoriert diesen Eintrag.

Deutsch/Englisch:

"Plural-Forms: nplurals=2; plural=n != 1\n";

Russisch:

"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

Ein Beispiel aus dem Russischen:

0          books -> ÐºÐ½Ð¸Ð³  (Plural 2)
1          book  -> ÐºÐ½Ð¸Ð³Ð° (Singular)
2 .. 4     books -> ÐºÐ½Ð¸Ð³Ð¸ (Plural 1)
5 .. 20    books -> ÐºÐ½Ð¸Ð³  (Plural 2)
21         books -> ÐºÐ½Ð¸Ð³Ð° (Singular)
22 .. 30   books -> ÐºÐ½Ð¸Ð³  (Plural 2)
...
100        books -> ÐºÐ½Ð¸Ð³  (Plural 2)
101        books -> ÐºÐ½Ð¸Ð³Ð° (Singular)
102 .. 104 books -> ÐºÐ½Ð¸Ð³Ð¸ (Plural 1)
105 .. 120 books -> ÐºÐ½Ð¸Ð³  (Plural 2)
121        book  -> ÐºÐ½Ð¸Ð³Ð° (Singular)
122 .. 124 books -> ÐºÐ½Ð¸Ð³Ð¸ (Plural 1)
125 .. 130 books -> ÐºÐ½Ð¸Ð³  (Plural 2)
...

3 Pluralformen haben z.B. auch Tschechisch, Litauisch, Polnisch, Rumänisch, Slowakisch. 4 Pluralformen haben z.B. Slovenisch und Keltisch. In der EU kommen wir also mit 4 Plualformen aus. 6 Pluralformen hat Arabisch.

Weil Locale::Maketext "Plural-Forms" im PO-/MO-File ignoriert, sind damit nur Sprachen mit 2 Pluralformen möglich, also Singular und Plural, so wie wir das aus Deutsch und Englisch kennen. Es gibt eine Funktion "quant", welche im Prinzip "quant2" (Singular + 1. Plural) entspricht, wenn man von der Nullform absieht. Man könnte für Locale::Maketext eine Funktion "quant3" bis "quant6" definieren. Damit müsste aber der Programmierer schon wissen, welche Textphrasen 2, 3, 4, 5 oder 6 Pluralformen benötigen. Weil er das nicht weiß, muss er dann immer "quant6" benutzen. Damit schreibt er sich die Finger wund.

Position der Worte im Satz in unterschiedlichen Sprachen

Die Position der einzelnen Worte kann in unterschiedlichen Sprachen unterschiedlich sein, d.h. in einer Sprache heißt es

I have 2 books.

und in einer anderen

2 books I have.

Wenn das so ist, muss man bei Locale::Maketext komplette Sätze in den Pluralformen schreiben. Das kann der Englisch programmierende nicht wissen. Der Konflikt wird also erst während der Übersetzung bekannt.

Wenn man den Konflikt umgehen möchte, schreibt man immer die kompletten Sätze.

Das funktioniert aber auch nicht immer, weil Locale::Maketext nach "quant" immer "_1" erwartet und dann kommt das implizit hinzugefügte Leerzeichen und danach der Text.

Gebraucht würde aber:

[myplural,_1,It is _1 book.,These are _1 books.]

Das ist dann aber nichts anderes als Locale::TextDomain.

Komma in den Pluralformen oder die "join and never can split"-Falle

Durch simple Stringverkettung mit Komma darf kein Komma in verketteten Texten sein.

Gibt es einen Quotingmechanismus wie bei Text::CSV? Mir ist keiner bekannt.

I need 1 book, computer or notebook to do this.

Hier ein dreckiger Workaround mit ";".

I need [*_1,book; computer or notebook,books; computers or notebooks] to do this.

Wert und Maßeinheit werden ggf. umgebrochen

Durch Stringverkettung mit Leerzeichen entstehen Zeilenumbrüche zwischen Wert und Maßeinheit.

Das ergibt je nach Zeilenlänge

I have
1 book.

oder

I have 1
book.

Für Locale::TextDomain kann man schreiben:

I have {num}\N{NO-BREAK SPACE}book.
I have {num}\N{NO-BREAK SPACE}books.

In Locale::Maketext ist das Leerzeichen unveränderbar im Modulcode enthalten.

Auszug aus dem PO-File für Locale::Maketext

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."

msgid  "You can log out here."
msgstr "Sie können sich hier abmelden."

msgid  "He lives in %1, %2."
msgstr "Er wohnt in %1, %2."

msgid  "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,Mensch wohnt,Menschen wohnen) hier."

# a bad workaround (no singular before placeholder)
msgid  "This are %quant(%1,book,books)."
msgstr "Das sind %quant(%1,Buch,Bücher)."

msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,Buch ist,Bücher sind) in %quant(%2,Regal,Regalen)."

Auszug aus dem PO-File für Locale::TextDomain

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."

msgid        "You can log out here."
msgstr       "Sie können sich hier abmelden."

msgid        "He lives in {town}, {address}."
msgstr       "Er wohnt in {town}, {address}."

msgid        "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0]    "{num} Mensch wohnt hier."
msgstr[1]    "{num} Menschen wohnen hier."

msgid        "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0]    "Es ist {num} Buch."
msgstr[1]    "Es sind {num} Bücher."

msgid        "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0]    "Er hat {num} Haus in {town}, {address}."
msgstr[1]    "Er hat {num} Häuser in {town}, {address}."

msgid        "{num} book is"
msgid_plural "{num} books are"
msgstr[0]    "{num} Buch ist"
msgstr[1]    "{num} Bücher sind"

msgid        " in {num} shelf."
msgid_plural " in {num} shelves.
msgstr[0]    " in {num} Regal."
msgstr[1]    " in {num} Regalen."

PO-File Englisch/Russisch übersetzt

für Locale::Maketext

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."

msgid  "You can log out here."
msgstr "Ð�Ñ�Ñ�Ð¾Ð´ Ð¸Ð· Ñ�Ð¸Ñ�Ñ�ÐµÐ¼Ñ�."

# Hier wÃ¤re Beugung des Stadtnamens notwendig: 
# Ð�Ð¾Ñ�ÐºÐ²Ð° -> Ð² Ð�Ð¾Ñ�ÐºÐ²Ðµ
# Ð�Ð¸ÐµÐ²   -> Ð² Ð�Ð¸ÐµÐ²Ðµ
# Ð�Ñ�Ñ�Ð¸Ñ�Ð¸ -> Ð² Ð�Ñ�Ñ�Ð¸Ñ�Ð°Ñ� (nicht regulÃ¤r)
msgid  "He lives in %1, %2."
msgstr "Ð�Ð½ Ð¶Ð¸Ð²ÐµÑ� Ð² %1, %2"

# This is not correctly translatable.
# The plural form for number 2 to 4 (Ñ�ÐµÐ»Ð¾Ð²ÐµÐºÐ° Ð¶Ð¸Ð²Ñ�Ñ�) is not storable.
msgid  "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,Ñ�ÐµÐ»Ð¾Ð²ÐµÐº Ð¶Ð¸Ð²ÐµÑ�,Ñ�ÐµÐ»Ð¾Ð²ÐµÐº Ð¶Ð¸Ð²Ñ�Ñ�) Ð·Ð´ÐµÑ�Ñ�."

# This is not correctly translatable.
# The plural form for number 2 to 4 (Ð´Ð¾Ð¼Ð°) is not storable.
msgid  "He has %quant(%1,house,houses) in %2, %3."
msgstr "Ð£ Ð½ÐµÐ³Ð¾ %quant(%1,Ð´Ð¾Ð¼,Ð´Ð¾Ð¼Ð¾Ð²) Ð² %2, %3."

# This is not correctly translatable.
# The plural form for number 2 to 4 (ÐºÐ½Ð¸Ð³Ð¸) is not storable.
msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,ÐºÐ½Ð¸Ð³Ð°,ÐºÐ½Ð¸Ð³) Ð½Ð° %quant(%1,Ð¿Ð¾Ð»ÐºÐµ,Ð¿Ð¾Ð»ÐºÐ°Ñ�)."

für Locale::TextDomain

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."

msgid        "You can log out here."
msgstr       "Ð�Ñ�Ñ�Ð¾Ð´ Ð¸Ð· Ñ�Ð¸Ñ�Ñ�ÐµÐ¼Ñ�."

# Hier wÃ¤re Beugung des Stadtnamens notwendig: 
# Ð�Ð¾Ñ�ÐºÐ²Ð° -> Ð² Ð�Ð¾Ñ�ÐºÐ²Ðµ
# Ð�Ð¸ÐµÐ²   -> Ð² Ð�Ð¸ÐµÐ²Ðµ
# Ð�Ñ�Ñ�Ð¸Ñ�Ð¸ -> Ð² Ð�Ñ�Ñ�Ð¸Ñ�Ð°Ñ� (nicht regulÃ¤r)
msgid        "He lives in {town}, {address}."
msgstr       "Ð�Ð½ Ð¶Ð¸Ð²ÐµÑ� Ð² {town}, {address}."

msgid        "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0]    "{num} Ñ�ÐµÐ»Ð¾Ð²ÐµÐº Ð¶Ð¸Ð²ÐµÑ� Ð·Ð´ÐµÑ�Ñ�."
msgstr[1]    "{num} Ñ�ÐµÐ»Ð¾Ð²ÐµÐºÐ° Ð¶Ð¸Ð²Ñ�Ñ� Ð·Ð´ÐµÑ�Ñ�."
msgstr[2]    "{num} Ñ�ÐµÐ»Ð¾Ð²ÐµÐº Ð¶Ð¸Ð²Ñ�Ñ� Ð·Ð´ÐµÑ�Ñ�."

msgid        "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0]    "ÐÑ�Ð¾ {num} ÐºÐ½Ð¸Ð³Ð°."
msgstr[1]    "ÐÑ�Ð¾ {num} ÐºÐ½Ð¸Ð³Ð¸."
msgstr[2]    "ÐÑ�Ð¾ {num} ÐºÐ½Ð¸Ð³."

msgid        "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0]    "Ð£ Ð½ÐµÐ³Ð¾ {num} Ð´Ð¾Ð¼ Ð² {town}, {address}."
msgstr[1]    "Ð£ Ð½ÐµÐ³Ð¾ {num} Ð´Ð¾Ð¼Ð° Ð² {town}, {address}."
msgstr[2]    "Ð£ Ð½ÐµÐ³Ð¾ {num} Ð´Ð¾Ð¼Ð¾Ð² Ð² {town}, {address}."

# Translate this phrase together with the next one.
msgid        "{num} book is"
msgid_plural "{num} books are"
msgstr[0]    "{num} книга"
msgstr[1]    "{num} книги"
msgstr[2]    "{num} книг"

# Translate this phrase together with the previous one.
msgid        " in {num} shelf."
msgid_plural " in {num} shelves."
msgstr[0]    " на {num} полке."
msgstr[1]    " на {num} полках."
msgstr[2]    " на {num} полках."
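
How the Plural-Forms header selects one of the three msgstr entries can be sketched in plain Perl. The rule is copied from the header above. The function names plural_index_ru and translate_books are hypothetical helpers for illustration and are not part of Locale::TextDomain.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Plural-Forms rule for Russian, as given in the PO header:
# plural = n%10==1 && n%100!=11 ? 0
#        : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
sub plural_index_ru {
    my ($n) = @_;
    return 0 if $n % 10 == 1 && $n % 100 != 11;
    return 1 if $n % 10 >= 2 && $n % 10 <= 4
             && ( $n % 100 < 10 || $n % 100 >= 20 );
    return 2;
}

# The three msgstr[] entries for "{num} book is" / "{num} books are".
my @msgstr = ( '{num} книга', '{num} книги', '{num} книг' );

# Hypothetical helper: pick the form first, then fill in {num}.
sub translate_books {
    my ($num) = @_;
    my $text = $msgstr[ plural_index_ru($num) ];
    $text =~ s/\{num\}/$num/g;
    return $text;
}

print translate_books($_), "\n" for 1, 3, 5, 11, 21, 22;
```

The numbers 1 and 21 select msgstr[0], 3 and 22 select msgstr[1], and 5 and 11 select msgstr[2], which is exactly the distinction a two-form scheme cannot store.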

Inflecting "in {town}"

Berlin    -> Берлин
in Berlin -> в Берлине

If you want this, you also have to translate the placeholder values and only then insert them.

That works, but it makes automatic translation of the phrase that receives the value impossible. Such values are also hard to translate manually, because part of the context is lost again.

It is a kludge.
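
A minimal sketch of this workaround, assuming a hand-maintained lookup table: the table %town_prepositional, its English keys and the helper lives_in are hypothetical and belong to no framework. The placeholder value is translated first and inserted afterwards.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical lookup table: town name -> Russian prepositional case.
# Irregular names such as Мытищи need an explicit entry like any other.
my %town_prepositional = (
    'Berlin'    => 'Берлине',
    'Moscow'    => 'Москве',
    'Kiev'      => 'Киеве',
    'Mytishchi' => 'Мытищах',
);

# Translate the placeholder value first, then insert it into the phrase.
sub lives_in {
    my ( $town, $address ) = @_;
    my $msgstr = 'Он живет в {town}, {address}.';
    my %arg    = (
        # Unknown towns fall back to the uninflected name.
        town    => $town_prepositional{$town} // $town,
        address => $address,
    );
    $msgstr =~ s/\{(\w+)\}/$arg{$1}/g;
    return $msgstr;
}

print lives_in( 'Berlin', 'Unter den Linden 1' ), "\n";
```

The table has to be maintained per language and per grammatical case, which is why this approach stays a kludge.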

neutral/masculine/feminine singular/plural

Inflecting nouns (German example, "doctor"):

masculine singular -> Arzt
feminine singular  -> Ärztin
masculine plural   -> Ärzte
feminine plural    -> Ärztinnen

Inflecting verbs:

Masha went to school. -> Маша пошла в школу.
Petya went to school. -> Петя пошёл в школу.

Context

msgid   "design"
msgstr  "Design"

msgctxt "automobile"
msgid   "design"
msgstr  "Konstruktion"

msgctxt "verb"
msgid   "design"
msgstr  "zeichnen"
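
How a context keeps identical msgid values apart can be sketched with a plain hash. GNU gettext joins msgctxt and msgid with the EOT character ("\004") when it stores such entries in MO files, and the sketch borrows that convention. The %lexicon hash and the translate helper are hypothetical and do not reproduce any framework's API.

```perl
use strict;
use warnings;

# Lexicon keyed like gettext MO entries:
# msgctxt and msgid are joined by the EOT character "\004".
my %lexicon = (
    'design'               => 'Design',
    "automobile\004design" => 'Konstruktion',
    "verb\004design"       => 'zeichnen',
);

# Hypothetical helper: look up a msgid, optionally within a context.
# An unknown key falls back to the untranslated msgid.
sub translate {
    my ( $msgid, $msgctxt ) = @_;
    my $key = defined $msgctxt ? "$msgctxt\004$msgid" : $msgid;
    return exists $lexicon{$key} ? $lexicon{$key} : $msgid;
}

print translate('design'), "\n";
print translate( 'design', 'automobile' ), "\n";
print translate( 'design', 'verb' ), "\n";
```

Without the context, all three entries would collapse into a single key and only one German translation could be stored.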

Locale::Maketext::TPJ13 - Article by Sean M. Burke about software localization

He writes:

Since I wrote this article in 1998, I now see that the gettext docs are now trying more to come to terms with plurality. Whether useful conclusions have come from it is another question altogether. -- SMB, May 2001

Many more years have passed since then, and the all-in-one solution that does everything (the German "eierlegende Wollmilchsau") still does not exist.

Software for translation agencies

In the one case I know of at present, the translation agency uses the software "SDL Trados". Like comparable software, it is based on a "translation memory". That works very well for static documents.

Such software seems less suited to the dynamics that plural forms and context bring to software localization. It assumes a 1:1 translation. Expect, therefore, that the proportionally small share of entries with context or plural forms cannot be translated with good tool support.

In this case the POT file had to be converted to XML, and the target language then had to be pre-filled with the source language. One would have expected the translation agency to do this work.

Recommendation: have a test translation made of a smaller file. It should contain all typical constructs. Do this for each language, because subcontractors are sometimes involved.

Bibliography

GNU gettext

wikipedia http://en.wikipedia.org/wiki/Gettext

gettext homepage http://www.gnu.org/software/gettext/gettext.html

Singular, Plural, Dual, Trial, Quadral

wikipedia - Dual http://de.wikipedia.org/wiki/Dual_(Grammatik)

wikipedia - all forms http://de.wikipedia.org/wiki/Sursurunga

sourceforge - which language - which plural form http://translate.sourceforge.net/wiki/l10n/pluralforms

CPAN module Locale::Maketext

CPAN http://search.cpan.org/dist/Locale-Maketext/

CPAN module Locale::Maketext::Simple

CPAN http://search.cpan.org/dist/Locale-Maketext-Simple/

outdated article by Sean M. Burke about software localization

CPAN http://search.cpan.org/perldoc?Locale::Maketext::TPJ13

CPAN module Locale::TextDomain

CPAN http://search.cpan.org/dist/libintl-perl/

Thanks for the support, the many ideas, examples and corrections.

Nikolai Prokoschenko http://rassie.org/

Nikolai Prokoschenko - On the state of i18n in Perl http://rassie.org/archives/247

ap commented Apr 8, 2011

I was going to say it is a problem with Github’s POD formatter, since clicking the “raw” link would reveal the source to be fine. But then I remembered the =encoding utf8 directive of POD and realised neither file had them. I added them and now the documents render fine. Thanks for the impetus to figure it out!
