I18N is an important piece of any modern program. Unfortunately,
setting up i18n in your program is often a confusing process. The
functions provided here aim to make the programming side of that a little
easier.
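Most projects will be able to set this up once when they start up, by
calling easy_gettext_setup() in the top-level package (myprogram in the
example below) and keeping the two functions it returns as _ and N_.
Then, in other files that have strings that need translating: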
# myprogram/commands.py:

from myprogram import _, N_

def print_usage():
    print(_(u"""available commands are:
    --help              Display help
    --version           Display version of this program
    --bake-me-a-cake    as fast as you can
    """))

def print_invitations(age):
    print(_('Please come to my party.'))
    print(N_('I will be turning %(age)s year old',
             'I will be turning %(age)s years old', age) % {'age': age})
See also the babel module for in-depth information on gettext, message
catalogs, and translating your app. babel provides some nice
features for i18n on top of gettext.
domain – Name of the message domain. This should be a unique name
that can be used to look up the message catalog for this app.
localedirs – Iterator of directories to look for message
catalogs under. The first directory to exist is used regardless of
whether messages for this domain are present. If none of the
directories exist, fall back on sys.prefix + /share/locale.
Default: No directories to search so we just use the fallback.
use_unicode – If True, return the gettext functions that produce
unicode strings; otherwise return the functions that produce byte
str for the translations. Default is True.
Returns:
tuple of the gettext function and gettext function
for plurals
Setting up gettext can be a little tricky because of a lack of
documentation. This function will set up gettext using the
Class-based API for you.
For the simple case, you can use the default arguments and call it like
this:
_, N_ = easy_gettext_setup()
This will get you two functions, _() and N_() that you can use
to mark strings in your code for translation. _() is used to mark
strings that don’t need to worry about plural forms no matter what the
value of the variable is. N_() is used to mark strings that do need
to have a different form if a variable in the string is plural.
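To make the difference between the two markers concrete, here is a small sketch. It uses gettext.NullTranslations from the python standard library as a stand-in for the functions that easy_gettext_setup() returns, so nothing is actually translated; the invitation() helper is invented for the example.

```python
import gettext

# NullTranslations performs no translation; it stands in for a real catalog.
translations = gettext.NullTranslations()
_ = translations.gettext
N_ = translations.ngettext


def invitation(age):
    # No variable affects the wording here, so _() is enough.
    greeting = _('Please come to my party.')
    # The wording depends on age, so both forms are passed to N_().
    detail = N_('I will be turning %(age)s year old',
                'I will be turning %(age)s years old', age) % {'age': age}
    return greeting + ' ' + detail
```

Note that the substitution with % happens after N_() has picked the form, so the msgids that translators see still contain the %(age)s placeholder.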
See get_translation_object() for information on how to use localedirs
to get the proper message catalogs both when in development and when
installed to FHS compliant directories on Linux.
Note
The gettext functions returned from this function should be superior
to the ones returned from gettext. The traits that make them
better are described in the DummyTranslations and
NewGNUTranslations documentation.
Changed in version kitchen-0.2.4; API kitchen.i18n 2.0.0:
Changed easy_gettext_setup() to return the lgettext
functions instead of gettext functions when use_unicode=False.
domain – Name of the message domain. This should be a unique name
that can be used to look up the message catalog for this app or
library.
localedirs – Iterator of directories to look for
message catalogs under. The directories are searched in order
for message catalogs. For each of the directories searched,
we check for message catalogs in any language specified
in languages. The message catalogs are used to create
the Translation object that we return. The Translation object will
attempt to look up the msgid in the first catalog that we found. If
it’s not in there, it will go through each subsequent catalog looking
for a match. For this reason, the order in which you specify the
localedirs may be important. If no message catalogs
are found, either return a DummyTranslations object or raise
an IOError depending on the value of fallback.
The default localedir from gettext, which is
os.path.join(sys.prefix,'share','locale') on Unix, is
implicitly appended to the localedirs, making it the last
directory searched.
languages –
Iterator of language codes to check for
message catalogs. If unspecified, the user’s locale settings
will be used.
See also
gettext.find() for information on what environment
variables are used.
codeset – Set the character encoding to use when returning byte
str objects. This is equivalent to calling
output_charset() on the Translations
object that is returned from this function.
If you need more flexibility than easy_gettext_setup(), use this
function. It sets up a gettext Translation object and returns it
to you. Then you can access any of the methods of the object that you
need directly. For instance, if you specifically need to access
lgettext():
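A minimal sketch of working with a translation object directly follows. It uses the python standard library gettext.translation() as a stand-in for get_translation_object(), and an invented domain that is assumed to have no catalog installed. It calls gettext() and ngettext() rather than lgettext(), since the lgettext() family is absent from the standard library on recent Python 3 releases.

```python
import gettext

# fallback=True yields a NullTranslations object when no catalog exists.
translations = gettext.translation('example-domain-without-catalog',
                                   fallback=True)

# Any method on the object can be called directly.
message = translations.gettext('My Message')
plural = translations.ngettext('%d item', '%d items', 3) % 3
```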
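This function is similar to the python standard library
gettext.translation() but improves on it in two ways:
It returns NewGNUTranslations or DummyTranslations
objects by default. These are superior to the
gettext.GNUTranslations and gettext.NullTranslations
objects because they are consistent in the string type they return and
they fix several issues that can cause the python standard library objects to throw
UnicodeError.
This function takes multiple directories to search for
message catalogs.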
The latter is important when setting up gettext in a portable
manner. There is not a common directory for translations across operating
systems so one needs to look in multiple directories for the translations.
get_translation_object() is able to handle that if you give it
a list of directories to search for catalogs. For example, passing
a locale directory next to the calling module followed by
/usr/lib/locale will search several different directories:
A directory named locale in the same directory as the module
that called get_translation_object(),
In /usr/lib/locale
In /usr/share/locale (the fallback directory)
This allows gettext to work on Windows and in development (where the
message catalogs are typically in the toplevel module directory)
and also when installed under Linux (where the message catalogs
are installed in /usr/share/locale). You (or the system packager)
just need to install the message catalogs in
/usr/share/locale and remove the locale directory from the
module to make this work. For example:
In development:
~/foo # Toplevel module directory
~/foo/__init__.py
~/foo/locale # With message catalogs below here:
~/foo/locale/es/LC_MESSAGES/foo.mo
Installed on Linux:
/usr/lib/python2.7/site-packages/foo
/usr/lib/python2.7/site-packages/foo/__init__.py
/usr/share/locale/ # With message catalogs below here:
/usr/share/locale/es/LC_MESSAGES/foo.mo
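The directory search described above can be approximated with gettext.find() from the python standard library. This is only a sketch under stated assumptions: find_catalogs() is a hypothetical helper rather than a kitchen function, the domain name is invented, and the development tree is built in a temporary directory with an empty placeholder file instead of a real compiled catalog.

```python
import gettext
import os
import sys
import tempfile


def find_catalogs(domain, localedirs, languages):
    """Hypothetical helper: list catalog files for domain, in search order."""
    # The system-wide default directory is always searched last.
    search_path = list(localedirs) + [
        os.path.join(sys.prefix, 'share', 'locale')]
    found = []
    for localedir in search_path:
        # all=True asks gettext.find() for every matching .mo file.
        found.extend(gettext.find(domain, localedir, languages, all=True))
    return found


# Build a throwaway development tree: <tmp>/locale/es/LC_MESSAGES/<domain>.mo
# An empty file is enough here because gettext.find() only checks existence.
domain = 'example-domain-for-sketch'
dev_locale = os.path.join(tempfile.mkdtemp(), 'locale')
catalog_dir = os.path.join(dev_locale, 'es', 'LC_MESSAGES')
os.makedirs(catalog_dir)
catalog_path = os.path.join(catalog_dir, domain + '.mo')
open(catalog_path, 'wb').close()

# Directories that do not exist are skipped without error.
missing_dir = os.path.join(dev_locale, 'does-not-exist')
catalogs = find_catalogs(domain, [missing_dir, dev_locale], ['es'])
```

Because missing directories contribute nothing, the same list of localedirs can be shipped unchanged for development and for an installed copy.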
Note
This function will set up Translation objects that attempt to look up
msgids in all of the found message catalogs. This means if
you have several versions of the message catalogs installed
in different directories that the function searches, you need to make
sure that localedirs specifies the directories so that newer
message catalogs are searched first. It also means that if
a newer catalog does not contain a translation for a msgid but an
older one that’s in localedirs does, the translation from that
older catalog will be returned.
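The lookup order in this note can be pictured with the fallback chaining that the python standard library translation classes already provide. The sketch below is an illustration only: DictTranslations is a hypothetical in-memory catalog, not a kitchen class, and the two dictionaries stand in for a newer and an older message catalog.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog used only for this illustration."""

    def __init__(self, catalog):
        super().__init__()
        self._messages = dict(catalog)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # Defer to the next catalog in the chain, or echo the msgid.
        return super().gettext(message)


newer = DictTranslations({'Hello': 'Hola'})
older = DictTranslations({'Hello': 'Buenas', 'Goodbye': 'Adi\xf3s'})
newer.add_fallback(older)  # newer is consulted first, then older
```

Here 'Hello' is answered by the newer catalog, while 'Goodbye' is only known to the older one and so comes from there, which is the behaviour the note warns about.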
Changed in version kitchen-1.1.0; API kitchen.i18n 2.1.0:
Add more parameters to get_translation_object() so
it can more easily be used as a replacement for
gettext.translation(). Also change the way we use localedirs.
We cycle through them until we find a suitable locale file rather
than simply cycling through until we find a directory that exists.
The new code is based heavily on the python standard library gettext.translation() function.
The standard translation objects from the gettext module suffer from
several problems:
They can throw UnicodeError
They can’t find translations for non-ASCII byte str
messages
They may return either unicode string or byte str from the
same function even though the functions say they will only return
unicode or only return byte str.
This Translations class doesn’t translate the strings and is intended to
be used as a fallback when there were errors setting up a real
Translations object. It’s safer than gettext.NullTranslations in
its handling of byte str vs unicode strings.
Unlike NullTranslations, this Translation class will
never throw a UnicodeError. The code that you have
around a call to DummyTranslations might throw
a UnicodeError but at least that will be in code you
control and can fix. Also, unlike NullTranslations all
of this Translation object’s methods guarantee to return byte str
except for ugettext() and ungettext() which guarantee to
return unicode strings.
When byte str are returned, the strings will be encoded according
to this algorithm:
If a fallback has been added, the fallback will be called first.
You’ll need to consult the fallback to see whether it performs any
encoding changes.
If a byte str was given, the same byte str will
be returned.
If a unicode string was given and set_output_charset()
has been called then we encode the string using the
output_charset.
If a unicode string was given and this is gettext() or
ngettext() and _charset was set, output in that charset.
If a unicode string was given and this is gettext()
or ngettext() we encode it using UTF-8.
If a unicode string was given and this is lgettext()
or lngettext() we encode using the value of
locale.getpreferredencoding().
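The charset selection in the steps above can be sketched as a single function. This is a hedged illustration, not the class's actual code: encode_result() and its parameter names are invented, the fallback step is left out, and it is written for Python 3, where bytes plays the role of byte str and str the role of unicode.

```python
import locale


def encode_result(message, output_charset=None, catalog_charset=None,
                  use_locale=False):
    """Hypothetical helper mirroring the byte str rules listed above."""
    # Byte strings are passed through untouched.
    if isinstance(message, bytes):
        return message
    if output_charset:
        charset = output_charset
    elif use_locale:
        # lgettext()/lngettext() style: the user's preferred encoding.
        charset = locale.getpreferredencoding()
    else:
        # gettext()/ngettext() style: catalog charset, else UTF-8.
        charset = catalog_charset or 'utf-8'
    # Characters the charset cannot represent become '?'.
    return message.encode(charset, 'replace')
```

An explicitly set output charset always wins, so callers that need a fixed encoding do not depend on the catalog or the user's locale.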
For ugettext() and ungettext(), we go through the same set of
steps with the following differences:
We transform byte str into unicode strings for
these methods.
The encoding used to decode the byte str is taken from
input_charset if it’s set, otherwise we decode using
UTF-8.
input_charset is an extension to the python standard library gettext that specifies what
charset a message is encoded in when decoding a message to
unicode. This is used for two purposes:
If the message string is a byte str, this is used to decode
the string to a unicode string before looking it up in the
message catalog.
In ugettext() and
ungettext() methods, if a byte
str is given as the message and is untranslated this is used
as the encoding when decoding to unicode. This is different
from _charset which may be set when a message catalog
is loaded because input_charset is used to describe an encoding
used in a python source file while _charset describes the
encoding used in the message catalog file.
Any characters that aren’t able to be transformed from a byte str
to unicode string or vice versa will be replaced with
a replacement character (ie: u'�' in unicode based encodings, '?' in other
ASCII compatible encodings).
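The decoding side can be sketched the same way. The function below is a hypothetical helper, not part of kitchen, written for Python 3 (bytes for byte str, str for unicode). It shows the input_charset default of UTF-8 and the use of a replacement character instead of an exception.

```python
def decode_message(message, input_charset=None):
    """Hypothetical helper mirroring the decoding rules described above."""
    # Text is already decoded; nothing to do.
    if isinstance(message, str):
        return message
    # Undecodable bytes become U+FFFD instead of raising UnicodeError.
    return message.decode(input_charset or 'utf-8', 'replace')
```

A byte string that is valid in the declared charset decodes cleanly; one that is not still produces a usable string with the bad bytes replaced.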
See gettext.NullTranslations for information about what methods are
available and what they do.
Changed in version kitchen-1.1.0; API kitchen.i18n 2.1.0:
* Although we had adapted gettext(), ngettext(),
lgettext(), and lngettext() to always return byte
str, we hadn’t forced those byte str to always be
in a specified charset. We now make sure that gettext() and
ngettext() return byte str encoded using
output_charset if set, otherwise charset and if
neither of those, UTF-8. With lgettext() and
lngettext(), output_charset is used if set, otherwise
locale.getpreferredencoding().
* Make setting input_charset and output_charset also
set those attributes on any fallback translation objects.
gettext.GNUTranslations can return byte str from
gettext.GNUTranslations.ugettext() and unicode
strings from the other gettext()
methods if the message being translated is the wrong type.
When byte str are returned, the strings will be encoded
according to this algorithm:
If a fallback has been added, the fallback will be called first.
You’ll need to consult the fallback to see whether it performs any
encoding changes.
If a byte str was given, the same byte str will
be returned.
If a unicode string was given and
set_output_charset() has been called then we encode the
string using the output_charset.
If a unicode string was given and this is gettext()
or ngettext() and a charset was detected when parsing the
message catalog, output in that charset.
If a unicode string was given and this is gettext()
or ngettext() we encode it using UTF-8.
If a unicode string was given and this is lgettext()
or lngettext() we encode using the value of
locale.getpreferredencoding().
For ugettext() and ungettext(), we go through the same set of
steps with the following differences:
We transform byte str into unicode strings for these
methods.
The encoding used to decode the byte str is taken from
input_charset if it’s set, otherwise we decode using
UTF-8
input_charset is an extension to the python standard library gettext that specifies what
charset a message is encoded in when decoding a message to
unicode. This is used for two purposes:
If the message string is a byte str, this is used to decode
the string to a unicode string before looking it up in the
message catalog.
In ugettext() and
ungettext() methods, if a byte
str is given as the message and is untranslated this is used as
the encoding when decoding to unicode. This is different from
the _charset parameter that may be set when a message
catalog is loaded because input_charset is used to describe an
encoding used in a python source file while _charset describes
the encoding used in the message catalog file.
Any characters that aren’t able to be transformed from a byte
str to unicode string or vice versa will be replaced
with a replacement character (ie: u'�' in unicode based encodings,
'?' in other ASCII compatible encodings).
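The first purpose of input_charset, decoding a byte str msgid before the catalog lookup, is what lets non-ASCII byte str messages find their translations. The sketch below illustrates the idea with a plain dictionary. CatalogLookup is a hypothetical stand-in for a loaded message catalog and does not reflect how the class is implemented; it is written for Python 3, with bytes as byte str and str as unicode.

```python
class CatalogLookup:
    """Hypothetical stand-in for a loaded message catalog."""

    def __init__(self, catalog, input_charset='utf-8'):
        self.catalog = dict(catalog)  # text msgid -> text translation
        self.input_charset = input_charset

    def ugettext(self, message):
        # Decode byte msgids first so they can match the text keys.
        if isinstance(message, bytes):
            message = message.decode(self.input_charset, 'replace')
        # Untranslated messages come back decoded rather than as bytes.
        return self.catalog.get(message, message)


lookup = CatalogLookup({'caf\xe9': 'coffee shop'}, input_charset='latin-1')
```

The same msgid is found whether it arrives as text or as latin-1 encoded bytes, and an untranslated byte msgid is still returned as text.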
See also
gettext.GNUTranslations.gettext
For information about what methods this class has and what they do
Changed in version kitchen-1.1.0; API kitchen.i18n 2.1.0:
Although we had adapted gettext(), ngettext(),
lgettext(), and lngettext() to always return
byte str, we hadn’t forced those byte str to always
be in a specified charset. We now make sure that gettext() and
ngettext() return byte str encoded using
output_charset if set, otherwise charset and if
neither of those, UTF-8. With lgettext() and
lngettext(), output_charset is used if set, otherwise
locale.getpreferredencoding().