Python2 has two string types, str and unicode.
unicode represents an abstract sequence of text characters. It can
hold any character that is present in the unicode standard. str can
hold any byte of data. The operating system and python work together to
display these bytes as characters in many cases but you should always keep in
mind that the information is really a sequence of bytes, not a sequence of
characters. In python2 these types are interchangeable much of the
time. They are one of the few pairs of types that are automatically converted
when used in equality comparisons:
>>> # string is converted to unicode and then compared
>>> "I am a string" == u"I am a string"
True
>>> # Other types, like int, don't have this special treatment
>>> 5 == "5"
False
However, this automatic conversion tends to lull people into a false sense of
security. As long as you’re dealing with ASCII characters the
automatic conversion will save you from seeing any differences. Once you
start using characters that are not in ASCII, you will start getting
UnicodeError and UnicodeWarning as the automatic conversions
between the types fail:
>>> "I am an ñ" == u"I am an ñ"
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
Why do these conversions fail? The reason is that the python2
unicode type represents an abstract sequence of unicode text known as
code points. str, on the other hand, really represents
a sequence of bytes. Those bytes are converted by your operating system to
appear as characters on your screen using a particular encoding (usually
with a default defined by the operating system and customizable by the
individual user.) Although ASCII characters are fairly standard in
what bytes represent each character, the bytes outside of the ASCII
range are not. In general, each encoding will map a different character to
a particular byte. Newer encodings map individual characters to multiple
bytes (which the older encodings will instead treat as multiple characters).
In the face of these differences, python refuses to guess at an encoding;
instead it issues a warning or raises an exception and leaves the value unconverted.
So what is the best method of dealing with this weltering babble of incoherent
encodings? The basic strategy is to explicitly turn everything into
unicode when it first enters your program. Then, when you send it to
output, you can transform the unicode back into bytes. Doing this allows you
to control the encodings that are used and avoid getting tracebacks due to
UnicodeError. Using the functions defined in this module, that looks
something like this:
 1  >>> from kitchen.text.converters import to_unicode, to_bytes
 2  >>> name = raw_input('Enter your name: ')
 3  Enter your name: Toshio くらとみ
 4  >>> name
 5  'Toshio \xe3\x81\x8f\xe3\x82\x89\xe3\x81\xa8\xe3\x81\xbf'
 6  >>> type(name)
 7  <type 'str'>
 8  >>> unicode_name = to_unicode(name)
 9  >>> type(unicode_name)
10  <type 'unicode'>
11  >>> unicode_name
12  u'Toshio \u304f\u3089\u3068\u307f'
13  >>> # Do a lot of other things before needing to save/output again:
14  >>> output = open('datafile', 'w')
15  >>> output.write(to_bytes(u'Name: %s\n' % unicode_name))
A few notes:
Looking at line 6, you’ll notice that the input we took from the user was
a byte str. In general, anytime we’re getting a value from outside
of python (The filesystem, reading data from the network, interacting with an
external command, reading values from the environment) we are interacting with
something that will want to give us a byte str. Some python standard library
modules and third party libraries will automatically attempt to convert a byte
str to unicode strings for you. This is both a boon and
a curse. If the library can guess correctly about the encoding that the data
is in, it will return unicode objects to you without you having to
convert. However, if it can’t guess correctly, you may end up with one of
several problems:
UnicodeError
The library attempted to decode a byte str into
a unicode string, failed, and raised an exception.
Garbled data
If the library returns the data after decoding it with the wrong encoding,
the characters you see in the unicode string won’t be the ones that
you expect.
A byte str instead of unicode string
Some libraries will return a unicode string when they’re able to
decode the data and a byte str when they can’t. This is
generally the hardest problem to debug when it occurs. Avoid it in your
own code and try to avoid or open bugs against upstreams that do this. See
Designing Unicode Aware APIs for strategies to do this properly.
On line 8, we convert from a byte str to a unicode string.
to_unicode() does this for us. It has some
error handling and sane defaults that make this a nicer function to use than
calling str.decode() directly:
Instead of defaulting to the ASCII encoding which fails with all
but the simple American English characters, it defaults to UTF-8.
Instead of raising an error if it cannot decode a value, it will replace
the value with the unicode “Replacement character” symbol (�).
If you happen to call this method with something that is not a str
or unicode, it will return an empty unicode string.
All three of these can be overridden using different keyword arguments to the
function. See the to_unicode() documentation for more information.
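For example, a minimal sketch of overriding those defaults with the keyword
arguments described later in this document (the sample bytes here are latin-1
encoded):

>>> from kitchen.text.converters import to_unicode
>>> latin1_bytes = 'caf\xe9'
>>> to_unicode(latin1_bytes, encoding='latin-1', errors='strict', nonstring='strict')
u'caf\xe9'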
On line 15 we push the data back out to a file. Two things you should note here:
We deal with the strings as unicode until the last instant. The
string format that we’re using is unicode and the variable also
holds unicode. People sometimes get into trouble when they mix
a byte str format with a variable that holds a unicode
string (or vice versa) at this stage.
to_bytes() does the reverse of
to_unicode(). In this case, we’re using the default values, which
turn unicode into a byte str using UTF-8. Any
errors are replaced with a � and sending nonstring objects yields an empty
byte str. Just like to_unicode(), you can look at
the documentation for to_bytes() to find out how to override any of
these defaults.
The default strategy of decoding to unicode strings when you take
data in and encoding to a byte str when you send the data back out
works great for most problems but there are a few times when you shouldn’t:
The values aren’t meant to be read as text
The values need to be byte-for-byte when you send them back out – for
instance if they are database keys or filenames.
You are transferring the data between several libraries that all expect
byte str.
In each of these instances, there is a reason to keep around the byte
str version of a value. Here’s a few hints to keep your sanity in
these situations:
Keep your unicode and str values separate. Just like the
pain caused when you have to use someone else’s library that returns both
unicode and str you can cause yourself pain if you have
functions that can return both types or variables that could hold either
type of value.
Name your variables so that you can tell whether you’re storing byte
str or unicode string. One of the first things you end
up having to do when debugging is determine what type of string you have in
a variable and what type of string you are expecting. Naming your
variables consistently so that you can tell which type they are supposed to
hold will save you from at least one of those steps.
When you get values initially, make sure that you’re dealing with the type
of value that you expect as you save it. You can use isinstance()
or to_bytes() since to_bytes() doesn’t do any modifications of
the string if it’s already a str. When using to_bytes()
for this purpose you might want to use:
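For example, something along these lines (a sketch; the exact call is an
illustration built from the errors and nonstring parameters documented
below):

from kitchen.text.converters import to_bytes

# If value is already a byte str it is returned unchanged.  If it is
# a unicode string that cannot be encoded, errors='strict' raises
# instead of silently substituting replacement characters.
value = 'caf\xc3\xa9'
value = to_bytes(value, errors='strict', nonstring='strict')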
The reason is that the default of to_bytes() will take characters
that are illegal in the chosen encoding and transform them to replacement
characters. Since the point of keeping this data as a byte str is
to keep the exact same bytes when you send it outside of your code,
changing things to replacement characters should be raising red flags that
something is wrong. Setting errors to strict will raise an
exception which gives you an opportunity to fail gracefully.
Sometimes you will want to print out the values that you have in your byte
str. When you do this you will need to make sure that you
transform unicode to str before combining them. Also be
sure that any other function calls (including gettext) are going to
give you strings that are the same type. For instance:
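A sketch of what that can look like (the _() stand-in and b_username variable
are made up for illustration; a real gettext catalog may hand you either type
depending on how it was installed):

from kitchen.text.converters import to_bytes

def _(msg):
    # Stand-in for a gettext translation function
    return msg

b_username = to_bytes(u'くらとみ')
# Make both the format string and the value byte str before combining
print to_bytes(_('Username: %(user)s')) % {'user': b_username}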
Even when you have a good conceptual understanding of how python2 treats
unicode and str there are still some things that can
surprise you. In most cases this is because, as noted earlier, python or one
of the python libraries you depend on is trying to convert a value
automatically and failing. Explicit conversion at the appropriate place
usually solves that.
One common idiom for getting a simple string representation of an object is to use:
str(obj)
Unfortunately, this is not safe. Sometimes str(obj) will return
unicode. Sometimes it will return a byte str. Sometimes,
it will attempt to convert from a unicode string to a byte
str, fail, and throw a UnicodeError. To be safe from all of
these, first decide whether you need unicode or str to be
returned. Then use to_unicode() or to_bytes() to get the simple
representation like this:
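For instance (a minimal illustration; integers have no __unicode__() method
so their __str__() value is used for the simple representation):

>>> from kitchen.text.converters import to_unicode, to_bytes
>>> to_unicode(5)
u'5'
>>> to_bytes(5)
'5'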
python has a builtin print() statement that outputs strings to the
terminal. This originated in a time when python only dealt with byte
str. When unicode strings came about, some enhancements
were made to the print() statement so that it could print those as well.
The enhancements make print() work most of the time. However, the times
when it doesn’t work tend to make for cryptic debugging.
The basic issue is that print() has to figure out what encoding to use
when it prints a unicode string to the terminal. When python is
attached to your terminal (ie, you’re running the interpreter or running
a script that prints to the screen) python is able to take the encoding value
from your locale settings LC_ALL or LC_CTYPE and print the
characters allowed by that encoding. On most modern Unix systems, the
encoding is utf-8 which means that you can print any unicode
character without problem.
There are two common cases of things going wrong:
Someone has a locale set that does not accept all valid unicode characters.
For instance:
$ LC_ALL=C python
>>> print u'\ufffd'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
This often happens when a script that you’ve written and debugged from the
terminal is run from an automated environment like cron. It
also occurs when you have written a script using a utf-8 aware
locale and released it for consumption by people all over the internet.
Inevitably, someone is running with a locale that can’t handle all unicode
characters and you get a traceback reported.
You redirect output to a file. Python doesn’t use the value of
LC_ALL unconditionally to decide what encoding to use. Instead
it uses the encoding of the terminal you are printing to, which is
normally derived from LC_ALL. If you redirect
to a file, you are no longer printing to the terminal, so LC_ALL
won’t have any effect. At this point, python decides it can’t determine an
encoding and falls back to ASCII, which will likely lead to
UnicodeError being raised. You can see this in a short script:
#!/usr/bin/python -tt
print u'\ufffd'
And then look at the difference between running it normally and redirecting to a file:
$ ./test.py
�
$ ./test.py > t
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print u'\ufffd'
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
The short answer to dealing with this is to always use bytes when writing
output. You can do this by explicitly converting to bytes like this:
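A minimal sketch, mirroring the test.py script above:

#!/usr/bin/python -tt
from kitchen.text.converters import to_bytes
print to_bytes(u'\ufffd')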
or you can wrap stdout and stderr with a StreamWriter.
A StreamWriter is convenient in that you can wrap
sys.stdout or sys.stderr with it and have output
automatically converted, but it has the drawback of still being able to throw
UnicodeError if the writer can’t encode all possible unicode
codepoints. Kitchen provides an alternate version which can be retrieved with
kitchen.text.converters.getwriter() which will not traceback in its
standard configuration.
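A sketch of wrapping sys.stdout that way (assuming the usual pattern of
calling the returned StreamWriter class on the stream you want to wrap):

import sys
from kitchen.text.converters import getwriter

# Wrap stdout so unicode strings are encoded to utf-8 on the way out;
# kitchen's writer replaces unencodable codepoints instead of raising.
UTF8Writer = getwriter('utf-8')
sys.stdout = UTF8Writer(sys.stdout)
print u'\ufffd'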
How do you work with this one? Remember rule #1: Keep your unicode
and byte str values separate. That goes for keys in a dictionary
just like anything else.
For any given dictionary, make sure that all your keys are either
unicode or str. Do not mix the two. If you’re being
given both unicode and str but you don’t need to preserve
separate keys for each, I recommend using to_unicode() or
to_bytes() to convert all keys to one type or the other like this:
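For instance, a sketch that normalizes every key to unicode on the way into
the dictionary (the helper name is made up for illustration):

from kitchen.text.converters import to_unicode

def unicode_keyed(pairs):
    # Normalize all keys to unicode so the utf-8 bytes for 'café' and
    # u'café' cannot end up as two different entries
    return dict((to_unicode(k), v) for k, v in pairs)

data = unicode_keyed([('caf\xc3\xa9', 1), (u'caf\xe9', 2)])
len(data)   # 1 -- both spellings map onto the same unicode key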
These issues also apply to using dicts with tuple keys that contain
a mixture of unicode and str. Once again the best fix
is to standardise on either str or unicode.
If you absolutely need to store values in a dictionary where the keys could
be either unicode or str you can use
StrictDict which has separate
entries for all unicode and byte str and deals correctly
with any tuple containing mixed unicode and byte
str.
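A sketch of what that buys you, assuming StrictDict is importable from
kitchen.collections:

from kitchen.collections import StrictDict

d = StrictDict()
d[u'name'] = 'set with a unicode key'
d['name'] = 'set with a byte str key'
len(d)   # 2 -- a plain dict would have collapsed these into one entry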
Parameters:
obj – Object to convert to a unicode string. This should
normally be a byte str
encoding – What encoding to try converting the byte str as.
Defaults to utf-8
errors – If errors are found while decoding, perform this action.
Defaults to replace which replaces the invalid bytes with
a character that means the bytes were unable to be decoded. Other
values are the same as the error handling schemes in the codec base
classes.
For instance strict which raises an exception and ignore which
simply omits the non-decodable characters.
nonstring –
How to treat nonstring values. Possible values are:
simplerepr:
Attempt to call the object’s “simple representation”
method and return that value. Python-2.3+ has two methods that
try to return a simple representation: object.__unicode__()
and object.__str__(). We first try to get a usable value
from object.__unicode__(). If that fails we try the same
with object.__str__().
empty:
Return an empty unicode string
strict:
Raise a TypeError
passthru:
Return the object unchanged
repr:
Attempt to return a unicode string of the repr of the
object
Default is simplerepr
non_string – Deprecated. Use nonstring instead
Raises:
TypeError – if nonstring is strict and
a non-basestring object is passed in or if nonstring
is set to an unknown value
UnicodeDecodeError – if errors is strict and
obj is not decodable using the given encoding
Returns:
unicode string or the original object depending on the
value of nonstring.
Usually this should be used on a byte str but it can take both
byte str and unicode strings intelligently. Nonstring
objects are handled in different ways depending on the setting of the
nonstring parameter.
The default values of this function are set so as to always return
a unicode string and never raise an error when converting from
a byte str to a unicode string. However, when you do
not pass validly encoded text (or a nonstring object), you may end up with
output that you don’t expect. Be sure you understand the requirements of
your data; do not just ignore errors by passing it through this function.
Changed in version 0.2.1a2: Deprecated non_string in favor of nonstring parameter and changed
default value to simplerepr
Parameters:
obj – Object to convert to a byte str. This should normally
be a unicode string.
encoding – Encoding to use to convert the unicode string
into a byte str. Defaults to utf-8.
errors –
If errors are found while encoding, perform this action.
Defaults to replace which replaces the invalid bytes with
a character that means the bytes were unable to be encoded. Other
values are the same as the error handling schemes in the codec base
classes.
For instance strict which raises an exception and ignore which
simply omits the non-encodable characters.
nonstring –
How to treat nonstring values. Possible values are:
simplerepr:
Attempt to call the object’s “simple representation”
method and return that value. Python-2.3+ has two methods that
try to return a simple representation: object.__unicode__()
and object.__str__(). We first try to get a usable value
from object.__str__(). If that fails we try the same
with object.__unicode__().
empty:
Return an empty byte str
strict:
Raise a TypeError
passthru:
Return the object unchanged
repr:
Attempt to return a byte str of the repr() of the
object
Default is simplerepr.
non_string – Deprecated Use nonstring instead.
Raises:
TypeError – if nonstring is strict and
a non-basestring object is passed in or if nonstring
is set to an unknown value.
UnicodeEncodeError – if errors is strict and the
characters in obj are unable to be encoded using encoding.
Returns:
byte str or the original object depending on the value
of nonstring.
Warning
If you pass a byte str into this function the byte
str is returned unmodified. It is not re-encoded with
the specified encoding. The easiest way to achieve that is:
to_bytes(to_unicode(text), encoding='utf-8')
The initial to_unicode() call will ensure text is
a unicode string. Then, to_bytes() will turn that into
a byte str with the specified encoding.
Usually, this should be used on a unicode string but it can take
either a byte str or a unicode string intelligently.
Nonstring objects are handled in different ways depending on the setting
of the nonstring parameter.
The default values of this function are set so as to always return a byte
str and never raise an error when converting from unicode to
bytes. However, when you do not pass an encoding that can validly encode
the object (or a non-string object), you may end up with output that you
don’t expect. Be sure you understand the requirements of your data; do not
just ignore errors by passing it through this function.
Changed in version 0.2.1a2: Deprecated non_string in favor of nonstring parameter
and changed default value to simplerepr
The StreamWriter that is returned will take byte
str as well as unicode strings. Any byte
str will be passed through unmodified.
The default error handler for unknown bytes is to replace the bytes
with the unknown character (? in most ascii-based encodings, �
in the utf encodings) whereas codecs.getwriter() defaults to
strict. Like codecs.StreamWriter, the returned
StreamWriter can have its error handler changed in
code by setting stream.errors='new_handler_name'
This function converts something to a byte str if it isn’t one.
It’s used to call str() or unicode() on the object to get its
simple representation without danger of getting a UnicodeError.
You should be using to_unicode() or to_bytes() explicitly
instead.
Take a unicode string and turn it into a byte str
suitable for xml
Parameters:
string – unicode string to encode into an XML compatible byte
str
encoding – encoding to use for the returned byte str.
Default is to encode to UTF-8. If some of the characters in
string are not encodable in this encoding, the unknown
characters will be entered into the output string using xml character
references.
attrib – If True, quote the string for use in an xml
attribute. If False (default), quote for use in an xml text
field.
control_chars –
Control characters are not allowed in XML
documents. When we encounter those we need to know what to do. Valid
options are replace (the default), ignore, and strict.
Raises:
ValueError – If control_chars is set to something other than
replace, ignore, or strict.
Return type:
byte str
Returns:
representation of the unicode string as a valid XML
byte str
XML files consist mainly of text encoded using a particular charset. XML
also denies the use of certain bytes in the encoded text (example: the
ASCII NUL byte). There are also special characters that must be escaped if they
are present in the input (example: <). This function takes care of
all of those issues for you.
There are a few different ways to use this function depending on your
needs. The simplest invocation is like this:
unicode_to_xml(u'String with non-ASCII characters: <"á と">')
This will return the following to you, encoded in utf-8:
'String with non-ASCII characters: &lt;"á と"&gt;'
Pretty straightforward. Now, what if you need to encode your document in
something other than utf-8? For instance, latin-1? Let’s
see:
unicode_to_xml(u'String with non-ASCII characters: <"á と">',
    encoding='latin-1')
'String with non-ASCII characters: &lt;"á &#12392;"&gt;'
Because the と character is not available in the latin-1 charset,
it is replaced with &#12392; in our output. This is an xml character
reference which represents the character at unicode codepoint 12392, the
と character.
When you want to reverse this, use xml_to_unicode() which will turn
a byte str into a unicode string and replace the xml
character references with the unicode characters.
XML also has the quirk of not allowing control characters in its
output. The control_chars parameter allows us to specify what to
do with those. For use cases that don’t need absolute character by
character fidelity (example: holding strings that will just be used for
display in a GUI app later), the default value of replace works well:
unicode_to_xml(u'String with disallowed control chars: \u0000\u0007')
'String with disallowed control chars: ??'
If you do need to be able to reproduce all of the characters at a later
date (examples: if the string is a key value in a database or a path on a
filesystem) you have many choices. Here are a few that rely on utf-7,
a verbose encoding that encodes control characters (as well as
non-ASCII unicode values) to characters from within the
ASCII printable characters. The good thing about doing this is
that the code is pretty simple. You just need to use utf-7 both when
encoding the field for xml and when decoding it for use in your python
program:
unicode_to_xml(u'String with unicode: と and control char: \u0007',
    encoding='utf7')
'String with unicode: +MGg and control char: +AAc-'
# [...]
xml_to_unicode('String with unicode: +MGg and control char: +AAc-',
    encoding='utf7')
u'String with unicode: と and control char: \u0007'
As you can see, the utf-7 encoding will transform even characters that
would be representable in utf-8. This can be a drawback if you
want unicode characters in the file to be readable without being decoded
first. You can work around this with increased complexity in your
application code:
encoding = 'utf-8'
u_string = u'String with unicode: と and control char: \u0007'
try:
    # First attempt to encode to utf8
    data = unicode_to_xml(u_string, encoding=encoding, errors='strict')
except XmlEncodeError:
    # Fallback to utf-7
    encoding = 'utf-7'
    data = unicode_to_xml(u_string, encoding=encoding, errors='strict')
write_tag('<mytag encoding=%s>%s</mytag>' % (encoding, data))
# [...]
encoding = tag.attributes.encoding
u_string = xml_to_unicode(u_string, encoding=encoding)
Using code similar to that, you can have some fields encoded using your
default encoding and fallback to utf-7 if there are control
characters present.
Note
If your goal is to preserve the control characters, you cannot simply
save the entire file as utf-7 and set the xml encoding parameter
to utf-7. Because XML doesn’t allow control characters,
you have to encode those separately from any encoding work that the XML
parser itself knows about.
Transform a byte str from an xml file into a unicode
string
Parameters:
byte_string – byte str to decode
encoding – encoding that the byte str is in
errors – What to do if not every character is valid in
encoding. See the to_unicode() documentation for legal
values.
Return type:
unicode string
Returns:
string decoded from byte_string
This function attempts to reverse what unicode_to_xml() does. It
takes a byte str (presumably read in from an xml file) and
expands all the html entities into unicode characters and decodes the byte
str into a unicode string. One thing it cannot do is
restore any control characters that were removed prior to
inserting into the file. If you need to keep such characters you need to
use xml_to_bytes() and bytes_to_xml() or use one of the
strategies documented in unicode_to_xml() instead.
Make sure a byte str is validly encoded for xml output
Parameters:
byte_string – Byte str to turn into valid xml output
input_encoding – Encoding of byte_string. Default utf-8
errors –
How to handle errors encountered while decoding the
byte_string into unicode at the beginning of the
process. Values are:
replace:
(default) Replace the invalid bytes with a ?
ignore:
Remove the characters altogether from the output
strict:
Raise a UnicodeDecodeError when we encounter
a non-decodable character
output_encoding – Encoding for the xml file that this string will go
into. Default is utf-8. If all the characters in
byte_string are not encodable in this encoding, the unknown
characters will be entered into the output string using xml character
references.
attrib – If True, quote the string for use in an xml
attribute. If False (default), quote for use in an xml text
field.
control_chars –
XML does not allow control characters. When
we encounter those we need to know what to do. Valid options are
replace, ignore, and strict.
Raises:
XmlEncodeError – If control_chars is set to strict and
the string to be made suitable for output to xml contains
control characters, we raise this exception.
UnicodeDecodeError – If errors is set to strict and the
byte_string contains bytes that are not decodable using
input_encoding, this error is raised
Return type:
byte str
Returns:
representation of the byte str in the output encoding with
any bytes that aren’t available in xml taken care of.
Use this when you have a byte str representing text that you need
to make suitable for output to xml. There are several situations where this
comes up. For instance, if you need to transform some strings encoded
in latin-1 to utf-8 for output:
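A sketch of that conversion (the sample latin-1 bytes are made up for
illustration):

from kitchen.text.converters import byte_string_to_xml

latin1_string = 'caf\xe9 <menu>'   # latin-1 encoded text containing "<"
utf8_field = byte_string_to_xml(latin1_string, input_encoding='latin-1')
# utf8_field is utf-8 encoded with the markup characters escaped,
# ready to drop into an xml text node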
Transform a byte str from an xml file into a byte
str in the desired output encoding
Parameters:
byte_string – byte str to decode
input_encoding – encoding that the byte str is in
errors – What to do if not every character is valid in
encoding. See the to_unicode() docstring for legal
values.
output_encoding – Encoding for the output byte str
Returns:
byte str decoded from byte_string and re-encoded using output_encoding
This function attempts to reverse what unicode_to_xml() does. It
takes a byte str (presumably read in from an xml file) and
expands all the html entities into unicode characters and decodes the
byte str into a unicode string. One thing it cannot do
is restore any control characters that were removed prior to
inserting into the file. If you need to keep such characters you need to
use xml_to_bytes() and bytes_to_xml() or use one of the
strategies documented in unicode_to_xml() instead.
Return a byte str encoded so it is valid inside of any xml
file
Parameters:
byte_string – byte str to transform
*args, **kwargs – extra arguments to this function are passed on to
the function actually implementing the encoding. You can use this to
tweak the output in some cases but, as a general rule, you shouldn’t
because the underlying encoding function is not guaranteed to remain
the same.
Returns:
byte str representation of the input. This will be encoded
using base64.
This function is made especially to put binary information into xml
documents.
This function is intended for encoding things that must be preserved
byte-for-byte. If you want to encode a byte string that’s text and don’t
mind losing the actual bytes you probably want to try byte_string_to_xml()
or guess_encoding_to_xml() instead.
Note
Although the current implementation uses base64.b64encode() and
there’s no plans to change it, that isn’t guaranteed. If you want to
make sure that you can encode and decode these messages it’s best to
use xml_to_bytes() if you use this function to encode.
Parameters:
byte_string – byte str to transform. This should be a base64
encoded sequence of bytes originally generated by bytes_to_xml().
*args, **kwargs – extra arguments to this function are passed on to
the function actually implementing the encoding. You can use this to
tweak the output in some cases but, as a general rule, you shouldn’t
because the underlying encoding function is not guaranteed to remain
the same.
Return type:
byte str
Returns:
byte str that’s the decoded input
If you’ve got fields in an xml document that were encoded with
bytes_to_xml() then you want to use this function to decode them
back into byte str. It converts a base64 encoded string into a byte str.
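A sketch of the round trip (the bytes here are arbitrary and only for
illustration):

from kitchen.text.converters import bytes_to_xml, xml_to_bytes

raw = '\x00\x01\xfftext'            # bytes that must be preserved exactly
field = bytes_to_xml(raw)           # base64 text, safe inside an xml document
assert xml_to_bytes(field) == raw   # the original bytes come back untouched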
Note
Although the current implementation uses base64.b64decode() and
there’s no plans to change it, that isn’t guaranteed. If you want to
make sure that you can encode and decode these messages it’s best to
use bytes_to_xml() if you use this function to decode.
Parameters:
string – unicode or byte str to be transformed into
a byte str suitable for inclusion in xml. If string is
a byte str we attempt to guess the encoding. If we cannot guess,
we fall back to latin-1.
output_encoding – Output encoding for the byte str. This
should match the encoding of your xml file.
attrib – If True, escape the item for use in an xml
attribute. If False (default) escape the item for use in
a text node.
kitchen.text.converters.EXCEPTION_CONVERTERS = (<function <lambda>>, <function <lambda>>)
Tuple of functions to try to use to convert an exception into a string
representation. Its main use is to extract a string (unicode or
str) from an exception object in exception_to_unicode() and
exception_to_bytes(). The functions here will try the exception’s
args[0] and the exception itself (roughly equivalent to
str(exception)) to extract the message. This is only a default and can
be easily overridden when calling those functions. There are several
reasons you might wish to do that. If you have exceptions where the best
string representing the exception is not returned by the default
functions, you can add another function to extract from a different
field:
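For instance, a sketch that pulls the message from a hypothetical value
attribute before falling back to the defaults:

from kitchen.text.converters import EXCEPTION_CONVERTERS, exception_to_unicode

class MyError(Exception):
    def __init__(self, message):
        self.value = message

converters = [lambda e: e.value]
converters.extend(EXCEPTION_CONVERTERS)
exception_to_unicode(MyError(u'failure: くらとみ'), converters=converters)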
Another reason would be if you’re converting to a byte str and
you know the str needs to be a non-utf-8 encoding.
exception_to_bytes() defaults to utf-8 but if you convert
into a byte str explicitly using a converter then you can choose
a different encoding:
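A sketch of that, using euc_jp purely as an example encoding:

from kitchen.text.converters import EXCEPTION_CONVERTERS, exception_to_bytes, to_bytes

converters = [lambda e: to_bytes(e.args[0], encoding='euc_jp'),
              lambda e: to_bytes(e, encoding='euc_jp')]
converters.extend(EXCEPTION_CONVERTERS)
exception_to_bytes(ValueError(u'\u304f\u3089\u3068\u307f'), converters=converters)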
Each function in this list should take the exception as its sole argument
and return a string containing the message representing the exception.
The functions may return the message as a byte str,
a unicode string, or even an object if you trust the object to
return a decent string representation. The exception_to_unicode()
and exception_to_bytes() functions will make sure to convert the
string to the proper type before returning.
New in version 0.2.2.
kitchen.text.converters.BYTE_EXCEPTION_CONVERTERS = (<function <lambda>>, <function to_bytes>)
Tuple of functions to try to use to convert an exception into a string
representation. This tuple is similar to the one in
EXCEPTION_CONVERTERS but it’s used with exception_to_bytes()
instead. Ideally, these functions should do their best to return the data
as a byte str but the results will be run through
to_bytes() before being returned.
New in version 0.2.2.
Changed in version 1.0.1: Deprecated as simplifications allow EXCEPTION_CONVERTERS to
perform the same function.
kitchen.text.converters.exception_to_unicode(exc, converters=EXCEPTION_CONVERTERS)
Convert an exception object into a unicode representation
Parameters:
exc – Exception object to convert
converters – List of functions to use to convert the exception into
a string. See EXCEPTION_CONVERTERS for the default value and
an example of adding other converters to the defaults. The functions
in the list are tried one at a time to see if they can extract
a string from the exception. The first one to do so without raising
an exception is used.
Returns:
unicode string representation of the exception. The
value extracted by the converters will be converted into
unicode before being returned using the utf-8
encoding. If you know you need to use an alternate encoding, add
a function that does that to the list of functions in converters.
New in version 0.2.2.
kitchen.text.converters.exception_to_bytes(exc, converters=EXCEPTION_CONVERTERS)
Convert an exception object into a str representation
Parameters:
exc – Exception object to convert
converters – List of functions to use to convert the exception into
a string. See EXCEPTION_CONVERTERS for the default value and
an example of adding other converters to the defaults. The functions
in the list are tried one at a time to see if they can extract
a string from the exception. The first one to do so without raising
an exception is used.
Returns:
byte str representation of the exception. The value
extracted by the converters will be converted into
str before being returned using the utf-8 encoding.
If you know you need to use an alternate encoding, add a function that
does that to the list of functions in converters.
New in version 0.2.2.
Changed in version 1.0.1: Code simplification allowed us to switch to using
EXCEPTION_CONVERTERS as the default value of
converters.