(default) will take a guess for control character
widths. Most codes will return zero width. backspace,
delete, and cleardelete return -1. escape currently
returns -1 as well, but this is not guaranteed as it’s not always
correct.
encoding – If we are given a byte str, this is used to
decode it into a unicode string. Any characters that are not
decodable in this encoding will get a value dependent on the
errors parameter.
errors – How to treat errors when decoding the byte str to a
unicode string. Legal values are the same as for
kitchen.text.converters.to_unicode(). The default value of
replace will cause undecodable byte sequences to have a width of
one. ignore will give them a width of zero.
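As a rough illustration of why the two errors values lead to different widths, the sketch below uses Python's built-in bytes.decode() rather than kitchen itself. It assumes Python 3 and a UTF-8 target encoding; the sample bytes are made up for the example.

```python
# Illustrative only: plain bytes.decode(), not kitchen's to_unicode().
raw = b'caf\xe9'  # latin-1 encoded bytes; \xe9 is not valid UTF-8 here

replaced = raw.decode('utf-8', 'replace')  # bad byte becomes U+FFFD
ignored = raw.decode('utf-8', 'ignore')    # bad byte is dropped

print(len(replaced))  # 4: the replacement character occupies one position
print(len(ignored))   # 3: the undecodable byte contributes nothing
```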
Textual width of the msg. This is the amount of
space that the string will consume on a monospace display, measured
in cell positions (columns). It is not the number of glyphs in
the string.
Note
This function can be wrong sometimes because Unicode does not specify
a strict width value for all of the code points. In
particular, we’ve found that some Tamil characters take up to four
character cells but we return a lesser amount.
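The idea of counting cells rather than characters can be sketched with the standard library alone. This is an approximation written for Python 3, not kitchen's algorithm: the helper name approx_textual_width is hypothetical, control characters are simply counted as one cell, and the result depends on the Unicode data shipped with the interpreter.

```python
import unicodedata

def approx_textual_width(msg):
    """Rough cell-width estimate for a str; NOT kitchen's implementation."""
    width = 0
    for char in msg:
        if unicodedata.combining(char):
            # Combining marks are drawn on top of the previous character.
            continue
        if unicodedata.east_asian_width(char) in ('W', 'F'):
            width += 2  # wide and fullwidth characters take two cells
        else:
            width += 1
    return width

print(approx_textual_width('café'))        # 4
print(approx_textual_width('cafe\u0301'))  # 4 (the accent adds no width)
print(approx_textual_width('一二三'))       # 6
```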
encoding – If we are given a byte str, this is used to
decode it into a unicode string. Any characters that are not
decodable in this encoding will be assigned a width of one.
unicode string of the msg chopped at the given
textual width
This is what you want to use instead of %.*s, as it does the “right”
thing with regard to UTF-8 sequences, control characters,
and characters that take more than one cell position. For example:
>>> # Wrong: only displays 8 characters because it is operating on bytes
>>> print "%.*s" % (10, 'café ñunru!')
café ñun
>>> # Properly operates on graphemes
>>> '%s' % (textual_width_chop('café ñunru!', 10))
café ñunru
>>> # takes too many columns because the kanji need two cell positions
>>> print '1234567890\n%.*s' % (10, u'一二三四五六七八九十')
1234567890
一二三四五六七八九十
>>> # Properly chops at 10 columns
>>> print '1234567890\n%s' % (textual_width_chop(u'一二三四五六七八九十', 10))
1234567890
一二三四五
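Chopping by cells instead of characters could look roughly like the following. It is a Python 3 sketch under simplifying assumptions, not the code behind textual_width_chop(): the names cell_width and chop_to_width are hypothetical, and widths come from unicodedata only.

```python
import unicodedata

def cell_width(char):
    """Approximate cell width of one character (0, 1, or 2)."""
    if unicodedata.combining(char):
        return 0
    return 2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1

def chop_to_width(msg, chop):
    """Keep leading characters of msg until adding one would exceed chop cells."""
    used = 0
    kept = []
    for char in msg:
        width = cell_width(char)
        if used + width > chop:
            break
        kept.append(char)
        used += width
    return ''.join(kept)

print(chop_to_width('一二三四五六七八九十', 10))  # 一二三四五
print(chop_to_width('café ñunru!', 10))          # café ñunru
```

Note that a wide character is dropped entirely when only one cell remains, so the result can be one cell narrower than requested.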
Expand a unicode string to a specified textual width
or chop to same
Parameters:
msg – unicode string to format
fill – pad string until the textual width of the string is
this length
chop – before doing anything else, chop the string to this length.
Default: Don’t chop the string at all
left – If True (default) left justify the string and put the
padding on the right. If False, pad on the left side.
prefix – Attach this string before the field we’re filling
suffix – Append this string to the end of the field we’re filling
Return type:
unicode string
Returns:
msg formatted to fill the specified width. If no
chop is specified, the string could exceed the fill length
when completed. If prefix or suffix are printable
characters, the string could be longer than the fill width.
Note
prefix and suffix should be used for “invisible”
characters like highlighting, color changing escape codes, etc. The
fill characters are appended outside of any prefix or
suffix elements. This allows you to only highlight
msg inside of the field you’re filling.
Warning
msg, prefix, and suffix should all be
representable as unicode characters. In particular, any escape
sequences in prefix and suffix need to be convertible
to unicode. If you need to use byte sequences here rather
than unicode characters, use
byte_string_textual_width_fill() instead.
This function expands a string to fill a field of a particular
textual width. Use it instead of %*.*s, as it does the
“right” thing with regard to UTF-8 sequences, control
characters, and characters that take more than one cell position in
a display. Example usage:
>>> msg = u'一二三四五六七八九十'
>>> # Wrong: This uses 10 characters instead of 10 cells:
>>> u":%-*.*s:" % (10, 10, msg[:9])
:一二三四五六七八九 :
>>> # This uses 10 cells like we really want:
>>> u":%s:" % (textual_width_fill(msg[:9], 10, 10))
:一二三四五:
>>> # Wrong: Right aligned in the field, but too many cells
>>> u"%20.10s" % (msg)
          一二三四五六七八九十
>>> # Correct: Right aligned with proper number of cells
>>> u"%s" % (textual_width_fill(msg, 20, 10, left=False))
          一二三四五
>>> # Wrong: Adding some escape characters to highlight the line but too many cells
>>> u"%s%20.10s%s" % (prefix, msg, suffix)
u'[7m          一二三四五六七八九十[0m'
>>> # Correct highlight of the line
>>> u"%s%s%s" % (prefix, display.textual_width_fill(msg, 20, 10, left=False), suffix)
u'[7m          一二三四五[0m'
>>> # Correct way to not highlight the fill
>>> u"%s" % (display.textual_width_fill(msg, 20, 10, left=False, prefix=prefix, suffix=suffix))
u'          [7m一二三四五[0m'
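The placement of padding outside prefix and suffix can be sketched as below. This is a Python 3 approximation, not kitchen's implementation: fill_to_width and cell_width are hypothetical names, widths come from unicodedata only, and the escape sequences are example values.

```python
import unicodedata

def cell_width(char):
    """Approximate cell width of one character (0, 1, or 2)."""
    if unicodedata.combining(char):
        return 0
    return 2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1

def fill_to_width(msg, fill, chop=None, left=True, prefix='', suffix=''):
    """Optionally chop msg to chop cells, then pad it out to fill cells."""
    if chop is not None:
        kept, used = [], 0
        for char in msg:
            if used + cell_width(char) > chop:
                break
            kept.append(char)
            used += cell_width(char)
        msg = ''.join(kept)
    # prefix and suffix are assumed to be invisible, so they add no width.
    padding = ' ' * max(fill - sum(cell_width(c) for c in msg), 0)
    body = prefix + msg + suffix
    # Padding goes outside prefix/suffix so only msg is highlighted.
    return body + padding if left else padding + body

highlighted = fill_to_width('一二三四五六七八九十', 20, 10, left=False,
                            prefix='\x1b[7m', suffix='\x1b[0m')
print(repr(highlighted))
```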
for other parameters that you can give this command.
This function is a light wrapper around kitchen.text.display.wrap().
Where that function returns a list of lines, this function
returns one string with each line separated by a newline.
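The relationship between the two functions can be sketched as follows. The standard library's textwrap.wrap() stands in for kitchen.text.display.wrap() here; it counts characters rather than display cells, so this only shows the join step, and fill_lines is a hypothetical name.

```python
import textwrap

def fill_lines(text, width=70, **kwargs):
    """Wrap text and return one string with lines separated by newlines."""
    # textwrap.wrap() is a stand-in: it is not aware of textual width.
    return '\n'.join(textwrap.wrap(text, width=width, **kwargs))

print(fill_lines('aaa bbb ccc', width=7))
```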
chop – before doing anything else, chop the string to this length.
Default: Don’t chop the string at all
left – If True (default) left justify the string and put the
padding on the right. If False, pad on the left side.
prefix – Attach this byte str before the field we’re
filling
suffix – Append this byte str to the end of the field we’re
filling
Return type:
byte str
Returns:
msg formatted to fill the specified textual
width. If no chop is specified, the string could exceed the
fill length when completed. If prefix or suffix are
printable characters, the string could be longer than fill width.
Note
prefix and suffix should be used for “invisible”
characters like highlighting, color changing escape codes, etc. The
fill characters are appended outside of any prefix or
suffix elements. This allows you to only highlight
msg inside of the field you’re filling.
There are a few internal functions and variables in this module. Code outside
of kitchen shouldn’t use them but people coding on kitchen itself may find
them useful.
Internal table, provided by this module to list code points which
combine with other characters and therefore should have no textual
width. This is a sorted tuple of non-overlapping intervals. Each
interval is a tuple listing a starting code point and ending
code point. Every code point between the two end points is
a combining character.
Combine Markus Kuhn’s data with unicodedata to make combining
char list
Return type:
tuple of tuples
Returns:
tuple of intervals of code points that are
combining characters. Each interval is a 2-tuple of the
starting code point and the ending code point for the
combining characters.
In normal use, this function serves to tell how we’re generating the
combining char list. For speed reasons, we use this to generate a static
list and just use that later.
Markus Kuhn’s list of combining characters is more complete than what’s in
the Python unicodedata library, but the Python unicodedata is
synced against later versions of the Unicode database.
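The unicodedata half of that process might be sketched as below. This Python 3 sketch only collects code points with a non-zero canonical combining class and does not merge in Markus Kuhn's data, so it is not the table kitchen ships; combining_intervals is a hypothetical name and the output depends on the interpreter's Unicode version.

```python
import sys
import unicodedata

def combining_intervals():
    """Return a sorted tuple of (start, end) runs of combining code points."""
    intervals = []
    start = None
    for code_point in range(sys.maxunicode + 1):
        if unicodedata.combining(chr(code_point)):
            if start is None:
                start = code_point  # open a new run
        elif start is not None:
            intervals.append((start, code_point - 1))  # close the run
            start = None
    if start is not None:
        intervals.append((start, sys.maxunicode))
    return tuple(intervals)

table = combining_intervals()
print(len(table), table[0])
```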
This will print a new _COMBINING table in the format used in
kitchen/text/display.py. It’s useful for updating the
_COMBINING table with updated data from a new python as the format
won’t change from what’s already in the file.
table – Ordered list of intervals. This is a list of two-tuples. The
elements of the two-tuple define an interval’s start and end points.
Returns:
True if value is found within an interval in the table.
Otherwise, False.
This function checks whether a numeric value is present within a table
of intervals. It checks using a binary search algorithm, dividing the
list of values in half and checking against the values until it determines
whether the value is in the table.
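A binary search over sorted, non-overlapping intervals can be sketched like this. The function name interval_bisearch and the sample table are illustrative assumptions, not the module's internal code or data.

```python
def interval_bisearch(value, table):
    """Return True if value lies inside any (start, end) interval in table.

    table must be sorted and its intervals must not overlap.
    """
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        start, end = table[mid]
        if value < start:
            high = mid - 1   # value can only be in the lower half
        elif value > end:
            low = mid + 1    # value can only be in the upper half
        else:
            return True      # start <= value <= end
    return False

# Sample intervals chosen only for the example.
sample_table = ((0x0300, 0x036f), (0x0483, 0x0487), (0x0591, 0x05bd))
print(interval_bisearch(0x0301, sample_table))  # True
print(interval_bisearch(0x0041, sample_table))  # False
```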
(default) will take a guess for control character
widths. Most codes will return zero width. backspace,
delete, and cleardelete return -1. escape currently
returns -1 as well, but this is not guaranteed as it’s not always
correct.
*args – unicode strings to check the total textual
width of
Returns:
True if the total length of args is less than
or equal to width. Otherwise False.
We often want to know “does X fit in Y”. It takes a while to use
textual_width() to calculate this. However, we know that each
canonically composed unicode character is always going to
have a textual width of 1 or 2. With this we can
take the following shortcuts:
If the number of canonically composed characters is more than width,
the true textual width must also be more than width, so the string
cannot fit.
If the number of canonically composed characters * 2 is no more than the
width, then the textual width must be ok.
The textual width of a canonically composed unicode string
will always be greater than or equal to the number of unicode
characters. So we can first check if the number of composed
unicode characters is more than the asked for width. If it is, we
can return False immediately. If neither shortcut applies, we must
do a full textual width lookup.
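The shortcut logic can be sketched as follows. This is a Python 3 approximation rather than kitchen's code: fits_within and approx_width are hypothetical names, the full lookup is replaced by a unicodedata-based estimate, and the shortcuts rely on the assumption stated above that every composed character is one or two cells wide.

```python
import unicodedata

def approx_width(msg):
    """Stand-in for a full textual width lookup (unicodedata only)."""
    total = 0
    for char in msg:
        if unicodedata.combining(char):
            continue
        total += 2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1
    return total

def fits_within(width, *args):
    """Return True if the combined textual width of args is <= width."""
    string = unicodedata.normalize('NFC', ''.join(args))
    if len(string) > width:
        return False  # even at one cell per character it is too wide
    if len(string) * 2 <= width:
        return True   # even at two cells per character it fits
    return approx_width(string) <= width  # undecided: do the full count

print(fits_within(10, '一二三四五'))  # True
print(fits_within(9, '一二三四五'))   # False
```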