
Module textutils


Some text manipulation utility functions.

:author:    Logilab
:copyright: 2003-2008 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
:contact: http://www.logilab.fr/ -- mailto:contact@logilab.fr
:license: General Public License version 2 - http://www.gnu.org/licenses

:group text formatting: normalize_text, normalize_paragraph, pretty_match, unquote, colorize_ansi
:group text manipulation: searchall, splitstrip
:sort: text formatting, text manipulation

:type ANSI_STYLES: dict(str)
:var ANSI_STYLES: dictionary mapping style identifier to ANSI terminal code

:type ANSI_COLORS: dict(str)
:var ANSI_COLORS: dictionary mapping color identifier to ANSI terminal code

:type ANSI_PREFIX: str
:var ANSI_PREFIX:
  ANSI terminal code notifying the start of an ANSI escape sequence

:type ANSI_END: str
:var ANSI_END:
  ANSI terminal code notifying the end of an ANSI escape sequence

:type ANSI_RESET: str
:var ANSI_RESET:
  ANSI terminal code resetting format defined by a previous ANSI escape sequence

Functions
 
apply_units(string, units, inter=None, final=<type 'float'>, blank_reg=re.compile(r'(\s|,)+'), value_reg=re.compile(r'(?P<value>-?(([0-9]+\.[0-9]*)|((0x?)?[0-9]+)))(?P...)
Parse the string applying the units defined in units (e.g.: "1.5m", {'m': 60} -> 90).
 
colorize_ansi(msg, color=None, style=None)
colorize message by wrapping it with ansi escape codes
 
diff_colorize_ansi(lines, out=sys.stdout, style={'add': 'green', 'remove': 'red', 'separator': 'cyan'})
 
normalize_paragraph(text, line_len=80, indent='')
normalize a text to display it with a maximum line size and an optional, arbitrary indentation.
 
normalize_rest_paragraph(text, line_len=80, indent='')
normalize a ReST text to display it with a maximum line size and an optional, arbitrary indentation.
 
normalize_text(text, line_len=80, indent='', rest=False)
normalize a text to display it with a maximum line size and an optional, arbitrary indentation.
 
pretty_match(match, string, underline_char='^')
return a string with the match location underlined.
 
splitstrip(string, sep=',')
return a list of stripped strings by splitting the string given as argument on `sep` (',' by default).
 
splittext(text, line_len)
split the given text on space according to the given max line size
 
unormalize(ustring, ignorenonascii=False)
replace diacritical characters with their corresponding ascii characters...
 
unquote(string)
remove optional quotes (single or double) from the string
Variables
  ANSI_COLORS = {'black': '30', 'blue': '34', 'cyan': '36', 'gre...
  ANSI_END = 'm'
  ANSI_PREFIX = '\x1b['
  ANSI_RESET = '\x1b[0m'
  ANSI_STYLES = {'blink': '5', 'bold': '1', 'inverse': '7', 'ita...
  BYTE_UNITS = {'B': 1, 'GB': 1073741824, 'KB': 1024, 'MB': 1048...
  DIFF_STYLE = {'add': 'green', 'remove': 'red', 'separator': 'c...
  MANUAL_UNICODE_MAP = {u'¡': u'!', u'©': u'(c)', u'«': u'"', u'...
  TIME_UNITS = {'d': 86400, 'h': 3600, 'min': 60, 'ms': 0.0001, ...
  __package__ = 'logilab.common'
  linesep = '\n'
Function Details

apply_units(string, units, inter=None, final=<type 'float'>, blank_reg=re.compile(r'(\s|,)+'), value_reg=re.compile(r'(?P<value>-?(([0-9]+\.[0-9]*)|((0x?)?[0-9]+)))(?P...)

Parse the string applying the units defined in units
(e.g.: "1.5m", {'m': 60} -> 90).

:type string: str or unicode
:param string: the string to parse

:type units: dict (or any object with __getitem__ using basestring key)
:param units: a dict mapping a unit string repr to its value

:type inter: type
:param inter: used to parse every intermediate value (the resulting values must support addition)

:type blank_reg: regexp
:param blank_reg: should match every blank char to ignore.

:type value_reg: regexp with a "value" group and an optional "unit" group
:param value_reg: matches each value and its optional unit in the string
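A minimal sketch of the parsing described above. It assumes simplified regular expressions (`BLANK_RE`, `VALUE_RE`), a hypothetical `apply_units_sketch` name, and that `final` converts the summed result; it illustrates the idea (drop blanks, convert each number with `inter`, multiply by its unit, sum) and is not the module's actual code.

```python
import re

# Assumed, simplified patterns: blanks and commas are ignored, and each
# number may be followed by a unit made of letters.
BLANK_RE = re.compile(r'(\s|,)+')
VALUE_RE = re.compile(r'(?P<value>-?\d+(\.\d*)?)(?P<unit>[a-zA-Z]+)?')


def apply_units_sketch(string, units, inter=None, final=float):
    """Hypothetical illustration of apply_units, not the library code."""
    if inter is None:
        inter = final
    string = BLANK_RE.sub('', string)
    total = 0
    for match in VALUE_RE.finditer(string):
        value = inter(match.group('value'))
        unit = match.group('unit')
        if unit is not None:
            value *= units[unit]  # an unknown unit raises KeyError
        total += value
    return final(total)


print(apply_units_sketch('1.5m', {'m': 60}))                         # 90.0
print(apply_units_sketch('1h 30min', {'h': 3600, 'min': 60, 's': 1}))  # 5400.0
```

With a table shaped like `BYTE_UNITS`, the same sketch turns `'1.5KB'` into `1536.0`.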

colorize_ansi(msg, color=None, style=None)

colorize message by wrapping it with ansi escape codes

:type msg: str or unicode
:param msg: the message string to colorize

:type color: str or None
:param color:
  the color identifier (see `ANSI_COLORS` for available values)

:type style: str or None
:param style:
  style string (see `ANSI_STYLES` for available values). To get
  several style effects at the same time, use a comma as separator.

:raise KeyError: if an unknown color or style identifier is given

:rtype: str or unicode
:return: the ansi escaped string
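The wrapping can be pictured with the sketch below. It assumes the style and color codes are joined with `;` into a single escape sequence (styles first, then color) and uses a hypothetical `colorize_ansi_sketch` function with subsets of `ANSI_STYLES` and `ANSI_COLORS` copied from the Variables Details section; the module's own code may differ.

```python
ANSI_PREFIX = '\x1b['
ANSI_END = 'm'
ANSI_RESET = '\x1b[0m'
# Subsets of the module's tables (values from Variables Details).
ANSI_STYLES = {'bold': '1', 'underline': '4', 'reset': '0'}
ANSI_COLORS = {'red': '31', 'green': '32', 'cyan': '36', 'reset': '0'}


def colorize_ansi_sketch(msg, color=None, style=None):
    """Hypothetical sketch: wrap msg in one ANSI escape sequence."""
    if color is None and style is None:
        return msg
    codes = []
    if style:
        # several styles may be given, separated by commas
        codes.extend(ANSI_STYLES[name.strip()] for name in style.split(','))
    if color:
        codes.append(ANSI_COLORS[color])  # unknown identifiers raise KeyError
    return ANSI_PREFIX + ';'.join(codes) + ANSI_END + msg + ANSI_RESET


print(repr(colorize_ansi_sketch('ok', 'green', 'bold,underline')))
```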

normalize_paragraph(text, line_len=80, indent='')

normalize a text to display it with a maximum line size and an
optional, arbitrary indentation. Line breaks are normalized. The
indentation string may be used to insert a comment mark, for
instance.

:type text: str or unicode
:param text: the input text to normalize

:type line_len: int
:param line_len: expected maximum line length, defaults to 80

:type indent: str or unicode
:param indent: optional string to use as indentation

:rtype: str or unicode
:return:
  the input text normalized to fit on lines no longer than
  `line_len` characters, optionally prefixed by an indentation
  string
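The behaviour described above can be approximated with the standard library's `textwrap` module. The sketch below swaps in `textwrap.fill` (it is not how this module is implemented), uses a hypothetical `normalize_paragraph_sketch` name, and assumes the indentation counts toward `line_len`.

```python
import textwrap


def normalize_paragraph_sketch(text, line_len=80, indent=''):
    """Hypothetical sketch built on textwrap, not the module's own code."""
    # collapse existing line breaks and runs of whitespace into single spaces
    words = ' '.join(text.split())
    # the indent is prepended to every line and counts toward line_len
    return textwrap.fill(words, width=line_len,
                         initial_indent=indent, subsequent_indent=indent)


print(normalize_paragraph_sketch(
    'This  is a\nshort   paragraph that\nshould be rewrapped.', 20, '# '))
```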

normalize_rest_paragraph(text, line_len=80, indent='')

normalize a ReST text to display it with a maximum line size and an
optional, arbitrary indentation. Line breaks are normalized. The
indentation string may be used to insert a comment mark, for
instance.

:type text: str or unicode
:param text: the input text to normalize

:type line_len: int
:param line_len: expected maximum line length, defaults to 80

:type indent: str or unicode
:param indent: optional string to use as indentation

:rtype: str or unicode
:return:
  the input text normalized to fit on lines no longer than
  `line_len` characters, optionally prefixed by an indentation
  string

normalize_text(text, line_len=80, indent='', rest=False)

normalize a text to display it with a maximum line size and an
optional, arbitrary indentation. Line breaks are normalized but blank
lines are kept. The indentation string may be used to insert a
comment (#) or a quoting (>) mark, for instance.

:type text: str or unicode
:param text: the input text to normalize

:type line_len: int
:param line_len: expected maximum line length, defaults to 80

:type indent: str or unicode
:param indent: optional string to use as indentation

:rtype: str or unicode
:return:
  the input text normalized to fit on lines no longer than
  `line_len` characters, optionally prefixed by an indentation
  string
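Keeping blank lines while rewrapping amounts to splitting the text on blank lines and wrapping each paragraph separately. A minimal sketch, again built on `textwrap` rather than the module's own code, with a hypothetical `normalize_text_sketch` name (the `rest` flag is not modelled):

```python
import re
import textwrap


def normalize_text_sketch(text, line_len=80, indent=''):
    """Hypothetical sketch: rewrap each paragraph, keep blank lines."""
    # paragraphs are separated by lines containing only whitespace
    paragraphs = re.split(r'\n\s*\n', text)
    wrapped = []
    for para in paragraphs:
        words = ' '.join(para.split())
        wrapped.append(textwrap.fill(words, width=line_len,
                                     initial_indent=indent,
                                     subsequent_indent=indent))
    # rejoin the paragraphs with a single blank line between them
    return '\n\n'.join(wrapped)


print(normalize_text_sketch('first paragraph\nspread over lines\n\nsecond one', 40, '> '))
```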

pretty_match(match, string, underline_char='^')

return a string with the match location underlined:

>>> import re
>>> print(pretty_match(re.search('mange', 'il mange du bacon'), 'il mange du bacon'))
il mange du bacon
   ^^^^^
>>>

:type match: _sre.SRE_Match
:param match: object returned by re.match, re.search or re.finditer

:type string: str or unicode
:param string:
  the string to which the regular expression was applied to
  obtain the `match` object

:type underline_char: str or unicode
:param underline_char:
  character to use to underline the matched section, defaults to the
  caret '^'

:rtype: str or unicode
:return:
  the original string with an inserted line to underline the match
  location
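The underline is the match's offset within its line turned into spaces, followed by one `underline_char` per matched character. The hypothetical `pretty_match_sketch` below shows this for matches that do not span several lines; it is an illustration, not the module's implementation.

```python
import re


def pretty_match_sketch(match, string, underline_char='^'):
    """Hypothetical sketch: insert an underline below the matched line."""
    start, end = match.start(), match.end()
    # locate the line holding the match (rfind returns -1 on the first line)
    line_start = string.rfind('\n', 0, start) + 1
    line_end = string.find('\n', end)
    if line_end == -1:
        line_end = len(string)
    underline = ' ' * (start - line_start) + underline_char * (end - start)
    return string[:line_end] + '\n' + underline + string[line_end:]


text = 'first line\nil mange du bacon\nlast'
print(pretty_match_sketch(re.search('mange', text), text))
```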

splitstrip(string, sep=',')

return a list of stripped strings by splitting the string given as
argument on `sep` (',' by default). Empty strings are discarded.

>>> splitstrip('a, b, c   ,  4,,')
['a', 'b', 'c', '4']
>>> splitstrip('a')
['a']
>>>

:type string: str or unicode
:param string: a csv line

:type sep: str or unicode
:param sep: field separator, defaults to the comma (',')

:rtype: list
:return: the list of stripped, non-empty fields
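Functionally, the behaviour documented above boils down to a list comprehension. A hypothetical one-line equivalent, `splitstrip_sketch`, reproduces the two doctests (it is a sketch, not the module's code):

```python
def splitstrip_sketch(string, sep=','):
    """Hypothetical equivalent of splitstrip: split, strip, drop empties."""
    return [word.strip() for word in string.split(sep) if word.strip()]


print(splitstrip_sketch('a, b, c   ,  4,,'))
```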

splittext(text, line_len)

split the given text on space according to the given max line size

return a 2-tuple:
* a line <= line_len if possible
* the rest of the text which has to be reported on another line
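A sketch of this 2-tuple contract, assuming the text is cut at the last space that keeps the first part within `line_len` and, failing that, at the first space after the limit. `splittext_sketch` is a hypothetical name and the module's exact tie-breaking rules may differ.

```python
def splittext_sketch(text, line_len):
    """Hypothetical sketch returning (line, rest)."""
    if len(text) <= line_len:
        return text, ''
    # last space at or before index line_len keeps the line within line_len
    pos = text.rfind(' ', 0, line_len + 1)
    if pos == -1:
        # no usable space: the line has to exceed line_len
        pos = text.find(' ', line_len)
        if pos == -1:
            return text, ''
    return text[:pos], text[pos + 1:].lstrip()


print(splittext_sketch('the quick brown fox', 10))
```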

unormalize(ustring, ignorenonascii=False)

replace diacritical characters with their corresponding ascii characters
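One way to perform this replacement, assumed here rather than taken from the module, is to look each character up in a manual table first (a subset of `MANUAL_UNICODE_MAP` is copied from Variables Details) and otherwise keep the first character of its NFKD decomposition. The handling of `ignorenonascii` (skip the character instead of raising `ValueError`) is also an assumption in this hypothetical `unormalize_sketch`.

```python
import unicodedata

# Subset of the module's MANUAL_UNICODE_MAP (values from Variables Details).
MANUAL_MAP = {'ß': 'ss', 'Æ': 'AE', 'Ø': 'O', '©': '(c)'}


def unormalize_sketch(ustring, ignorenonascii=False):
    """Hypothetical sketch, not the module's implementation."""
    result = []
    for char in ustring:
        if char in MANUAL_MAP:
            result.append(MANUAL_MAP[char])
            continue
        # NFKD splits 'é' into 'e' + a combining accent; keep the base letter
        base = unicodedata.normalize('NFKD', char)[0]
        if ord(base) >= 128:
            # assumption: skip when ignorenonascii is set, otherwise fail
            if ignorenonascii:
                continue
            raise ValueError('cannot convert %r to ASCII' % char)
        result.append(base)
    return ''.join(result)


print(unormalize_sketch('élève à Straße'))
```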
    

unquote(string)

remove optional quotes (single or double) from the string

:type string: str or unicode
:param string: an optionally quoted string

:rtype: str or unicode
:return: the unquoted string (or the input string if it wasn't quoted)
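A hypothetical `unquote_sketch` illustrating the documented contract. It removes one pair of surrounding quotes only when both ends carry the same quote character, a design choice that may differ from the module's actual behaviour:

```python
def unquote_sketch(string):
    """Hypothetical sketch: strip one matching pair of surrounding quotes."""
    if len(string) >= 2 and string[0] == string[-1] and string[0] in '"\'':
        return string[1:-1]
    # not quoted (or a lone quote character): return the input unchanged
    return string


print(unquote_sketch('"hello"'), unquote_sketch("'x'"), unquote_sketch('plain'))
```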


Variables Details

ANSI_COLORS

Value:
{'black': '30',
 'blue': '34',
 'cyan': '36',
 'green': '32',
 'magenta': '35',
 'red': '31',
 'reset': '0',
 'white': '37',
...

ANSI_STYLES

Value:
{'blink': '5',
 'bold': '1',
 'inverse': '7',
 'italic': '3',
 'reset': '0',
 'strike': '9',
 'underline': '4'}

BYTE_UNITS

Value:
{'B': 1,
 'GB': 1073741824,
 'KB': 1024,
 'MB': 1048576,
 'TB': 1099511627776}

DIFF_STYLE

Value:
{'add': 'green', 'remove': 'red', 'separator': 'cyan'}

MANUAL_UNICODE_MAP

Value:
{u'¡': u'!',
 u'©': u'(c)',
 u'«': u'"',
 u'®': u'(r)',
 u'»': u'"',
 u'Æ': u'AE',
 u'Ø': u'O',
 u'ß': u'ss',
...

TIME_UNITS

Value:
{'d': 86400, 'h': 3600, 'min': 60, 'ms': 0.0001, 's': 1}