- The Python Standard Library, especially `str` methods and the `string` module, is powerful for text processing. Start there.
- regex - Extends Python's Standard Library `re` module while being backwards-compatible.
- chardet - Finds character encoding.
- ftfy - Take in bad Unicode and output good Unicode. Seriously automagical.
- polyglot - Helpful for multilingual preprocessing.
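As a rough illustration of the "start there" advice, here is a minimal sketch that uses only `str` methods and the `string` module. The sample sentence and the variable names (`raw`, `cleaned`, `tokens`) are made up for this example.

```python
import string

# Illustrative input; any messy string would do.
raw = "  Hello, World! Text processing is FUN.  "

# str methods: trim surrounding whitespace and normalize case.
cleaned = raw.strip().lower()

# string module: string.punctuation lists the ASCII punctuation characters.
# str.maketrans builds a table that deletes each of them.
table = str.maketrans("", "", string.punctuation)

# Remove punctuation, then split on whitespace into simple tokens.
tokens = cleaned.translate(table).split()
print(tokens)
```

This only handles ASCII punctuation and whitespace tokenization, which is often enough for a first pass before reaching for a third-party library.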
In your Python package, you have:
- an `__init__.py` that designates this as a Python package,
- a `module_a.py`, containing a function `action_a()` that references an attribute (like a function or variable) in `module_b.py`, and
- a `module_b.py`, containing a function `action_b()` that references an attribute (like a function or variable) in `module_a.py`.
This situation can introduce a circular import error: module_a attempts to import module_b, but can't, because module_b needs to import module_a, which is in the process of being interpreted.
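One way to reproduce this failure is sketched below. It writes a throwaway package to a temporary directory and imports it in a fresh interpreter. The package name `demo_pkg`, the variables `NAME_A` and `NAME_B`, and the helper `import_in_fresh_interpreter()` are all hypothetical names chosen for this sketch; only `module_a`, `module_b`, `action_a()`, and `action_b()` come from the description above.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents. Each module pulls a name out of the
# other one at import time, before its own names are defined.
MODULE_A = '''\
from demo_pkg.module_b import NAME_B

NAME_A = "A"

def action_a():
    return "action_a sees " + NAME_B
'''

MODULE_B = '''\
from demo_pkg.module_a import NAME_A

NAME_B = "B"

def action_b():
    return "action_b sees " + NAME_A
'''

def import_in_fresh_interpreter():
    """Write the package to a temp dir and import module_a in a new process."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "module_a.py").write_text(MODULE_A)
        (pkg / "module_b.py").write_text(MODULE_B)
        # Make the temp dir importable in the child process.
        env = {**os.environ, "PYTHONPATH": tmp}
        return subprocess.run(
            [sys.executable, "-c", "import demo_pkg.module_a"],
            env=env, capture_output=True, text=True,
        )

result = import_in_fresh_interpreter()
print(result.returncode)                       # non-zero: the import failed
print(result.stderr.strip().splitlines()[-1])  # final traceback line
```

When `module_b` runs its `from ... import NAME_A` line, `module_a` already exists in `sys.modules` but has not yet reached the line that defines `NAME_A`, so the lookup fails with an `ImportError`.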
But, sometimes Python is magic, and code that looks like it should cause this circular import error works just fine!
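The sketch below shows one arrangement (not necessarily the only one) where two modules import each other and nothing breaks: each module imports the other module object and only looks up the attribute when the function is called. As before, `demo_pkg`, `NAME_A`, `NAME_B`, and `run_in_fresh_interpreter()` are hypothetical names for this illustration, and the example assumes a reasonably recent Python 3.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents. Each module imports the other *module*,
# and touches its attributes only inside the function body, at call time.
MODULE_A = '''\
from demo_pkg import module_b

NAME_A = "A"

def action_a():
    return "action_a sees " + module_b.NAME_B
'''

MODULE_B = '''\
from demo_pkg import module_a

NAME_B = "B"

def action_b():
    return "action_b sees " + module_a.NAME_A
'''

DRIVER = (
    "from demo_pkg import module_a, module_b\n"
    "print(module_a.action_a())\n"
    "print(module_b.action_b())\n"
)

def run_in_fresh_interpreter():
    """Write the package to a temp dir and call both functions in a new process."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "module_a.py").write_text(MODULE_A)
        (pkg / "module_b.py").write_text(MODULE_B)
        # Make the temp dir importable in the child process.
        env = {**os.environ, "PYTHONPATH": tmp}
        return subprocess.run(
            [sys.executable, "-c", DRIVER],
            env=env, capture_output=True, text=True,
        )

result = run_in_fresh_interpreter()
print(result.stdout)
```

The imports are still circular, but by the time `action_a()` or `action_b()` actually runs, both modules have finished executing, so the attribute lookups succeed.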