Created
August 4, 2012 15:21
Silly lxml bug in Python
>>> from lxml import etree
>>> xml = u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>'
>>> etree.XML(xml)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2736, in lxml.etree.XML (src/lxml/lxml.etree.c:54437)
  File "parser.pxi", line 1569, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82685)
ValueError: Unicode strings with encoding declaration are not supported.
>>> etree.HTML(xml)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2708, in lxml.etree.HTML (src/lxml/lxml.etree.c:54160)
  File "parser.pxi", line 1569, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82685)
ValueError: Unicode strings with encoding declaration are not supported.
>>> lxml.etree.__version__
u'2.3.3'
>>> xml = u"<foo><bar/></foo>"
>>> etree.HTML(xml)
<Element html at 0x105364870>
>>> etree.XML(xml)
<Element foo at 0x105395a00>
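The session above suggests two ways around the ValueError: hand lxml bytes instead of a unicode string, or remove the encoding declaration before parsing. Below is a minimal sketch of both, assuming Python 3. The helper names (declared_encoding, as_bytes, strip_declaration) are hypothetical and not part of lxml, and the lxml calls are guarded so the helpers can be tried without lxml installed.

```python
import re

# Leading XML declaration, e.g. <?xml version="1.0" encoding="utf-8" ?>
_XML_DECL = re.compile(r'^\s*<\?xml\s[^>]*\?>')
# The encoding pseudo-attribute inside that declaration
_ENCODING = re.compile(r'encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(text, default="utf-8"):
    """Return the encoding named in the XML declaration, or `default`."""
    decl = _XML_DECL.match(text)
    if decl is None:
        return default
    enc = _ENCODING.search(decl.group(0))
    return enc.group(1) if enc else default


def as_bytes(text):
    """Workaround 1 (hypothetical helper): encode the text to bytes
    using the encoding the document itself declares."""
    return text.encode(declared_encoding(text))


def strip_declaration(text):
    """Workaround 2 (hypothetical helper): drop the XML declaration
    so the unicode string carries no encoding claim."""
    return _XML_DECL.sub("", text, count=1)


xml = '<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>'
payload = as_bytes(xml)          # bytes, declaration kept
bare = strip_declaration(xml)    # str, declaration removed

try:
    from lxml import etree
except ImportError:              # lxml is optional for this sketch
    etree = None

if etree is not None:
    root_from_bytes = etree.XML(payload)
    root_from_text = etree.XML(bare)
```

Encoding to bytes keeps the document intact, which is probably the safer choice when the declaration names something other than UTF-8.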
@kernc we won't fix it because it isn't a bug. If you're using requests to get this string then the following should always work:
If you instead use r.text, that is when you'll run into problems. On the other hand, from this gist, it seems clear this is an issue with lxml and not requests. One call with a unicode string doesn't work, while a different one does. And from the error and the discussion on LaunchPad, it seems like this is intentional.
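The distinction drawn in this comment can be sketched as follows, assuming that r.content holds the raw response bytes and r.text the decoded unicode string, as in requests. StubResponse is a hypothetical stand-in so the sketch needs no network access, and the lxml calls are guarded in case lxml is not installed.

```python
class StubResponse:
    """Hypothetical stand-in for a requests response (assumption:
    only the `content`, `encoding` and `text` attributes matter here)."""

    def __init__(self, body, encoding="utf-8"):
        self.content = body        # raw bytes, untouched
        self.encoding = encoding

    @property
    def text(self):
        # Decoded unicode view of the same body
        return self.content.decode(self.encoding)


r = StubResponse(b'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>')

try:
    from lxml import etree
except ImportError:                # lxml is optional for this sketch
    etree = None

outcome = {}
if etree is not None:
    # Bytes input: the parser reads the encoding declaration itself.
    outcome["content"] = etree.XML(r.content).tag
    # Unicode input with a declaration: the case reported in the gist.
    try:
        etree.XML(r.text)
        outcome["text"] = "parsed"
    except ValueError as exc:
        outcome["text"] = "ValueError: %s" % exc
```

With a real response, the same pattern would be etree.XML(r.content) rather than etree.XML(r.text).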