Created January 30, 2017 17:37
Investigate parser
In [6]: from serenata_toolbox.xml2csv import convert_xml_to_csv
In [7]: convert_xml_to_csv('data/AnoAtual.xml', 'data/AnoAtual.csv')
2017-01-30 17:28:26 Creating the CSV file
2017-01-30 17:28:26 Reading the XML file
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-7-87ccd4d5ef66> in <module>()
----> 1 convert_xml_to_csv('data/AnoAtual.xml', 'data/AnoAtual.csv')
/root/anaconda3/envs/serenata_rosie/lib/python3.6/site-packages/serenata_toolbox/xml2csv.py in convert_xml_to_csv(xml_file_path, csv_file_path)
75 output('Writing record #{:,} to the CSV'.format(count), end='\r')
76 with open(csv_file_path, 'a') as csv_file:
---> 77 print(csv_io.getvalue(), file=csv_file)
78
79 json_io.close()
UnicodeEncodeError: 'ascii' codec can't encode character '\xc7' in position 51: ordinal not in range(128)
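The failing line is the print() into a file opened with open(csv_file_path, 'a') and no encoding argument, so Python 3.6 encodes the text with the locale's preferred encoding. Assuming this environment runs under the C/POSIX locale (common in containers), that encoding is ASCII, and '\xc7' (Ç) cannot be written. A minimal sketch that reproduces the failure by forcing encoding='ascii' and avoids it with an explicit encoding='utf-8'; append_csv_chunk is a hypothetical helper, not part of serenata_toolbox, and the CSV text is made up.

```python
import locale
import os
import tempfile


def append_csv_chunk(csv_file_path, chunk, encoding='utf-8'):
    """Hypothetical helper: append text like xml2csv.py line 77, but with an explicit encoding."""
    with open(csv_file_path, 'a', encoding=encoding) as csv_file:
        print(chunk, file=csv_file)


# Without encoding=, open() falls back to this value; under a C/POSIX locale
# on Python 3.6 it is typically 'ANSI_X3.4-1968', i.e. ASCII.
print('locale default:', locale.getpreferredencoding(False))

chunk = 'txtdescricao\nSERVIÇOS POSTAIS'  # made-up row containing '\xc7'
path = os.path.join(tempfile.mkdtemp(), 'AnoAtual.csv')

try:
    append_csv_chunk(path, chunk, encoding='ascii')  # simulates the failing environment
except UnicodeEncodeError as exc:
    print('ascii:', exc)

append_csv_chunk(path, chunk)  # UTF-8 works regardless of the locale
```

Passing encoding='utf-8' at line 76 of xml2csv.py, or exporting LANG=C.UTF-8 before starting IPython, would presumably have the same effect; neither was tried in this session. Python 3.7 changed how the C locale is handled (PEP 538 and PEP 540), so this failure is most likely on 3.6 and earlier.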
In [8]: convert_xml_to_csv('data/AnoAnterior.xml', 'data/AnoAnterior.csv')
2017-01-30 17:29:01 Creating the CSV file
2017-01-30 17:29:01 Reading the XML file
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-8-7cf7cedcb958> in <module>()
----> 1 convert_xml_to_csv('data/AnoAnterior.xml', 'data/AnoAnterior.csv')
/root/anaconda3/envs/serenata_rosie/lib/python3.6/site-packages/serenata_toolbox/xml2csv.py in convert_xml_to_csv(xml_file_path, csv_file_path)
75 output('Writing record #{:,} to the CSV'.format(count), end='\r')
76 with open(csv_file_path, 'a') as csv_file:
---> 77 print(csv_io.getvalue(), file=csv_file)
78
79 json_io.close()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 59-60: ordinal not in range(128)
In [9]: convert_xml_to_csv('data/AnosAnteriores.xml', 'data/AnosAnteriores.xml')
2017-01-30 17:30:23 Creating the CSV file
2017-01-30 17:30:24 Reading the XML file
File "data/AnosAnteriores.xml", line 1
idedocumento,txnomeparlamentar,idecadastro,nucarteiraparlamentar,nulegislatura,sguf,sgpartido,codlegislatura,numsubcota,txtdescricao,numespecificacaosubcota,txtdescricaoespecificacao,txtfornecedor,txtcnpjcpf,txtnumero,indtipodocumento,datemissao,vlrdocumento,vlrglosa,vlrliquido,nummes,numano,numparcela,txtpassageiro,txttrecho,numlote,numressarcimento,vlrrestituicao,nudeputadoid
^
XMLSyntaxError: Document is empty, line 1, column 1
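In [9] passes data/AnosAnteriores.xml as both the input and the output path. The traceback shows the CSV header as line 1 of that XML file, which suggests the "Creating the CSV file" step overwrote the XML before "Reading the XML file" started, so this XMLSyntaxError looks like a separate problem from the encoding errors. A minimal sketch of a guard that would catch this before any file is touched; csv_path_for and check_distinct_paths are hypothetical helpers, not serenata_toolbox functions.

```python
import os


def csv_path_for(xml_file_path):
    """Hypothetical helper: derive the CSV path by swapping the extension."""
    root, _ = os.path.splitext(xml_file_path)
    return root + '.csv'


def check_distinct_paths(xml_file_path, csv_file_path):
    """Hypothetical guard: raise if writing the CSV would overwrite the XML input."""
    same = os.path.realpath(xml_file_path) == os.path.realpath(csv_file_path)
    # samefile() also catches hard links, but needs both files to exist.
    if not same and os.path.exists(xml_file_path) and os.path.exists(csv_file_path):
        same = os.path.samefile(xml_file_path, csv_file_path)
    if same:
        raise ValueError('CSV path {!r} would overwrite the XML input'.format(csv_file_path))


xml_path = 'data/AnosAnteriores.xml'
csv_path = csv_path_for(xml_path)
print(csv_path)  # data/AnosAnteriores.csv
check_distinct_paths(xml_path, csv_path)  # passes: the paths differ

try:
    check_distinct_paths(xml_path, xml_path)  # the call made in In [9]
except ValueError as exc:
    print(exc)
```

Once the check passes, convert_xml_to_csv(xml_path, csv_path) can be called as in In [7].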
After running everything above, I tried changing the encoding in the iterparse call to iterparse(open(xml_path, encoding='utf-16'), tag=tag). Still no luck:
*** UnicodeEncodeError: 'ascii' codec can't encode characters in position 224-225: ordinal not in range(128)
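The encoding passed to open() for the XML only controls how the input is decoded, while the error is still a UnicodeEncodeError, which Python raises when encoding text for output; that points back at the CSV write rather than the parser. The sketch below swaps in the standard library's xml.etree.ElementTree.iterparse for lxml (the XMLSyntaxError above suggests the toolbox uses lxml). It hands the parser raw bytes so the input encoding comes from the BOM and XML declaration, and writes the CSV with an explicit encoding='utf-8'. The sample XML, the DESPESA tag, and xml_to_csv are made up for illustration, loosely modeled on the column names in the header above.

```python
import csv
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# Made-up sample; the real file layout and tag names are assumptions.
SAMPLE = ('<?xml version="1.0" encoding="{}"?>'
          '<orgao><DESPESA><txtDescricao>PASSAGENS AÉREAS</txtDescricao>'
          '<txtFornecedor>CIA AÇO</txtFornecedor></DESPESA></orgao>')


def xml_to_csv(xml_bytes, csv_file_path, tag='DESPESA'):
    """Hypothetical converter: the parser decodes the bytes, the CSV is encoded as UTF-8."""
    with open(csv_file_path, 'w', encoding='utf-8', newline='') as csv_file:
        writer = None
        # Default events are 'end', so each record arrives with its children parsed.
        for _, elem in ET.iterparse(io.BytesIO(xml_bytes)):
            if elem.tag != tag:
                continue
            row = {child.tag.lower(): (child.text or '') for child in elem}
            if writer is None:
                writer = csv.DictWriter(csv_file, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
            elem.clear()  # release the processed record to keep memory flat


out_dir = tempfile.mkdtemp()
for encoding in ('utf-8', 'utf-16'):
    path = os.path.join(out_dir, 'sample-{}.csv'.format(encoding))
    xml_to_csv(SAMPLE.format(encoding).encode(encoding), path)
    with open(path, encoding='utf-8') as f:
        print(encoding, f.read().splitlines())
```

Both inputs yield the same UTF-8 CSV, which illustrates that the input encoding and the output encoding are independent settings.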
Tried to use xmllint to check whether the XML files were corrupted; here is the output: