@jtemporal
Last active January 27, 2017 21:11
In [30]: import pandas as pd
In [31]: import numpy as np
In [32]: dataset = pd.read_csv('../serenata-de-amor/data/2016-09-03-companies.xz', dtype={'cnpj': str}, low_memory=False)
In [33]: dataset.loc[0]
Out[33]:
situation_date 03/11/2005
type MATRIZ
name COMPANHIA DE AGUAS E ESGOTOS DE RORAIMA CAER
phone (95) 3626-5165
situation ATIVA
neighborhood SAO PEDRO
address R MELVIN JONES
number 219
zip_code 69.306-610
city BOA VISTA
state RR
opening 21/11/1969
legal_entity 203-8 - SOCIEDADE DE ECONOMIA MISTA
trade_name NaN
cnpj 05.939.467/0001-15
last_updated 2016-07-08T05:55:31.679Z
status OK
additional_address_details NaN
email NaN
responsible_federative_entity NaN
situation_reason NaN
special_situation NaN
special_situation_date NaN
message NaN
main_activity_code 36.00-6-01
main_activity Captação, tratamento e distribuição de água
secondary_activity_1 NaN
secondary_activity_10 NaN
secondary_activity_10_code NaN
secondary_activity_11 NaN
...
secondary_activity_88_code NaN
secondary_activity_89 NaN
secondary_activity_89_code NaN
secondary_activity_8_code NaN
secondary_activity_9 NaN
secondary_activity_90 NaN
secondary_activity_90_code NaN
secondary_activity_91 NaN
secondary_activity_91_code NaN
secondary_activity_92 NaN
secondary_activity_92_code NaN
secondary_activity_93 NaN
secondary_activity_93_code NaN
secondary_activity_94 NaN
secondary_activity_94_code NaN
secondary_activity_95 NaN
secondary_activity_95_code NaN
secondary_activity_96 NaN
secondary_activity_96_code NaN
secondary_activity_97 NaN
secondary_activity_97_code NaN
secondary_activity_98 NaN
secondary_activity_98_code NaN
secondary_activity_99 NaN
secondary_activity_99_code NaN
secondary_activity_9_code NaN
latitude 2.82788
longitude -60.6601
latitude.1 2.82788
longitude.1 -60.6601
Name: 0, dtype: object
In [34]: dataset.loc[0, 'cnpj']
Out[34]: '05.939.467/0001-15'
In [35]: type(dataset.loc[0, 'cnpj'])
Out[35]: str
In [36]: dataset.loc[0, 'cnpj'].replace(r'\D', '')
Out[36]: '05.939.467/0001-15'
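The last call returns the CNPJ unchanged because Python's built-in str.replace treats its first argument as literal text, not as a regular expression, so it looks for a backslash followed by "D" and finds nothing to replace. A minimal sketch of one way to strip the punctuation, assuming the goal is to keep only the digits; the helper name only_digits is hypothetical and is not part of the original session:

```python
import re


def only_digits(value):
    """Return value with every non-digit character removed (hypothetical helper)."""
    # re.sub interprets \D as "any character that is not a digit"
    return re.sub(r'\D', '', value)


cnpj = '05.939.467/0001-15'

# str.replace searches for the literal two characters \D, so nothing changes
print(cnpj.replace(r'\D', ''))

# re.sub applies the pattern as a regular expression
print(only_digits(cnpj))
```

To apply the same cleanup to the whole column rather than a single value, dataset['cnpj'].str.replace(r'\D', '', regex=True) should behave the same way, since the pandas .str accessor supports regular expressions when regex=True is passed.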