Skip to content

Instantly share code, notes, and snippets.

@rg3915
Last active December 15, 2021 18:51
Show Gist options
  • Save rg3915/5fb3a2e7338115bc92e82b7a9a2b372b to your computer and use it in GitHub Desktop.
Save rg3915/5fb3a2e7338115bc92e82b7a9a2b372b to your computer and use it in GitHub Desktop.
Annotations of Pandas DataFrame
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@paulochf
Copy link

paulochf commented Jul 16, 2018

Código que substitui da célula [6] em diante do separe_email.ipynb:

emails_df = df.email_.str.split(";").apply(pd.Series).stack().reset_index(level=1, drop=True).to_frame('emails')

dffinal = df.drop("email_", axis=1).merge(emails_df, left_index=True, right_index=True)

Um detalhe é a falta de um índice único dos registros. Então, supondo o índice automático decorrente da importação do arquivo Excel como ID válido, esse ID é o usado no "join" (merge).

@rg3915
Copy link
Author

rg3915 commented Aug 1, 2018

Para transformar Query do Django em DataFrame:
df = pd.DataFrame(list(User.objects.all().values()))

@rg3915
Copy link
Author

rg3915 commented Aug 1, 2018

Pra mostrar somente algumas colunas:
df[['id', 'email']]

@rg3915
Copy link
Author

rg3915 commented Aug 28, 2018

Transformando todos os campos do DataFrame num jSON
df.T.apply(dict).tolist()

@rg3915
Copy link
Author

rg3915 commented Aug 28, 2018

Renomeando colunas
df.rename(columns={'old_name': 'new_name'})

@rg3915
Copy link
Author

rg3915 commented Aug 28, 2018

Selecionando as colunas de 10 em 10
df.iloc[:, :10]
com transposição
df.T.iloc[:10]

@rg3915
Copy link
Author

rg3915 commented Oct 11, 2018

Novo df com algumas colunas.
new = old.filter(['A','B','D'], axis=1)

@rg3915
Copy link
Author

rg3915 commented Oct 12, 2018

Como não inserir chaves com valor nulo no dicionário:
df.T.apply(lambda x: dict(x.dropna())).tolist()

@rg3915
Copy link
Author

rg3915 commented Oct 17, 2018

Retornando os valores cujo tamanho da string seja maior que...
df[df['foo'].str.len() > 50]['foo']

Mostrando o tamanho de cada célula
df[df['foo'].str.len() > 50]['foo'].str.len()

@rg3915
Copy link
Author

rg3915 commented Nov 9, 2018

Mostrando o tamanho de cada célula da coluna

df['foo_len'] = df['foo'].apply(len)
df[['foo_len', 'foo']]

Ou

df['foo'].str.len()

@rg3915
Copy link
Author

rg3915 commented Dec 5, 2018

Suponha que você tenha Cliente e Obra.

Pegando o ID do Cliente que está em Obra e trocando pelo nome do Cliente que está no outro DataFrame.

# JSON do Cliente com IDCliente e Cliente
dict_cliente = df_cliente[['IDCliente', 'Cliente']].T.apply(dict).tolist()

[
 {'IDCliente': 288, 'Cliente': 'Cliente Um'},
 {'IDCliente': 1, 'Cliente': 'Cliente Dois'},
 {'IDCliente': 959, 'Cliente': 'Cliente Três'},
]

Montando o dicionário que será usado como busca de cada Cliente a partir do seu ID.

_dict_cliente = {}
for item in dict_cliente:
    _dict_cliente[item['IDCliente']] = item['Cliente']

_dict_cliente

{
 288: 'Cliente Um',
 1: 'Cliente Dois',
 959: 'Cliente Três',
}

A partir desse dicionário fazemos a busca no outro DataFrame.

for row in df.itertuples():
    nome_cliente = _dict_cliente.get(row.IDCliente)
    print(row.IDCliente, nome_cliente)

288 Cliente Um
1   Cliente Dois
959 Cliente Três

@rg3915
Copy link
Author

rg3915 commented Dec 9, 2018

Retornando o valor máximo agrupado por ano.

df.groupby(['Ano'])['NumeroOrcamento'].max()
df.reset_index()

@rg3915
Copy link
Author

rg3915 commented Dec 10, 2018

Verificando data vazia:

for row in df.itertuples():
    if row.DataOrcamento is pd.NaT:
        print('Vazio')
    else:
        print(row.DataOrcamento)

@rg3915
Copy link
Author

rg3915 commented Dec 16, 2018

Definindo vários fillna diferentes por coluna:

values = {'last_name': '', 'occupation': '', 'age': 0}
df = df.fillna(value=values)
df.head()

@rg3915
Copy link
Author

rg3915 commented Aug 6, 2019

Se tiver problema com liblzma, faça um downgrade do Pandas para pandas==0.24.2.
https://stackoverflow.com/a/57115325

@rg3915
Copy link
Author

rg3915 commented Aug 6, 2019

Retorna o tamanho do maior objeto de cada coluna.

dict_sizes = {}
for col in df.columns:
    try:
        print(f'{col} max length: {df[col].map(len).max()}\n')
        dict_sizes[col] = df[col].map(len).max()
    except Exception as e:
        raise e
dict_sizes

@rg3915
Copy link
Author

rg3915 commented Sep 12, 2019

@rg3915
Copy link
Author

rg3915 commented Nov 6, 2019

dtype example
df['estoque'] =df['estoque'].fillna(0).astype(int)

@rg3915
Copy link
Author

rg3915 commented Nov 6, 2019

Pandas Dataframe df to Django
https://www.laurivan.com/save-pandas-dataframe-as-django-model/

Produto.objects.bulk_create(
    Produto(**item) for item in df.to_dict('records')
)

@rg3915
Copy link
Author

rg3915 commented Nov 13, 2019

Definindo os tipos das colunas com dtype

dict_types_annot = {
    'produto': str,
    'ncm': str,
    'preco': float,
    'estoque': 'Int64',
}

# Define os tipos das colunas
dff = df.astype(dict_types_annot, errors='ignore')

# Troca 'nan' por None e float por None.
dff = dff.replace({'nan': None, float('nan'): None})

dff.to_dict('records')

Produto.objects.bulk_create(
    Produto(**item) for item in dff.to_dict('records')
)

@rg3915
Copy link
Author

rg3915 commented Jan 22, 2020

Intersecção de dataframes

import pandas as pd
import numpy as np
import datetime
from random import randint

df1 = pd.DataFrame({
    'letters': ['A', 'B', 'C', 'D', 'E', 'J', 'K', 'M'],
    'B': np.random.randint(0, 10, 8),
})
df1

df2 = pd.DataFrame({
    'letters': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'U', 'Z'],
    'B': np.random.randint(0, 10, 26),
})
df2

# Retorna o que tem de comum nos dois dataframes.
pd.merge(df1, df2, how='inner', on='fruits')

# Retorna o que tem de comum, considerando o df1.
pd.merge(df1, df2, how='left', on='fruits')

# Retorna o que tem de comum, considerando o df2.
pd.merge(df1, df2, how='right', on='fruits')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment