@shinseitaro
Last active August 7, 2020 10:00
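Using pandas to convert "million" / "billion" amounts into numbers
# Convert the data below into a list of dicts, with the header as keys and each row as values, where:
# company name is converted to uppercase
# revenue has its currency symbol removed, and
# B: billion = 1,000,000,000 (10^9) is converted to a number
# M: million = 1,000,000 (10^6) is converted to a number
# n/a becomes numpy nan
# stock price is rounded to one decimal place (round half up)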
string_data = """company name,revenue,stock price
Orelie,$1.64B,50.406
Haleigh,$2.17B,83.541
Mikaela,$6.06B,51.754
Minna,$618.48M,55.853
Hermine,$192.01M,65.51
Borden,$8M,60.212
Cart,n/a,79.988
Kittie,$1.04B,73.679
Cassey,$2.97B,86.23
Alejandra,$2.92B,56.938"""
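# io.StringIO lets read_csv read string data like this as if it were a file
import io
import pandas as pd

f = io.StringIO(string_data)
df = pd.read_csv(f)  # "n/a" is one of read_csv's default NA markers, so it is read as NaN
df["company name"] = df["company name"].str.upper()
df["stock price"] = df["stock price"].round(1)
# Rewrite each suffix as a multiplication so eval can compute the value
d = {"M": "*1000000", "B": "*1000000000"}
df["revenue"] = (
    df["revenue"]
    .str.replace("$", "", regex=False)  # "$1.64B" -> "1.64B" (literal "$", not the regex anchor)
    .fillna('float("NaN")')             # NaN -> an expression that eval turns back into NaN
    .replace(d, regex=True)             # "1.64B" -> "1.64*1000000000"
    .map(eval)                          # evaluate each expression to a float
)
df.to_dict("records")  # pandas 2.x only accepts the full orient name "records"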
# [{'company name': 'ORELIE', 'revenue': 1640000000.0, 'stock price': 50.4},
# {'company name': 'HALEIGH', 'revenue': 2170000000.0, 'stock price': 83.5},
# {'company name': 'MIKAELA', 'revenue': 6060000000.0, 'stock price': 51.8},
# {'company name': 'MINNA', 'revenue': 618480000.0, 'stock price': 55.9},
# {'company name': 'HERMINE', 'revenue': 192010000.0, 'stock price': 65.5},
# {'company name': 'BORDEN', 'revenue': 8000000.0, 'stock price': 60.2},
# {'company name': 'CART', 'revenue': nan, 'stock price': 80.0},
# {'company name': 'KITTIE', 'revenue': 1040000000.0, 'stock price': 73.7},
# {'company name': 'CASSEY', 'revenue': 2970000000.0, 'stock price': 86.2},
# {'company name': 'ALEJANDRA', 'revenue': 2920000000.0, 'stock price': 56.9}]
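The spec asks for round-half-up rounding of stock price, but `Series.round` delegates to NumPy, which rounds exact halves to the nearest even digit (so 0.25 becomes 0.2, not 0.3). None of the sample prices sit on a tie, so the output above is unaffected. If strict half-up rounding matters, one option is a sketch like the following, using the standard `decimal` module with `ROUND_HALF_UP`; the helper name `round_half_up` is hypothetical, not part of the original gist.

```python
from decimal import ROUND_HALF_UP, Decimal

import pandas as pd


def round_half_up(x, ndigits=1):
    """Hypothetical helper: round x to ndigits places, sending ties away from zero."""
    if pd.isna(x):
        return x  # leave NaN untouched
    step = Decimal(10) ** -ndigits  # e.g. Decimal("0.1") for ndigits=1
    # str(x) gives the short decimal form (e.g. "0.25"), so the value as written
    # is rounded rather than its exact binary approximation.
    return float(Decimal(str(x)).quantize(step, rounding=ROUND_HALF_UP))


prices = pd.Series([0.25, 50.406])
print(prices.round(1).tolist())            # NumPy-style: ties go to the even digit
print(prices.map(round_half_up).tolist())  # ties go up
```

Going through `str()` is a deliberate choice: it matches how a person reads the number, at the cost of a string conversion per value.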
shinseitaro (author) commented Aug 7, 2020

d = {'M': '*1000000', 'B': '*1000000000'}
df["revenue"] = df["revenue"].str.replace("$", "", regex=False).fillna('float("NaN")').replace(d, regex=True).map(eval)

This is the neat part.
I based it on: python - Pandas, millions and billions - Stack Overflow
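The `replace` + `eval` line works because each value is rewritten into a Python arithmetic expression such as `1.64*1000000000`. Note that `eval` executes whatever string it receives, so for untrusted input a parse-based approach may be safer. A minimal sketch, assuming every revenue value looks like an optional `$`, a number, and an optional `M` or `B` suffix; `MULTIPLIERS` and `parse_revenue` are hypothetical names, not from the original:

```python
import numpy as np
import pandas as pd

# Hypothetical multiplier table; add entries (e.g. "K": 1_000) for other suffixes.
MULTIPLIERS = {"M": 1_000_000, "B": 1_000_000_000}


def parse_revenue(s: pd.Series) -> pd.Series:
    # Split "$618.48M" into a numeric part ("618.48") and a suffix ("M").
    # Values that do not match the pattern (or are NaN) yield NaN in both columns.
    parts = s.str.extract(r"^\$?([\d.]+)([MB]?)$")
    number = pd.to_numeric(parts[0], errors="coerce")
    # An empty or missing suffix means "no multiplier".
    factor = parts[1].map(MULTIPLIERS).fillna(1)
    return number * factor


revenue = pd.Series(["$1.64B", "$8M", "$500", np.nan])
print(parse_revenue(revenue).tolist())
```

Unlike the `eval` version, a malformed value such as `"1.2.3B"` simply becomes NaN instead of raising or executing code.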
