@qqpann
Created February 11, 2019 04:12
[前処理大全 Awesome Python] Python code rated "Awesome" in the book 前処理大全 #Python
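# Thanks: https://github.com/ghmagazine/awesomebook
import pandas as pd
# Assumption: `df` is a DataFrame of reservation data, one row per reservation.
# Filter rows by a date range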
df.query('"2016-10-13" <= checkout_date <= "2016-10-14"')
# Sampling
df.sample(frac=0.5) # Randomly sample 50% of the rows
df.sample(n=100) # Randomly sample exactly 100 rows
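# Sampling by aggregation ID
# ===
# When sampling, watch out for shifts in proportions.
# If you sample 50% of reservation data that has one row per booking,
# you can assume proportions among the reservations themselves stay the same,
# but other proportions (such as the share of customers) may change.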
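# Sample 50% of the unique customer IDs rather than 50% of the rows
target = pd.Series(df['customer_id'].unique()).sample(frac=0.5)
# Keep every reservation that belongs to a sampled customer
df[df['customer_id'].isin(target)]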
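# A minimal, self-contained sketch of the ID-level sampling above, on toy data.
# The column `reserve_id`, the sample values, and `random_state=0` are assumptions
# made for illustration only; they are not taken from the book's dataset.
# It shows that each sampled customer keeps all of their reservations.

```python
import pandas as pd

# Hypothetical toy data: one row per reservation, customers with uneven booking counts.
df = pd.DataFrame({
    "reserve_id": [f"r{i}" for i in range(12)],
    "customer_id": ["c1"] * 6 + ["c2"] * 3 + ["c3"] * 2 + ["c4"],
})

# Sample 50% of the unique customer IDs (random_state only makes the sketch repeatable).
target = pd.Series(df["customer_id"].unique()).sample(frac=0.5, random_state=0)

# Keep every reservation of the sampled customers.
sampled = df[df["customer_id"].isin(target)]

# Reservation counts per customer, before and after sampling, for comparison.
full_counts = df["customer_id"].value_counts()
sampled_counts = sampled["customer_id"].value_counts()
print(sampled_counts)
```

# Row-level sampling with df.sample(frac=0.5) would instead tend to keep only
# part of each customer's reservations, which is the distortion described above.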