@drorata
Created August 3, 2017 09:22
Rows per time unit using pandas

Assume you have a DataFrame like the one below:

import pandas as pd
import numpy as np

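np.random.seed(42)  # make the random draws reproducible
N = 10
df = pd.DataFrame(
  {
    "val": np.random.random(size=N),
    # each row gets one of three dates, drawn at random
    "ts": np.random.choice(['2017-07-01', '2017-07-02', '2017-07-03'], size=N)
  }
)
df['ts'] = pd.to_datetime(df.ts)  # strings -> datetime64, needed for resampling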

which looks like so:

          ts       val
0 2017-07-02  0.374540
1 2017-07-01  0.950714
2 2017-07-02  0.731994
3 2017-07-02  0.598658
4 2017-07-02  0.156019
5 2017-07-02  0.155995
6 2017-07-01  0.058084
7 2017-07-01  0.866176
8 2017-07-02  0.601115
9 2017-07-02  0.708073

We would like to count how many events occurred on each day, from which the average per day follows:

df.resample('86400s', on='ts').agg('size')  # 86400 seconds = 1 day
# Or, equivalently, with the daily alias:
# df.resample('1D', on='ts').agg('size')

This yields the following series:

ts
2017-07-01    3
2017-07-02    7
Freq: 86400S, dtype: int64
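The resampled series holds one count per day; the average number of events per day is its mean, which for the output above is (3 + 7) / 2 = 5. One subtlety is which days enter that mean. Below is a minimal sketch on hypothetical toy data (no events at all on 2017-07-02) contrasting `resample`, which emits empty days inside the covered range as 0, with a `groupby` on the calendar date, which skips them. It uses `.size()`, the method form of `.agg('size')`:

```python
import pandas as pd

# Hypothetical toy data: no events at all on 2017-07-02.
df = pd.DataFrame(
    {
        "val": [0.1, 0.2, 0.3, 0.4],
        "ts": pd.to_datetime(
            ["2017-07-01 08:00", "2017-07-01 17:30", "2017-07-03 09:15", "2017-07-03 09:45"]
        ),
    }
)

# Rows per day; resample creates a bin for every day between the first and
# last timestamp, so the empty 2017-07-02 gets a count of 0.
counts = df.resample("1D", on="ts").size()
avg_all_days = counts.mean()  # (2 + 0 + 2) / 3

# Grouping by calendar date only produces groups for days that have rows.
counts_seen = df.groupby(df["ts"].dt.date).size()
avg_seen_days = counts_seen.mean()  # (2 + 2) / 2
```

Which average is appropriate depends on the question: events per calendar day (empty days count as 0) or events per day on which anything happened.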
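The daily output has no row for 2017-07-03, even though that date is one of the choices in the example, because `resample` only spans from the first to the last timestamp present in the data. A sketch, again on hypothetical toy data, of reindexing the counts onto an explicit calendar so that trailing event-free days appear as 0 (the range passed to `pd.date_range` is an assumption you would set yourself), and of counting rows per a different time unit:

```python
import pandas as pd

# Hypothetical toy data: events only on 2017-07-01 and 2017-07-02.
df = pd.DataFrame(
    {
        "val": [0.5, 0.6, 0.7],
        "ts": pd.to_datetime(["2017-07-01 08:00", "2017-07-01 17:30", "2017-07-02 09:15"]),
    }
)

# resample only spans the first to the last timestamp, so 2017-07-03 is absent.
per_day = df.resample("1D", on="ts").size()

# Reindex onto an explicit (assumed) calendar to report event-free days as 0.
calendar = pd.date_range("2017-07-01", "2017-07-03", freq="D")
per_day_full = per_day.reindex(calendar, fill_value=0)

# Any other time unit works the same way, e.g. 12-hour bins.
per_half_day = df.resample("12h", on="ts").size()
```

The frequency string is the only thing that changes between time units; the reindex step is needed only when the reporting period extends beyond the observed timestamps.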