Skip to content

Instantly share code, notes, and snippets.

@databento-bot
Last active July 6, 2025 07:23
Show Gist options
  • Save databento-bot/833ba14d87ffa09dbcb36a38b0c832e6 to your computer and use it in GitHub Desktop.
Save databento-bot/833ba14d87ffa09dbcb36a38b0c832e6 to your computer and use it in GitHub Desktop.
Interpolate OHLCV-1s on Databento to ensure exactly one row per second
"""
Databento follows the standard convention of only printing OHLCV records when
trades actually occur within the interval. If no trade occurs within the interval,
no record is printed.
This approach is adopted by most data vendors for two key reasons:
1. Multiple interpolation strategies exist and the optimal choice depends on the
specific use case. Client-side interpolation keeps the strategy transparent.
2. This reduces storage and bandwidth requirements, especially for markets with
many illiquid instruments like options.
To learn more, see: https://databento.com/docs/schemas-and-data-formats/ohlcv
This example demonstrates one way to interpolate OHLCV-1s data to ensure exactly
one row per second. The interpolation strategy used:
- Forward fill the close price from the last known value
- Set open, high, and low equal to the forward-filled close price
- Set volume to 0 for interpolated periods
- Forward fill metadata columns
"""
import pandas as pd
import matplotlib.pyplot as plt
import databento as db
client = db.Historical()
data = client.timeseries.get_range(
dataset="XNAS.ITCH",
schema="ohlcv-1s",
symbols="AAPL",
start="2021-04-09",
end="2021-04-10",
)
df = data.to_df()
# Create a complete second-by-second index using specified start/end times
start_time = pd.Timestamp("2021-04-09")
end_time = pd.Timestamp("2021-04-10")
complete_index = pd.date_range(start=start_time, end=end_time, freq='1s', tz='UTC', inclusive='left')
# Reindex to include all seconds
df_interp = df.reindex(complete_index)
# Forward fill metadata columns first, then backfill if needed
metadata_cols = ['symbol', 'instrument_id', 'publisher_id', 'rtype']
df_interp[metadata_cols] = df_interp[metadata_cols].ffill().bfill()
# Forward fill close prices first (may remain NaN if no prior data exists)
df_interp['close'] = df_interp['close'].ffill()
# For seconds with no trades, set open/high/low equal to the forward-filled close
# and volume to 0 (close may still be NaN if no data exists to forward fill from)
missing_mask = df_interp['open'].isna()
df_interp.loc[missing_mask, 'open'] = df_interp.loc[missing_mask, 'close']
df_interp.loc[missing_mask, 'high'] = df_interp.loc[missing_mask, 'close']
df_interp.loc[missing_mask, 'low'] = df_interp.loc[missing_mask, 'close']
df_interp.loc[missing_mask, 'volume'] = 0
print(df_interp.head(10))
df['open'].plot()
plt.show()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment