Skip to content

Instantly share code, notes, and snippets.

@iangow
Last active March 31, 2026 21:29
Show Gist options
  • Select an option

  • Save iangow/812e9ab6c7813616bb4a61c2ff8881df to your computer and use it in GitHub Desktop.

Select an option

Save iangow/812e9ab6c7813616bb4a61c2ff8881df to your computer and use it in GitHub Desktop.
Instructions for `db2pq`.

Steps to Creating a Parquet Data Repository from WRDS

This guide will help you download WRDS data and store it locally as Parquet files using the db2pq Python package. No administrator privileges are required for these steps. As a side-effect, after running the following, you will have a working installation of Python on your system.


Notes

  • Credentials may be stored in a .pgpass file for future use
  • No administrator privileges are required
  • Start small and work up to bigger files. If you are far from the WRDS server or have a slow internet connection, download times can be longer.

πŸ–₯️ Where to run these commands

Run commands in:

  • Terminal (macOS or Linux)
  • PowerShell (Windows)

1. Install uv

macOS and Linux

Open Terminal and run:

curl -LsSf https://astral.sh/uv/install.sh | sh

(If you don't have curl, you can use wget -qO- instead.)

After installation, restart Terminal or run:

source ~/.zshrc

Verify installation:

uv --version

Windows

  1. Open PowerShell:

    • Click Start Menu
    • Type: PowerShell
    • Click Windows PowerShell (or PowerShell)
  2. Paste and run:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
  1. Close PowerShell and open a new one.

  2. Verify installation:

uv --version

2. Create a script folder

Create a folder where you will store scripts.

Example:

mkdir ~/scripts
cd ~/scripts

3. Create a data directory

This should be a directory where you are comfortable storing a few gigabytes of data.

Example:

mkdir ~/data

⚠️ Some WRDS tables can be large (multiple GB). Ensure you have sufficient disk space for any tables you might want to download. Note that Parquet files are generally a fraction of the size of the original WRDS data (PostgreSQL tables or SAS files) due to compression.


4. Create a .env file

Create a file named .env (no extension) inside your script folder.

touch .env

5. Add configuration to .env

Edit the .env file and add:

WRDS_ID=your_wrds_id
DATA_DIR=~/data

Replace:

  • your_wrds_id with your WRDS login
  • ~/data with the location of the data directory you created above

Windows example:

DATA_DIR=C:\Users\YourName\data

6. Initialize project and install dependencies

From inside your script folder:

uv init
uv add db2pq

This creates a Python project and virtual environment in the current folder.


7. Run Python

Start a Python session:

uv run python

You should see some information about Python on your system followed by the Python prompt (>>>) where you can enter Python commands. Here's what I see:

Python 3.14.3 (main, Mar  3 2026, 15:46:55) [Clang 21.1.4 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

8. Download WRDS data

Suppose you want crsp.wrds_dailyindexret_query, where crsp is the PostgreSQL schema (SAS library) and wrds_dailyindexret_query is the table (this appears to be the "new" version of crsp.dsi).

In Python, run:

from db2pq import wrds_update_pq
wrds_update_pq("wrds_dailyindexret_query", "crsp")

You may be prompted to:

  • Enter your WRDS password (first time only)
  • Approve a 2FA request (e.g., on your phone)

9. Example output

Enter your WRDS PostgreSQL password:
Saved WRDS PostgreSQL credentials to /Users/username/.pgpass.
Updated crsp.wrds_dailyindexret_query is available.
Beginning file download at 2026-03-31 20:51:24 UTC.
Writing crsp.wrds_dailyindexret_query: 100.0% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– ETA <1s
Completed file download at 2026-03-31 20:51:26 UTC.
'/Users/igow/Dropbox/far_data/crsp/wrds_dailyindexret_query.parquet'

10. Result

You now have the crsp.wrds_dailyindexret_query table stored locally as a Parquet file.


11. Download other tables

You can replace "wrds_dailyindexret_query" and "crsp" in wrds_update_pq() with any table you have access to on WRDS. Even fairly large tables can be quite quick to download. For example, crsp.msf_v2 is 1.26 GB in WRDS PostgreSQL, but downloads in ~30 seconds for me. The resulting Parquet file is 207 MB.

>>> wrds_update_pq("msf_v2", "crsp")
Updated crsp.msf_v2 is available.
Beginning file download at 2026-03-30 15:14:33 UTC.
100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– (00:00:31.25 elapsed)     
Completed file download at 2026-03-30 15:15:06 UTC.
'/Users/igow/data/crsp/msf_v2.parquet'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment