Skip to content

Instantly share code, notes, and snippets.

@baggiponte
Last active December 20, 2024 11:33
Show Gist options
  • Save baggiponte/fa3626658add4daef2ed002768a50f20 to your computer and use it in GitHub Desktop.
Save baggiponte/fa3626658add4daef2ed002768a50f20 to your computer and use it in GitHub Desktop.
BigQuery + Polars Example
# BigQuery Demo
1. Create a new directory for your project.
2. Open a terminal inside the directory.
3. Run the following:
```bash
uv init --python=3.12
uv add google-cloud-bigquery pyarrow polars
```
This will create a new project using Python 3.12 and install the necessary dependencies.
4. Create a folder named `credentials` in the root of your project and put the `service-account.json` file there.
5. If you need to do exploratory data analysis, you can install `jupyterlab` or `spyder` with:
```bash
# if you want to use jupyter notebooks
uv add --dev jupyterlab
# if you want to use spyder
uv add --dev spyder
```
To launch them, do:
```bash
# if you want to use jupyter notebooks
uv run jupyter lab
# if you want to use spyder
uv run spyder
```
6. Create a new file named `main.py` and put the following code in it:
```python
from google.cloud import bigquery
import polars as pl
credentials = ... # put the path to the credentials here
client = bigquery.Client.from_service_account_json(credentials)
query = ...
job = client.query(query)
data = pl.from_arrow(job.to_arrow())
```
# FAQ
## How to use the credentials?
The path to the credentials should *not* be committed to the repository.
If you are on Winwdows, the string to the credentials should be in the following format:
```python
# note there are two backslashes
credentials = "C:\\path\\to\\credentials\\service-account.json"
# altenatively, you can use the following (note the small `r` before the quotes)
credentials = r"C:\path\to\credentials\service-account.json"
```
## I get an error when converting the `job` to a Polars DataFrame
The conversion from the `job` to a Polars DataFrame might fail. For example, the following code will raise this error:
```python
query = "SELECT * FROM L2.F_CURVE_PREZZO_QUOT LIMIT 5"
job = client.query(query)
pl.from_arrow(job.to_arrow())
# PanicException: operator does not support primitive `Int256`
```
This happens because the `Int256` type is not supported by Polars. To fix this, you need to downcast the column to `Int64` inside the query:
```python
query = """
SELECT
* EXCEPT(N_VALORE),
CAST(N_VALORE AS NUMERIC) AS N_VALORE
FROM L2.F_CURVE_PREZZO_QUOT
LIMIT 5;
"""
```
Tip: the error message does not tell you which column is causing the problem. You can still inspect the columns and their types
if you just print the `job.to_arrow()` object:
```python
job.to_arrow()
```
In general, the types with precision of 256 might not be supported.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment