Last active
December 20, 2024 11:33
-
-
Save baggiponte/fa3626658add4daef2ed002768a50f20 to your computer and use it in GitHub Desktop.
BigQuery + Polars Example
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# BigQuery Demo | |
1. Create a new directory for your project. | |
2. Open a terminal inside the directory. | |
3. Run the following: | |
```bash | |
uv init --python=3.12 | |
uv add google-cloud-bigquery pyarrow polars | |
``` | |
This will create a new project using Python 3.12 and install the necessary dependencies. | |
4. Create a folder named `credentials` in the root of your project and put the `service-account.json` file there. | |
5. If you need to do exploratory data analysis, you can install `jupyterlab` or `spyder` with: | |
```bash | |
# if you want to use jupyter notebooks | |
uv add --dev jupyterlab | |
# if you want to use spyder | |
uv add --dev spyder | |
``` | |
To launch them, do: | |
```bash | |
# if you want to use jupyter notebooks | |
uv run jupyter lab | |
# if you want to use spyder | |
uv run spyder | |
``` | |
6. Create a new file named `main.py` and put the following code in it: | |
```python | |
from google.cloud import bigquery | |
import polars as pl | |
credentials = ... # put the path to the credentials here | |
client = bigquery.Client.from_service_account_json(credentials) | |
query = ... | |
job = client.query(query) | |
data = pl.from_arrow(job.to_arrow()) | |
``` | |
# FAQ | |
## How to use the credentials? | |
The path to the credentials should *not* be committed to the repository. | |
If you are on Winwdows, the string to the credentials should be in the following format: | |
```python | |
# note there are two backslashes | |
credentials = "C:\\path\\to\\credentials\\service-account.json" | |
# altenatively, you can use the following (note the small `r` before the quotes) | |
credentials = r"C:\path\to\credentials\service-account.json" | |
``` | |
## I get an error when converting the `job` to a Polars DataFrame | |
The conversion from the `job` to a Polars DataFrame might fail. For example, the following code will raise this error: | |
```python | |
query = "SELECT * FROM L2.F_CURVE_PREZZO_QUOT LIMIT 5" | |
job = client.query(query) | |
pl.from_arrow(job.to_arrow()) | |
# PanicException: operator does not support primitive `Int256` | |
``` | |
This happens because the `Int256` type is not supported by Polars. To fix this, you need to downcast the column to `Int64` inside the query: | |
```python | |
query = """ | |
SELECT | |
* EXCEPT(N_VALORE), | |
CAST(N_VALORE AS NUMERIC) AS N_VALORE | |
FROM L2.F_CURVE_PREZZO_QUOT | |
LIMIT 5; | |
""" | |
``` | |
Tip: the error message does not tell you which column is causing the problem. You can still inspect the columns and their types | |
if you just print the `job.to_arrow()` object: | |
```python | |
job.to_arrow() | |
``` | |
In general, the types with precision of 256 might not be supported. | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment