baggiponte · December 20, 2024 11:33
diff --git a/bigquery-and-polars b/bigquery-and-polars
 # BigQuery Demo

 1. Create a new directory for your project.
 2. Open a terminal inside the directory.
 3. Run the following:

 ```bash
 uv init --python=3.12
 uv add google-cloud-bigquery pyarrow polars
 ```

 This will create a new project using Python 3.12 and install the necessary dependencies.

 4. Create a folder named `credentials` in the root of your project and put the `service-account.json` file there.

 5. If you need to do exploratory data analysis, you can install `jupyterlab` or `spyder` with:

 ```bash
 # if you want to use jupyter notebooks
 uv add --dev jupyterlab

 # if you want to use spyder
 uv add --dev spyder
 ```

 To launch them, do:

 ```bash
 # if you want to use jupyter notebooks
 uv run jupyter lab

 # if you want to use spyder
 uv run spyder
 ```

 6. Create a new file named `main.py` and put the following code in it:

 ```python
 from google.cloud import bigquery
 import polars as pl

 credentials = ... # put the path to the credentials here

 client = bigquery.Client.from_service_account_json(credentials)

 query = ...
 job = client.query(query)

 data = pl.from_arrow(job.to_arrow())
 ```

 # FAQ

 ## How to use the credentials?

 The path to the credentials should *not* be committed to the repository.

 If you are on Winwdows, the string to the credentials should be in the following format:

 ```python
 # note there are two backslashes
 credentials = "C:\\path\\to\\credentials\\service-account.json"

 # altenatively, you can use the following (note the small `r` before the quotes)
 credentials = r"C:\path\to\credentials\service-account.json"
 ```

 ## I get an error when converting the `job` to a Polars DataFrame

 The conversion from the `job` to a Polars DataFrame might fail. For example, the following code will raise this error:

 ```python
 query = "SELECT * FROM L2.F_CURVE_PREZZO_QUOT LIMIT 5"

 job = client.query(query)

 pl.from_arrow(job.to_arrow())

 # PanicException: operator does not support primitive `Int256`
 ```

 This happens because the `Int256` type is not supported by Polars. To fix this, you need to downcast the column to `Int64` inside the query:

 ```python
 query = """
  SELECT
    * EXCEPT(N_VALORE),
    CAST(N_VALORE AS NUMERIC) AS N_VALORE
  FROM L2.F_CURVE_PREZZO_QUOT
  LIMIT 5;
 """
 ```

 Tip: the error message does not tell you which column is causing the problem. You can still inspect the columns and their types
 if you just print the `job.to_arrow()` object:

 ```python
 job.to_arrow()
 ```

 In general, the types with precision of 256 might not be supported.
	# BigQuery Demo

	1. Create a new directory for your project.
	2. Open a terminal inside the directory.
	3. Run the following:

	```bash
	uv init --python=3.12
	uv add google-cloud-bigquery pyarrow polars
	```

	This will create a new project using Python 3.12 and install the necessary dependencies.

	4. Create a folder named `credentials` in the root of your project and put the `service-account.json` file there.

	5. If you need to do exploratory data analysis, you can install `jupyterlab` or `spyder` with:

	```bash
	# if you want to use jupyter notebooks
	uv add --dev jupyterlab

	# if you want to use spyder
	uv add --dev spyder
	```

	To launch them, do:

	```bash
	# if you want to use jupyter notebooks
	uv run jupyter lab

	# if you want to use spyder
	uv run spyder
	```

	6. Create a new file named `main.py` and put the following code in it:

	```python
	from google.cloud import bigquery
	import polars as pl

	credentials = ... # put the path to the credentials here

	client = bigquery.Client.from_service_account_json(credentials)

	query = ...
	job = client.query(query)

	data = pl.from_arrow(job.to_arrow())
	```

	# FAQ

	## How to use the credentials?

	The path to the credentials should not be committed to the repository.

	If you are on Winwdows, the string to the credentials should be in the following format:

	```python
	# note there are two backslashes
	credentials = "C:\\path\\to\\credentials\\service-account.json"

	# altenatively, you can use the following (note the small `r` before the quotes)
	credentials = r"C:\path\to\credentials\service-account.json"
	```

	## I get an error when converting the `job` to a Polars DataFrame

	The conversion from the `job` to a Polars DataFrame might fail. For example, the following code will raise this error:

	```python
	query = "SELECT * FROM L2.F_CURVE_PREZZO_QUOT LIMIT 5"

	job = client.query(query)

	pl.from_arrow(job.to_arrow())

	# PanicException: operator does not support primitive `Int256`
	```

	This happens because the `Int256` type is not supported by Polars. To fix this, you need to downcast the column to `Int64` inside the query:

	```python
	query = """
	SELECT
	* EXCEPT(N_VALORE),
	CAST(N_VALORE AS NUMERIC) AS N_VALORE
	FROM L2.F_CURVE_PREZZO_QUOT
	LIMIT 5;
	"""
	```

	Tip: the error message does not tell you which column is causing the problem. You can still inspect the columns and their types
	if you just print the `job.to_arrow()` object:

	```python
	job.to_arrow()
	```

	In general, the types with precision of 256 might not be supported.