Ian Cook ianmcook

@ianmcook
ianmcook / games_data.md
Created September 25, 2025 00:49
How to create games data in several formats
ianmcook / adbc_snowflake.R
Created September 18, 2025 16:09
Query Snowflake in R with ADBC (using driver manifests)
library(adbcdrivermanager)
drv <- adbc_driver("snowflake")
db <- adbc_database_init(
drv,
username="USER",
password="PASS",
adbc.snowflake.sql.account="ACCOUNT-IDENT",
adbc.snowflake.sql.warehouse="MY_WAREHOUSE",
ianmcook / request-body.json
Created August 21, 2025 16:08
Use the Snowflake SQL REST API from a shell script with curl and jq to execute multiple queries and download the result partitions in Arrow format
{
"statement": "SELECT * FROM MYTABLEONE; SELECT * FROM MYTABLETWO",
"parameters": {
"MULTI_STATEMENT_COUNT": "2"
},
"resultSetMetaData": {
"format": "arrowv1"
},
"timeout": 60,
"database": "MYDATABASE",
ianmcook / requirements.txt
Created May 26, 2025 19:41
requirements.txt for building ADBC docs
adbc_driver_flightsql
adbc_driver_manager
adbc_driver_postgresql
adbc_driver_sqlite
furo
numpydoc
pandas
polars
sphinx
sphinx-copybutton
ianmcook / curl_arrow_pipe_python.md
Last active May 14, 2025 20:37
Pipe Arrow IPC stream from curl to Python

First, start an HTTP server that serves Arrow IPC stream data. You can use one of the server examples in "HTTP GET Arrow Data: Simple Examples", or simply start a Python HTTP server in the directory that contains an Arrow IPC stream file (named file.arrows in this example):

python -m http.server 8008

Download the attached Python script, script.py. You might need to run chmod +x script.py to make it executable.

ianmcook / write_arrow_ipc_formats.py
Created April 2, 2025 16:48
Create sample data and write it to two files in Arrow IPC stream format and file format
import pandas as pd
import pyarrow as pa
file_path = 'fruit.arrow'
stream_path = 'fruit.arrows'
df = pd.DataFrame(data={'fruit': ['apple', 'apple', 'apple', 'orange', 'orange', 'orange'],
                        'variety': ['gala', 'honeycrisp', 'fuji', 'navel', 'valencia', 'cara cara'],
                        'weight': [134.2, 158.6, None, 142.1, 96.7, None]})
ianmcook / request-body.json
Last active August 21, 2025 16:08
Use the Snowflake SQL REST API from a shell script with curl and jq to execute a query and download the result partitions in Arrow format
{
"statement": "SELECT * FROM MYTABLE",
"resultSetMetaData": {
"format": "arrowv1"
},
"timeout": 60,
"database": "MYDATABASE",
"schema": "MYSCHEMA",
"warehouse": "MYWAREHOUSE",
"role": "MYROLE"

Why IbisML?

This is a simple example demonstrating why you might want to use IbisML instead of just plain Ibis in an ML preprocessing pipeline.

Scenario

You are training an ML model that achieves better accuracy when the floating-point columns in the training data are normalized (by subtracting each column's mean and dividing by its standard deviation). Your data contains multiple floating-point columns.

To demonstrate this, we can use the iris flower dataset.
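To make the normalization concrete, here is a minimal sketch in plain pandas. It uses neither the Ibis nor the IbisML API, and the four rows are made-up stand-ins shaped like the iris data rather than the real dataset.

```python
import pandas as pd

# Made-up rows shaped like the iris data; the values are illustrative only.
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 5.8],
    "sepal_width": [3.5, 3.2, 3.3, 2.7],
    "species": ["setosa", "versicolor", "virginica", "virginica"],
})

# Select only the floating-point columns; "species" is left untouched.
float_cols = df.select_dtypes(include="float").columns

# Per-column statistics. pandas computes the sample standard deviation
# (ddof=1) by default.
means = df[float_cols].mean()
stds = df[float_cols].std()

# Subtract the mean and divide by the standard deviation, column by column.
normalized = df.copy()
normalized[float_cols] = (df[float_cols] - means) / stds
```

After this step, each floating-point column has a mean of approximately 0 and a sample standard deviation of approximately 1.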

ianmcook / ibis_union_different_column_order.py
Created August 21, 2024 16:02
Union two Ibis tables with columns in different orders
import ibis
import random
con = ibis.connect("duckdb://penguins.ddb")
con.create_table(
    "penguins", ibis.examples.penguins.fetch().to_pyarrow(), overwrite=True
)
ibis.options.interactive = True
ianmcook / maintain_row_order.md
Last active January 15, 2025 20:21
Examples demonstrating whether systems maintain row order

This is a set of examples demonstrating whether various Python and R dataframe libraries and OLAP query engines preserve (or do not preserve) the original order of the records in the data.

Example data

The examples all use this dataset describing the 28 times when a person walked on the moon:

year  mission    name            minutes
1969  Apollo 11  Neil Armstrong  151
1969  Apollo 11  Buzz Aldrin     151