Ian Cook ianmcook
@ianmcook
ianmcook / substrait_pyarrow_dataset_expressions.py
Created August 29, 2023 21:39
Use Substrait expressions to filter and project PyArrow datasets
import tempfile
import pathlib
import numpy as np
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq
import pyarrow.dataset as ds
# create a small dataset for example purposes
ianmcook / acero_sort.cpp
Created August 17, 2023 21:19
Sort an Arrow Table with Acero
#include <iostream>
#include <arrow/api.h>
#include <arrow/result.h>
#include <arrow/compute/api.h>
#include <arrow/compute/exec/exec_plan.h>
arrow::Status ExecutePlanAndCollectAsTable(
    std::shared_ptr<arrow::compute::ExecPlan> plan,
    std::shared_ptr<arrow::Schema> schema,
    arrow::AsyncGenerator<std::optional<arrow::compute::ExecBatch>> sink_gen) {
ianmcook / ibis_bigquery_github_nested.py
Created April 14, 2023 17:04
Ibis BigQuery github_nested example query
import google.auth
import ibis
from ibis import _
credentials, billing_project = google.auth.default()
conn = ibis.bigquery.connect(billing_project, 'bigquery-public-data.samples')
t = conn.table('github_nested')
expr = (
ianmcook / ibis_snowflake_tpc-h_1.py
Last active April 12, 2023 18:07
Ibis Snowflake TPC-H Query 1
# before running:
# 1. install Ibis and its Snowflake backend: https://ibis-project.org/backends/Snowflake/
# 2. create and activate a Snowflake trial account
# 3. set environment variables SNOWSQL_USER, SNOWSQL_PWD, SNOWSQL_ACCOUNT
import os
import ibis
from ibis import _
ibis.options.interactive = True
ianmcook / ibis_trino.py
Last active April 9, 2023 12:02
Simple Ibis Trino demo
# before running:
# 1. install Ibis and its Trino backend: https://ibis-project.org/backends/Trino/
# 2. pull and run the Trino docker container: https://trino.io/docs/current/installation/containers.html
import ibis
from ibis import _
# connect to Trino
conn = ibis.trino.connect(database='memory', schema='default')
ianmcook / duckdb_ibis_example.py
Created January 24, 2023 18:01
Ibis + DuckDB example
# pip install 'ibis-framework[duckdb]'
import pandas as pd
import ibis
from ibis import _
# create a pandas DataFrame and write it to a Parquet file
df = pd.DataFrame(data={'repo': ['pandas', 'duckdb', 'ibis'],
                        'stars': [36622, 8074, 2336]})
df.to_parquet('repo_stars.parquet')
ianmcook / clean_github_jira_ids.R
Last active October 26, 2022 21:26
Match Apache Arrow Jira user accounts with GitHub user accounts
# run this script second
library(dplyr)
df <- read.csv("dirty.csv")
agg <- df %>%
  group_by(jira, github) %>%
  summarise(n = n(), .groups = "keep") %>%
  ungroup() %>%
ianmcook / acero_execplan.cpp
Last active April 23, 2025 08:52
Create and execute an Acero ExecPlan
#include <iostream>
#include <arrow/api.h>
#include <arrow/result.h>
#include <arrow/compute/api.h>
#include <arrow/compute/exec/exec_plan.h>
arrow::Status ExecutePlanAndCollectAsTable(
    std::shared_ptr<arrow::compute::ExecPlan> plan,
    std::shared_ptr<arrow::Schema> schema,
    arrow::AsyncGenerator<std::optional<arrow::compute::ExecBatch>> sink_gen) {
ianmcook / create_and_print_arrow_table.cpp
Last active June 2, 2022 14:01
Create and print an Arrow Table in C++
#include <iostream>
#include <arrow/api.h>
#include <arrow/result.h>
#include <arrow/compute/api.h>
arrow::Status Execute() {
  arrow::Int32Builder int_builder;
  ARROW_RETURN_NOT_OK(int_builder.Append(1));
  ARROW_RETURN_NOT_OK(int_builder.Append(2));
  ARROW_RETURN_NOT_OK(int_builder.Append(3));
ianmcook / enquo_helpers.R
Last active April 10, 2021 03:48
rlang::enquo() helpers for eager evaluation and idempotence
# enquo() helpers for eager evaluation and idempotence
# wrap eager() around enquo() to evaluate the quosure immediately in the calling
# environment *if* it can do so without error, otherwise return the quosure
eager <- function(quo) {
  val <- try(eval_tidy(quo), silent = TRUE)
  if (inherits(val, "try-error")) {
    quo
  } else {
    val