Rust database advice

You:

I want to connect to various databases using rust, send select queries, and return polars data frames.

I need to connect to vertica, db2, ms sql server, and postgres databases. how do you recommend that I do each of those? It can be a different method for each database type if necessary.


ChatGPT:

Great goal. Here’s the most practical path today to run SELECT queries from Rust and get back Polars DataFrames, per engine.

Postgres

Best option: use ConnectorX and have it return Polars directly. It’s fast (parallel, zero-copy where possible) and supported natively.

  • Crate: connectorx with features src_postgres,dst_polars.

  • Sketch:

    use connectorx::prelude::*;
    let mut conn = SourceConn::try_from("postgresql://user:pw@host:5432/db")?;
    let out = get_polars(&conn, None, &[CXQuery::from("SELECT * FROM t")])?;
    let df: polars::prelude::DataFrame = out;  // already a DataFrame
  • Docs show Postgres as a source and Polars as a destination (enable via features). Docs.rs

Alt: tokio-postgres/sqlx → build Series and DataFrame manually (more work, slower). Docs.rs+1
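For reference, here is a minimal sketch of that manual path with tokio-postgres (an assumption: it needs the tokio-postgres and tokio crates, which aren't in the Cargo hints below, and it uses a hypothetical table t(id, name)). The idea is to collect each column into a typed Vec and let the df! macro assemble the DataFrame:

use polars::prelude::*;
use tokio_postgres::NoTls;

// Hypothetical example: fetch two columns and assemble a DataFrame by hand.
async fn manual_pg(conn_str: &str) -> anyhow::Result<DataFrame> {
    let (client, connection) = tokio_postgres::connect(conn_str, NoTls).await?;
    // The connection future drives the socket; run it in the background.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {e}");
        }
    });

    let rows = client.query("SELECT id, name FROM t", &[]).await?;
    // Collect each column into a Vec with the matching Rust type.
    let ids: Vec<i32> = rows.iter().map(|r| r.get::<_, i32>(0)).collect();
    let names: Vec<String> = rows.iter().map(|r| r.get::<_, String>(1)).collect();
    // Build the DataFrame column by column.
    Ok(df!("id" => ids, "name" => names)?)
}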

SQL Server (MS T-SQL)

Best option: ConnectorX again, features src_mssql,dst_polars. Same usage as above. Docs.rs

Alt: Tiberius (native TDS driver) to fetch rows, then build a DataFrame. Use if you need fine-grained control or features ConnectorX lacks. Docs.rs+1

Vertica

No mature native Rust driver; use ODBC → Arrow → Polars:

  1. Install and configure Vertica’s ODBC driver (create a DSN). docs.vertica.com+1

  2. In Rust, read query results as Arrow RecordBatches with arrow-odbc (built on odbc-api). Docs.rs, Crates.io, GitHub

  3. Convert Arrow (arrow2) arrays to Polars Series and then a DataFrame. Example patterns here (from DuckDB’s Arrow interface, but identical idea for ODBC): downcast Arrow arrays → Series::try_from((name, array.to_boxed())) → DataFrame::new(vec![...]). Vikram Oberoi

IBM DB2

Same ODBC → Arrow → Polars pattern:

  1. Install IBM’s DB2 ODBC/CLI driver and configure it with unixODBC (Linux/macOS) or the ODBC Data Source Administrator (Windows). IBM, Docs.rs

  2. Use arrow-odbc or odbc-api to execute the query and stream Arrow batches. Docs.rs+1

  3. Convert Arrow (arrow2) to Polars as above. Vikram Oberoi
    Note: There’s a community ibm_db crate, but ODBC + Arrow tends to be the most portable. Crates

Why these picks?

  • ConnectorX: direct Polars destination, supports Postgres & SQL Server, and is designed for fast parallel loads. Docs.rs

  • ODBC + Arrow: the most reliable cross-vendor path for Vertica and DB2. arrow-odbc gives you Arrow batches; Polars uses Arrow2 internally and can be populated from Arrow arrays (pattern shown in the article). Docs.rs, Vikram Oberoi

Minimal Cargo feature hints

  • Postgres/SQL Server:

    connectorx = { version = "0.4", features = ["src_postgres","src_mssql","dst_polars"] }
    polars = { version = "0.45", features = ["lazy"] }

    Docs.rs

  • Vertica/DB2 via ODBC:

    odbc-api = "6"
    arrow-odbc = "10"   # use the version matching your toolchain
    polars = "0.45"

    Docs.rs+1

Converting Arrow → Polars (pattern)

When using ODBC:

  • Read batches with arrow_odbc::OdbcReaderBuilder.

  • For each batch (often a StructArray of columns), downcast each Arrow array to its concrete type, build Series via Series::try_from(("col", array.to_boxed())), then DataFrame::new(series_vec). Example walkthrough here. Vikram Oberoi

If you want, I can scaffold a tiny crate with:

  • PgRunner / MssqlRunner using ConnectorX, and

  • a generic OdbcRunner that returns DataFrame from any DSN (works for Vertica & DB2).


You:

write me some rust code that will run a select query I have stored in query.sql and turn the results into a polars data frame. write code that does this for each of the database types I asked you about.


ChatGPT:

awesome—here are four self-contained Rust examples that all:

  • read your SQL from a local query.sql

  • run it against the specified database

  • return a polars::prelude::DataFrame

I split them by engine so you can copy/paste what you need. At the end there’s a single Cargo.toml that works for all four.


1) Postgres → Polars (via ConnectorX)

// src/pg_to_polars.rs
use anyhow::Result;
use connectorx::prelude::*;
use polars::prelude::*;
use std::fs;

pub fn run(query_path: &str, conn_str: &str) -> Result<DataFrame> {
    let sql = fs::read_to_string(query_path)?;
    // ConnectorX will parallelize and directly return a Polars DataFrame.
    let mut source = SourceConn::try_from(conn_str)?;
    let df: DataFrame = get_polars(&source, None, &[CXQuery::from(sql.as_str())])?;
    Ok(df)
}

// tiny demo
fn main() -> Result<()> {
    let conn = std::env::var("POSTGRES_URL")
        .expect("set POSTGRES_URL, e.g. postgresql://user:pw@host:5432/db");
    let df = run("query.sql", &conn)?;
    println!("{df}");
    Ok(())
}

2) SQL Server (MS T-SQL) → Polars (via ConnectorX)

// src/mssql_to_polars.rs
use anyhow::Result;
use connectorx::prelude::*;
use polars::prelude::*;
use std::fs;

pub fn run(query_path: &str, conn_str: &str) -> Result<DataFrame> {
    let sql = fs::read_to_string(query_path)?;
    let mut source = SourceConn::try_from(conn_str)?;
    let df: DataFrame = get_polars(&source, None, &[CXQuery::from(sql.as_str())])?;
    Ok(df)
}

fn main() -> Result<()> {
    // Example: "mssql://user:pw@host:1433/db?encrypt=DANGER_PLAINTEXT"
    let conn = std::env::var("MSSQL_URL").expect("set MSSQL_URL connection string");
    let df = run("query.sql", &conn)?;
    println!("{df}");
    Ok(())
}

3) Vertica → Polars (via ODBC → CSV → Polars)

For Vertica (and DB2), the most portable path in Rust is ODBC. To keep type-mapping simple and robust across drivers, the code below fetches rows as text (odbc-api’s TextRowSet), writes them to an in-memory CSV, then lets Polars infer types when reading the CSV.

Prereqs: install/configure the Vertica ODBC driver and set up a DSN (or provide a full ODBC connection string).

// src/vertica_to_polars.rs
use anyhow::{anyhow, Result};
use csv::Writer;
use odbc_api::{buffers::TextRowSet, ConnectionOptions, Environment};
use polars::prelude::*;
use std::{fs, io::Cursor};

pub fn run(query_path: &str, odbc_conn_str_or_dsn: &str) -> Result<DataFrame> {
    let sql = fs::read_to_string(query_path)?;
    let env = Environment::new()?;

    // You can pass either: "DSN=VerticaDSN;UID=...;PWD=..." or just "DSN=VerticaDSN"
    let mut conn = env.connect_with_connection_string(odbc_conn_str_or_dsn, ConnectionOptions::default())?;

    // Execute and ensure we got a result set (SELECT).
    let mut cursor = conn
        .execute(&sql, ())
        .map_err(|e| anyhow!("ODBC execute failed: {e}"))?
        .ok_or_else(|| anyhow!("Query did not return a result set"))?;

    // Gather column names for the CSV header row.
    let col_names: Vec<String> = cursor.column_names()?.collect::<Result<_, _>>()?;
    let num_cols = col_names.len();

    // TextRowSet fetches all values as UTF-8 text. Choose a sensible batch size & max text length.
    let batch_size = 1024;
    let max_str_len = 16 * 1024; // 16 KiB per text cell (adjust if you have very wide text)
    let mut rowset = TextRowSet::for_cursor(batch_size, &mut cursor, Some(max_str_len))?;
    // Binding the buffer consumes the cursor and gives us batch-wise fetching.
    let mut row_set_cursor = cursor.bind_buffer(&mut rowset)?;

    let mut out = Vec::<u8>::new();
    {
        let mut wtr = Writer::from_writer(&mut out);
        wtr.write_record(&col_names)?;

        // Fetch batches and stream them into CSV
        while let Some(batch) = row_set_cursor.fetch()? {
            for row_idx in 0..batch.num_rows() {
                let mut rec = Vec::with_capacity(num_cols);
                for col_idx in 0..num_cols {
                    // Empty value for NULLs (cells are exposed as byte slices)
                    rec.push(batch.at(col_idx, row_idx).unwrap_or(&[]));
                }
                wtr.write_record(&rec)?;
            }
        }
        wtr.flush()?;
    } // drop the CSV writer so `out` can be read below

    // Now read the CSV into Polars (let it infer types).
    let df = CsvReadOptions::default()
        .with_has_header(true)
        .with_infer_schema_length(Some(1000))
        .into_reader_with_file_handle(Cursor::new(out))
        .finish()?;

    Ok(df)
}

fn main() -> Result<()> {
    // Examples:
    //   DSN only: set VERTICA_ODBC="DSN=VerticaDSN"
    //   Full:     set VERTICA_ODBC="DSN=VerticaDSN;UID=dbuser;PWD=secret"
    let conn = std::env::var("VERTICA_ODBC").expect("set VERTICA_ODBC (ODBC conn string or DSN)");
    let df = run("query.sql", &conn)?;
    println!("{df}");
    Ok(())
}

4) IBM DB2 → Polars (via ODBC → CSV → Polars)

This is the same pattern as Vertica, just a different connection string (DB2’s ODBC/CLI driver).

// src/db2_to_polars.rs
use anyhow::{anyhow, Result};
use csv::Writer;
use odbc_api::{buffers::TextRowSet, ConnectionOptions, Environment};
use polars::prelude::*;
use std::{fs, io::Cursor};

pub fn run(query_path: &str, odbc_conn_str_or_dsn: &str) -> Result<DataFrame> {
    let sql = fs::read_to_string(query_path)?;
    let env = Environment::new()?;
    let mut conn = env.connect_with_connection_string(odbc_conn_str_or_dsn, ConnectionOptions::default())?;

    let mut cursor = conn
        .execute(&sql, ())
        .map_err(|e| anyhow!("ODBC execute failed: {e}"))?
        .ok_or_else(|| anyhow!("Query did not return a result set"))?;

    let col_names: Vec<String> = cursor.column_names()?.collect::<Result<_, _>>()?;
    let num_cols = col_names.len();

    let batch_size = 1024;
    let max_str_len = 16 * 1024;
    let mut rowset = TextRowSet::for_cursor(batch_size, &mut cursor, Some(max_str_len))?;
    let mut row_set_cursor = cursor.bind_buffer(&mut rowset)?;

    let mut out = Vec::<u8>::new();
    {
        let mut wtr = Writer::from_writer(&mut out);
        wtr.write_record(&col_names)?;

        while let Some(batch) = row_set_cursor.fetch()? {
            for row_idx in 0..batch.num_rows() {
                let mut rec = Vec::with_capacity(num_cols);
                for col_idx in 0..num_cols {
                    rec.push(batch.at(col_idx, row_idx).unwrap_or(&[]));
                }
                wtr.write_record(&rec)?;
            }
        }
        wtr.flush()?;
    } // drop the CSV writer so `out` can be read below

    let df = CsvReadOptions::default()
        .with_has_header(true)
        .with_infer_schema_length(Some(1000))
        .into_reader_with_file_handle(Cursor::new(out))
        .finish()?;

    Ok(df)
}

fn main() -> Result<()> {
    // Examples:
    //   DSN only: set DB2_ODBC="DSN=DB2DSN"
    //   Full:     set DB2_ODBC="Driver={IBM DB2 ODBC DRIVER};Database=MYDB;Hostname=host;Port=50000;Protocol=TCPIP;Uid=user;Pwd=pw;"
    let conn = std::env::var("DB2_ODBC").expect("set DB2_ODBC (ODBC conn string or DSN)");
    let df = run("query.sql", &conn)?;
    println!("{df}");
    Ok(())
}

Cargo.toml (one file that covers all four)

[package]
name = "multi_db_to_polars"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1.0"
polars = { version = "0.45", default-features = false, features = ["fmt", "csv"] }

# Postgres & SQL Server via ConnectorX → Polars
connectorx = { version = "0.4", default-features = false, features = [
  "src_postgres",
  "src_mssql",
  "dst_polars",
] }

# ODBC + CSV for Vertica & DB2
odbc-api = "6.0"
csv = "1.3"
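
# Note (a sketch, assuming the file layout above): the example files live in src/
# rather than src/bin/, so declare them as binary targets for `cargo run --bin ...`.
# Adjust names and paths to your own layout.
[[bin]]
name = "pg_to_polars"
path = "src/pg_to_polars.rs"

[[bin]]
name = "mssql_to_polars"
path = "src/mssql_to_polars.rs"

[[bin]]
name = "vertica_to_polars"
path = "src/vertica_to_polars.rs"

[[bin]]
name = "db2_to_polars"
path = "src/db2_to_polars.rs"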

How to run each

  • Postgres

    RUST_LOG=info POSTGRES_URL="postgresql://user:pw@host:5432/db" cargo run --bin pg_to_polars
    
  • SQL Server

    MSSQL_URL="mssql://user:pw@host:1433/db?encrypt=DANGER_PLAINTEXT" cargo run --bin mssql_to_polars
    
  • Vertica

    # Example DSN set up via your ODBC manager
    VERTICA_ODBC="DSN=VerticaDSN;UID=dbuser;PWD=secret" cargo run --bin vertica_to_polars
    
  • DB2

    DB2_ODBC="Driver={IBM DB2 ODBC DRIVER};Database=MYDB;Hostname=host;Port=50000;Protocol=TCPIP;Uid=user;Pwd=pw;" \
    cargo run --bin db2_to_polars
    

Tip: keep query.sql in the working directory you run the binary from, or pass a path instead of "query.sql".
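If you'd rather pass the path on the command line, here is a minimal sketch (std only) that reuses the Postgres run() above; the same pattern applies to the other binaries:

fn main() -> anyhow::Result<()> {
    // First CLI argument is the SQL file path; fall back to "query.sql" in the working directory.
    let query_path = std::env::args().nth(1).unwrap_or_else(|| "query.sql".to_string());
    let conn = std::env::var("POSTGRES_URL")?;
    let df = run(&query_path, &conn)?;
    println!("{df}");
    Ok(())
}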


Notes & options

  • If you prefer typed Arrow → Polars instead of the CSV hop for ODBC, you can swap in arrow-odbc and then build Series from Arrow arrays. The CSV approach above prioritizes portability and simple code.

  • For SQL Server, if you need granular control (MARS, bulk copy, DATEFIRST, etc.), you can consider tiberius. You’d then map rows → Series manually.

  • ConnectorX is fast; if you’re pulling large result sets from Postgres/SQL Server, it’ll likely be noticeably quicker than a row-by-row client.

Want me to bundle these into a small workspace (with a shared lib.rs) so you can cargo run -p pg, -p mssql, etc.?


You:

I don't like that it's inferring types from a csv. is there a way that I can keep track of whatever types the vertica or db2 query returns and use those instead of guessing from the csv?


ChatGPT:

totally—you don’t need to round-trip through CSV. You can keep the actual column types from Vertica/DB2 and still end up with a Polars DataFrame.

Below are two solid, typed paths for Vertica and DB2 (both via ODBC). Pick the one that matches your Polars version.


Option A (Polars ≤ 0.43-ish): convert Arrow → Polars zero-copy

arrow-odbc gives you Arrow RecordBatches with a schema derived from the driver. You can convert each Arrow column into a Polars Series without guessing types.

use anyhow::Result;
use arrow::record_batch::RecordBatch;
use arrow_odbc::{odbc_api::{Environment, ConnectionOptions}, OdbcReaderBuilder};
use polars::prelude::*;

/// Turn an Arrow RecordBatch into a Polars DataFrame (zero-copy).
fn record_batch_to_df(batch: &RecordBatch) -> PolarsResult<DataFrame> {
    let schema = batch.schema();
    let mut cols = Vec::with_capacity(batch.num_columns());
    for (i, col) in batch.columns().iter().enumerate() {
        // Convert arrow-rs ArrayRef -> polars-arrow Array (compat layer)
        let arr = Box::<dyn polars_arrow::array::Array>::from(&**col);
        cols.push(Series::from_arrow(schema.field(i).name().clone().into(), arr)?);
    }
    Ok(DataFrame::from_iter(cols))
}

pub fn run_typed(odbc_conn_str: &str, sql: &str) -> Result<DataFrame> {
    let env = Environment::new()?;
    let conn = env.connect_with_connection_string(odbc_conn_str, ConnectionOptions::default())?;
    let cursor = conn.execute(sql, ())?.expect("query must return rows");
    let mut reader = OdbcReaderBuilder::new().build(cursor)?; // yields Arrow RecordBatches

    // If multiple batches, vertically concatenate them in Polars
    let mut out: Option<DataFrame> = None;
    while let Some(batch) = reader.next() {
        let batch = batch?;
        let df = record_batch_to_df(&batch)?;
        out = Some(match out {
            None => df,
            Some(acc) => acc.vstack(&df)?,
        });
    }
    Ok(out.unwrap_or_else(DataFrame::default))
}
  • arrow-odbc builds Arrow RecordBatches and infers Arrow types from ODBC metadata (so you keep DECIMAL, TIMESTAMP, etc.). Docs.rs

  • The Series::from_arrow trick uses Polars’ polars-arrow compatibility layer to avoid copying. (This is exactly the pattern discussed here.) Stack Overflow

Cargo notes (for this option)

polars = { version = "0.43", default-features = false }
polars-arrow = { version = "0.43", features = ["arrow_rs"] }
arrow-odbc = "18"
anyhow = "1"

(Pin to a Polars that still exposes the arrow_rs conversion; see discussion about later changes.) GitHub


Option B (Polars ≥ 0.44/0.45+): use the Arrow C Data Interface (FFI)

Recent Polars removed direct arrow-rs glue, so the modern way is: export an Arrow RecordBatchReader to the C data interface and import that into Polars. This keeps the Arrow schema (types) intact and is still zero-copy.

Sketch (FFI idea):

use anyhow::Result;
use arrow_odbc::{odbc_api::{Environment, ConnectionOptions}, OdbcReaderBuilder};
use arrow::ffi_stream::export_reader_into_raw; // export to FFI
use polars::prelude::*;
use polars_ffi::from_arrow_array_stream;       // import FFI stream into Polars

pub fn run_typed_via_ffi(odbc_conn_str: &str, sql: &str) -> Result<DataFrame> {
    let env = Environment::new()?;
    let conn = env.connect_with_connection_string(odbc_conn_str, ConnectionOptions::default())?;
    let cursor = conn.execute(sql, ())?.expect("query must return rows");

    // 1) Arrow reader from ODBC
    let reader = OdbcReaderBuilder::new().build(cursor)?;

    // 2) Export to Arrow C Stream
    let mut c_stream = std::mem::MaybeUninit::zeroed();
    unsafe { export_reader_into_raw(Box::new(reader), c_stream.as_mut_ptr())?; }
    let mut c_stream = unsafe { c_stream.assume_init() };

    // 3) Import C Stream into Polars (typed, zero-copy)
    //    polars-ffi provides the safe wrapper.
    let df = unsafe { from_arrow_array_stream(&mut c_stream)? };

    Ok(df)
}
  • The purge of arrow-rs in Polars means FFI is the stable bridge going forward. (See Polars PR + ConnectorX implementing the FFI path to Polars.) GitHub+1

  • FFI keeps your true DB types (Arrow schema) through to Polars.

Cargo notes (for this option)

polars = { version = "0.50", default-features = false, features = ["ipc"] } # ipc often enables Arrow interop bits
polars-ffi = "0.1"                   # the small helper crate that exposes FFI helpers
arrow = { version = "57", features = ["ffi", "ffi_stream"] }
arrow-odbc = "18"
anyhow = "1"

(Names/versions may drift; the key idea is Arrow’s ffi_stream + a Polars FFI import. ConnectorX also demonstrates this approach in practice.) GitHub


How to use for Vertica and DB2

  • Vertica: install Vertica’s ODBC driver, create a DSN, then pass either a DSN or a full ODBC connection string to run_typed*. Example:
    DSN=VerticaDSN;UID=dbuser;PWD=secret;
    The query types are read from Vertica via ODBC and preserved in Arrow → Polars. Docs.rs

  • DB2: install IBM DB2 ODBC/CLI, configure DSN, then use something like:
    Driver={IBM DB2 ODBC DRIVER};Database=MYDB;Hostname=host;Port=50000;Protocol=TCPIP;Uid=user;Pwd=pw;
    Same flow; Arrow types reflect DB2’s metadata (usage sketch below). Crates.io
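For example, wiring run_typed from Option A to a query file and an environment variable could look like this (a sketch; the VERTICA_ODBC / DB2_ODBC variable names just mirror the earlier examples):

fn main() -> anyhow::Result<()> {
    // Works for either database: point whichever env var you set at the right ODBC connection string.
    let conn = std::env::var("VERTICA_ODBC").or_else(|_| std::env::var("DB2_ODBC"))?;
    let sql = std::fs::read_to_string("query.sql")?;
    let df = run_typed(&conn, &sql)?;
    println!("{df}");
    Ok(())
}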


TL;DR

Yes—ditch the CSV hop. Use arrow-odbc to get Arrow RecordBatches (typed), then:

  • Older Polars: convert Arrow arrays → Series via polars-arrow (no copies). Stack Overflow

  • Newer Polars: FFI bridge (Arrow C Stream) into Polars (no copies, future-proof). GitHub+1

If you tell me which Polars version you’re on, I’ll drop in a ready-to-build Cargo.toml + a single run(odbc_conn, query_path) helper you can reuse for both Vertica and DB2.


You:

whatever is the newest version of polars.


ChatGPT:

gotcha—let’s keep real DB types, no guessing. Below are drop-in Rust snippets for each engine using the newest Polars. For Vertica/DB2 I use ODBC → Arrow RecordBatch → Arrow IPC stream → Polars; IPC preserves the Arrow schema/types end-to-end. Postgres/SQL Server still use ConnectorX which produces a Polars DataFrame directly.


Postgres → Polars (ConnectorX)

// src/pg_to_polars.rs
use anyhow::Result;
use connectorx::prelude::*;
use polars::prelude::*;
use std::fs;

pub fn run(query_path: &str, conn_str: &str) -> Result<DataFrame> {
    let sql = fs::read_to_string(query_path)?;
    let mut src = SourceConn::try_from(conn_str)?;
    // ConnectorX returns a Polars DataFrame; types come from Postgres.
    let df: DataFrame = get_polars(&src, None, &[CXQuery::from(sql.as_str())])?;
    Ok(df)
}

fn main() -> Result<()> {
    let url = std::env::var("POSTGRES_URL")?;
    let df = run("query.sql", &url)?;
    println!("{df}");
    Ok(())
}

SQL Server → Polars (ConnectorX)

// src/mssql_to_polars.rs
use anyhow::Result;
use connectorx::prelude::*;
use polars::prelude::*;
use std::fs;

pub fn run(query_path: &str, conn_str: &str) -> Result<DataFrame> {
    let sql = fs::read_to_string(query_path)?;
    let mut src = SourceConn::try_from(conn_str)?;
    let df: DataFrame = get_polars(&src, None, &[CXQuery::from(sql.as_str())])?;
    Ok(df)
}

fn main() -> Result<()> {
    // e.g. "mssql://user:pw@host:1433/db?encrypt=DANGER_PLAINTEXT"
    let url = std::env::var("MSSQL_URL")?;
    let df = run("query.sql", &url)?;
    println!("{df}");
    Ok(())
}

Vertica → Polars (ODBC → Arrow → IPC → Polars)

// src/vertica_to_polars.rs
use anyhow::{anyhow, Result};
use arrow_ipc::writer::StreamWriter;
use odbc_api::{ConnectionOptions, Environment};
use polars::prelude::*;
use polars_io::ipc::IpcStreamReader;
use polars_io::SerReader;
use std::{fs, io::Cursor};

pub fn run(query_path: &str, odbc_conn_str: &str) -> Result<DataFrame> {
    let sql = fs::read_to_string(query_path)?;
    let env = Environment::new()?;
    let conn = env.connect_with_connection_string(odbc_conn_str, ConnectionOptions::default())?;
    let cursor = conn
        .execute(&sql, ())
        .map_err(|e| anyhow!("ODBC execute failed: {e}"))?
        .ok_or_else(|| anyhow!("Query did not return rows"))?;

    // arrow-odbc gives a RecordBatch reader directly
    let mut reader = arrow_odbc::OdbcReaderBuilder::new().build(cursor)?;

    // Write Arrow record batches to an in-memory Arrow IPC *stream* (keeps types)
    let mut buf = Vec::<u8>::new();
    {
        let schema = reader.schema();
        let mut writer = StreamWriter::try_new(&mut buf, &schema)?;
        while let Some(batch) = reader.next() {
            writer.write(&batch?)?;
        }
        writer.finish()?;
    }

    // Read IPC stream into Polars DataFrame (typed; no CSV inference)
    let df = IpcStreamReader::new(Cursor::new(buf)).finish()?;
    Ok(df)
}

fn main() -> Result<()> {
    // Example: "DSN=VerticaDSN;UID=dbuser;PWD=secret"
    let conn = std::env::var("VERTICA_ODBC")?;
    let df = run("query.sql", &conn)?;
    println!("{df}");
    Ok(())
}

IBM DB2 → Polars (ODBC → Arrow → IPC → Polars)

// src/db2_to_polars.rs
use anyhow::{anyhow, Result};
use arrow_ipc::writer::StreamWriter;
use odbc_api::{ConnectionOptions, Environment};
use polars::prelude::*;
use polars_io::ipc::IpcStreamReader;
use polars_io::SerReader;
use std::{fs, io::Cursor};

pub fn run(query_path: &str, odbc_conn_str: &str) -> Result<DataFrame> {
    let sql = fs::read_to_string(query_path)?;
    let env = Environment::new()?;
    let conn = env.connect_with_connection_string(odbc_conn_str, ConnectionOptions::default())?;
    let cursor = conn
        .execute(&sql, ())
        .map_err(|e| anyhow!("ODBC execute failed: {e}"))?
        .ok_or_else(|| anyhow!("Query did not return rows"))?;

    let mut reader = arrow_odbc::OdbcReaderBuilder::new().build(cursor)?;

    let mut buf = Vec::<u8>::new();
    {
        let schema = reader.schema();
        let mut writer = StreamWriter::try_new(&mut buf, &schema)?;
        while let Some(batch) = reader.next() {
            writer.write(&batch?)?;
        }
        writer.finish()?;
    }

    let df = IpcStreamReader::new(Cursor::new(buf)).finish()?;
    Ok(df)
}

fn main() -> Result<()> {
    // Example full string:
    // "Driver={IBM DB2 ODBC DRIVER};Database=MYDB;Hostname=host;Port=50000;Protocol=TCPIP;Uid=user;Pwd=pw;"
    let conn = std::env::var("DB2_ODBC")?;
    let df = run("query.sql", &conn)?;
    println!("{df}");
    Ok(())
}

Cargo.toml (works with current Polars + IPC path)

[package]
name = "db_to_polars"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1"

# Polars + IO (IPC reader)
polars = { version = "0.50", default-features = false, features = ["fmt"] }
polars-io = { version = "0.50", features = ["ipc", "ipc_streaming"] }  # IpcStreamReader lives behind "ipc_streaming"

# Postgres & SQL Server via ConnectorX → Polars
connectorx = { version = "0.4", default-features = false, features = [
  "src_postgres",
  "src_mssql",
  "dst_polars",
] }

# ODBC + Arrow + IPC for Vertica/DB2
odbc-api = "6"
arrow-odbc = "18"
arrow-ipc = "57"

Why this is “typed”:

  • arrow-odbc yields Arrow RecordBatches using driver metadata—no text casts. Docs.rs, Crates.io

  • We serialize those batches to an Arrow IPC stream in memory and read them with Polars’ IPC reader, preserving the Arrow schema end-to-end (still zero schema guessing). Docs.rs, docs.pola.rs

If you later want true zero-copy (skip IPC bytes) we can switch to the Arrow C Data Interface (FFI) path and import the Arrow stream directly into Polars (same types, minimal overhead)—I can wire that up too. Docs.rs+1

Want me to bundle these into a tiny workspace with four binaries and a shared read_sql("query.sql") helper?
