You:
I want to connect to a various databases using rust, send select queries, and return polars data frames.
I need to connect to vertica, db2, ms sql server, and postgres databases. how do you recommend that I do each of those? It can be a different method for each database type if necessary.
ChatGPT:
Great goal. Here’s the most practical path today to run SELECT queries from Rust and get back Polars DataFrames, per engine.
Postgres

Best option: use ConnectorX and have it return Polars directly. It’s fast (parallel, zero-copy where possible), and Polars is supported natively as a destination.
- Crate: connectorx with features src_postgres, dst_polars.
- Sketch:
use connectorx::prelude::*;
let mut conn = SourceConn::try_from("postgresql://user:pw@host:5432/db")?;
let out = get_polars(&conn, None, &[CXQuery::from("SELECT * FROM t")])?;
let df: polars::prelude::DataFrame = out; // already a DataFrame
- Docs show Postgres as a source and Polars as a destination (enable via features).
- Alt: tokio-postgres/sqlx → build the Series and DataFrame manually (more work, slower); see the sketch below.
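For completeness, here is a minimal sketch of that manual route with tokio-postgres (the table, column names, and Rust types are made up for illustration; you would also need tokio and tokio-postgres in Cargo.toml):

use polars::prelude::*;
use tokio_postgres::NoTls;

async fn pg_manual(conn_str: &str) -> anyhow::Result<DataFrame> {
    // Open the connection; the `connection` half drives the socket in the background.
    let (client, connection) = tokio_postgres::connect(conn_str, NoTls).await?;
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("postgres connection error: {e}");
        }
    });

    // Hypothetical two-column query; adjust names and Rust types to your schema.
    let rows = client.query("SELECT id, name FROM t", &[]).await?;
    let ids: Vec<i64> = rows.iter().map(|r| r.get("id")).collect();
    let names: Vec<Option<String>> = rows.iter().map(|r| r.get("name")).collect();

    // df! builds typed Series column by column.
    Ok(df!("id" => ids, "name" => names)?)
}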
SQL Server

- Best option: ConnectorX again, features src_mssql, dst_polars. Same usage as above.
- Alt: Tiberius (native TDS driver) to fetch rows, then build a DataFrame yourself. Use it if you need fine-grained control or features ConnectorX lacks; see the sketch below.
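If you go the Tiberius route, a rough sketch of the row → Series mapping (assumes tokio and tokio-util with its compat feature; the query and column types are placeholders):

use polars::prelude::*;
use tiberius::{Client, Config};
use tokio::net::TcpStream;
use tokio_util::compat::TokioAsyncWriteCompatExt;

async fn mssql_manual(ado_conn_str: &str) -> anyhow::Result<DataFrame> {
    // Tiberius connects over a plain TcpStream wrapped in the futures-io compat layer.
    let config = Config::from_ado_string(ado_conn_str)?;
    let tcp = TcpStream::connect(config.get_addr()).await?;
    tcp.set_nodelay(true)?;
    let mut client = Client::connect(config, tcp.compat_write()).await?;

    // Collect the first result set as a Vec<Row>.
    let rows = client
        .simple_query("SELECT id, name FROM t")
        .await?
        .into_first_result()
        .await?;

    let ids: Vec<Option<i32>> = rows.iter().map(|r| r.get("id")).collect();
    let names: Vec<Option<String>> = rows
        .iter()
        .map(|r| r.get::<&str, _>("name").map(str::to_string))
        .collect();
    Ok(df!("id" => ids, "name" => names)?)
}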
Vertica

No mature native Rust driver; use ODBC → Arrow → Polars:
- Install and configure Vertica’s ODBC driver and create a DSN (see the sample odbc.ini below).
- In Rust, read query results as Arrow RecordBatches with arrow-odbc (built on odbc-api).
- Convert the Arrow (arrow2) arrays to Polars Series and then a DataFrame. The pattern (shown in write-ups on DuckDB’s Arrow interface, but identical for ODBC): downcast each Arrow array → Series::try_from((name, array.to_boxed())) → DataFrame::new(vec![...]).
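For illustration, a unixODBC DSN for Vertica might look roughly like this (the driver path, host, and database are placeholders; check where your Vertica client package installed the driver):

# ~/.odbc.ini (or /etc/odbc.ini)
[VerticaDSN]
Description = Vertica via ODBC
Driver      = /opt/vertica/lib64/libverticaodbc.so
ServerName  = vertica-host.example.com
Database    = mydb
Port        = 5433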
DB2

Same ODBC → Arrow → Polars pattern:
- Install IBM’s DB2 ODBC/CLI driver and configure it with unixODBC (Linux/macOS) or the ODBC Data Source Administrator (Windows); a sample configuration follows this list.
- Use arrow-odbc or odbc-api to execute the query and stream Arrow batches.
- Convert Arrow (arrow2) to Polars as above.
Note: there’s a community ibm_db crate, but ODBC + Arrow tends to be the most portable.
Why this split:
- ConnectorX: direct Polars destination, supports Postgres & SQL Server, and is designed for fast parallel loads.
- ODBC + Arrow: the most reliable cross-vendor path for Vertica and DB2. arrow-odbc gives you Arrow batches; Polars uses Arrow2 internally and can be populated from Arrow arrays.
Dependencies:
- Postgres/SQL Server:
connectorx = { version = "0.4", features = ["src_postgres", "src_mssql", "dst_polars"] }
polars = { version = "0.45", features = ["lazy"] }
- Vertica/DB2 via ODBC:
odbc-api = "6"
arrow-odbc = "10"   # use the version matching your toolchain
polars = "0.45"
When using ODBC:
- Read batches with arrow_odbc::OdbcReaderBuilder (sketch below).
- For each batch (a set of column arrays), downcast each Arrow array to its concrete type, build a Series via Series::try_from(("col", array.to_boxed())), then DataFrame::new(series_vec).
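To make the first step concrete, here is a minimal sketch of the arrow-odbc read loop (connection string and query are placeholders; the per-column Series conversion is the part covered by the typed examples later in this thread):

use arrow_odbc::{odbc_api::{ConnectionOptions, Environment}, OdbcReaderBuilder};

fn print_batches(odbc_conn_str: &str, sql: &str) -> Result<(), Box<dyn std::error::Error>> {
    let env = Environment::new()?;
    let conn = env.connect_with_connection_string(odbc_conn_str, ConnectionOptions::default())?;
    let cursor = conn
        .execute(sql, ())?
        .expect("a SELECT should produce a result set");

    // Each item is an arrow RecordBatch whose schema comes from the ODBC column metadata.
    let reader = OdbcReaderBuilder::new().build(cursor)?;
    for batch in reader {
        let batch = batch?;
        println!("{} rows x {} cols: {:?}", batch.num_rows(), batch.num_columns(), batch.schema());
    }
    Ok(())
}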
If you want, I can scaffold a tiny crate with:
- PgRunner/MssqlRunner using ConnectorX, and
- a generic OdbcRunner that returns a DataFrame from any DSN (works for Vertica & DB2).
You:
write me some rust code that will run a select query I have stored in query.sql and turn the results into a polars data frame. write code that does this for each of the database types I asked you about.
ChatGPT:
awesome—here are four self-contained Rust examples that all:
- read your SQL from a local query.sql
- run it against the specified database
- return a polars::prelude::DataFrame
I split them by engine so you can copy/paste what you need. At the end there’s a single Cargo.toml that works for all four.
// src/pg_to_polars.rs
use anyhow::Result;
use connectorx::prelude::*;
use polars::prelude::*;
use std::fs;
pub fn run(query_path: &str, conn_str: &str) -> Result<DataFrame> {
let sql = fs::read_to_string(query_path)?;
// ConnectorX will parallelize and directly return a Polars DataFrame.
let mut source = SourceConn::try_from(conn_str)?;
let df: DataFrame = get_polars(&source, None, &[CXQuery::from(sql)])?;
Ok(df)
}
// tiny demo
fn main() -> Result<()> {
let conn = std::env::var("POSTGRES_URL")
.expect("set POSTGRES_URL, e.g. postgresql://user:pw@host:5432/db");
let df = run("query.sql", &conn)?;
println!("{df}");
Ok(())
}

// src/mssql_to_polars.rs
use anyhow::Result;
use connectorx::prelude::*;
use polars::prelude::*;
use std::fs;
pub fn run(query_path: &str, conn_str: &str) -> Result<DataFrame> {
let sql = fs::read_to_string(query_path)?;
let mut source = SourceConn::try_from(conn_str)?;
let df: DataFrame = get_polars(&source, None, &[CXQuery::from(sql)])?;
Ok(df)
}
fn main() -> Result<()> {
// Example: "mssql://user:pw@host:1433/db?encrypt=DANGER_PLAINTEXT"
let conn = std::env::var("MSSQL_URL").expect("set MSSQL_URL connection string");
let df = run("query.sql", &conn)?;
println!("{df}");
Ok(())
}

For Vertica (and DB2), the most portable path in Rust is ODBC. To keep type mapping simple and robust across drivers, the code below fetches rows as text (odbc-api’s TextRowSet), writes them to an in-memory CSV, then lets Polars infer types when reading the CSV.
Prereqs: install/configure the Vertica ODBC driver and set up a DSN (or provide a full ODBC connection string).
// src/vertica_to_polars.rs
use anyhow::{anyhow, Result};
use csv::Writer;
use odbc_api::{buffers::TextRowSet, ConnectionOptions, Environment};
use polars::prelude::*;
use std::{fs, io::Cursor};
pub fn run(query_path: &str, odbc_conn_str_or_dsn: &str) -> Result<DataFrame> {
let sql = fs::read_to_string(query_path)?;
let env = Environment::new()?;
// You can pass either: "DSN=VerticaDSN;UID=...;PWD=..." or just "DSN=VerticaDSN"
let mut conn = env.connect_with_connection_string(odbc_conn_str_or_dsn, ConnectionOptions::default())?;
// Execute and ensure we got a result set (SELECT).
let mut cursor = conn
.execute(&sql, ())
.map_err(|e| anyhow!("ODBC execute failed: {e}"))?
.ok_or_else(|| anyhow!("Query did not return a result set"))?;
// Gather column names for the CSV header row.
let num_cols = cursor.num_result_cols()? as usize;
let mut col_names = Vec::with_capacity(num_cols);
for i in 1..=num_cols {
col_names.push(cursor.col_name(i as u16)?);
}
// TextRowSet fetches all values as UTF-8 text. Choose sensible batch size & max text length.
let batch_size = 1024;
let max_str_len = 16 * 1024; // 16 KiB per text cell (adjust if you have very wide text)
let mut rowset = TextRowSet::for_cursor(batch_size, &mut cursor, Some(max_str_len))?;
let mut out = Vec::<u8>::new();
let mut wtr = Writer::from_writer(&mut out);
wtr.write_record(&col_names)?;
// Bind the text buffer to the cursor, then fetch batches and stream them into CSV.
let mut block_cursor = cursor.bind_buffer(&mut rowset)?;
while let Some(batch) = block_cursor.fetch()? {
for row_idx in 0..batch.num_rows() {
let mut rec = Vec::with_capacity(num_cols);
for col_idx in 0..num_cols {
// Empty field for NULLs; cells are raw UTF-8 bytes
let cell = batch.at(row_idx, col_idx).unwrap_or(&[]);
rec.push(cell);
}
wtr.write_record(&rec)?;
}
}
wtr.flush()?;
drop(wtr); // release the &mut borrow on `out` before we move it
// Now read the CSV into Polars (let it infer types).
let reader = Cursor::new(out);
let df = CsvReadOptions::default()
.with_has_header(true)
.with_infer_schema_length(Some(1000))
.into_reader_with_file_handle(reader)
.finish()?;
Ok(df)
}
fn main() -> Result<()> {
// Examples:
// DSN only: set VERTICA_ODBC="DSN=VerticaDSN"
// Full: set VERTICA_ODBC="DSN=VerticaDSN;UID=dbuser;PWD=secret"
let conn = std::env::var("VERTICA_ODBC").expect("set VERTICA_ODBC (ODBC conn string or DSN)");
let df = run("query.sql", &conn)?;
println!("{df}");
Ok(())
}

This is the same pattern as Vertica, just a different connection string (DB2’s ODBC/CLI driver).
// src/db2_to_polars.rs
use anyhow::{anyhow, Result};
use csv::Writer;
use odbc_api::{buffers::TextRowSet, ConnectionOptions, Environment};
use polars::prelude::*;
use std::{fs, io::Cursor};
pub fn run(query_path: &str, odbc_conn_str_or_dsn: &str) -> Result<DataFrame> {
let sql = fs::read_to_string(query_path)?;
let env = Environment::new()?;
let mut conn = env.connect_with_connection_string(odbc_conn_str_or_dsn, ConnectionOptions::default())?;
let mut cursor = conn
.execute(&sql, ())
.map_err(|e| anyhow!("ODBC execute failed: {e}"))?
.ok_or_else(|| anyhow!("Query did not return a result set"))?;
let num_cols = cursor.num_result_cols()? as usize;
let mut col_names = Vec::with_capacity(num_cols);
for i in 1..=num_cols {
col_names.push(cursor.col_name(i as u16)?);
}
let batch_size = 1024;
let max_str_len = 16 * 1024;
let mut rowset = TextRowSet::for_cursor(batch_size, &mut cursor, Some(max_str_len))?;
let mut out = Vec::<u8>::new();
let mut wtr = Writer::from_writer(&mut out);
wtr.write_record(&col_names)?;
// Bind the text buffer, then fetch batches and stream them into CSV.
let mut block_cursor = cursor.bind_buffer(&mut rowset)?;
while let Some(batch) = block_cursor.fetch()? {
for row_idx in 0..batch.num_rows() {
let mut rec = Vec::with_capacity(num_cols);
for col_idx in 0..num_cols {
// Empty field for NULLs; cells are raw UTF-8 bytes
rec.push(batch.at(row_idx, col_idx).unwrap_or(&[]));
}
wtr.write_record(&rec)?;
}
}
wtr.flush()?;
drop(wtr); // release the &mut borrow on `out` before we move it
let reader = Cursor::new(out);
let df = CsvReadOptions::default()
.with_has_header(true)
.with_infer_schema_length(Some(1000))
.into_reader_with_file_handle(reader)
.finish()?;
Ok(df)
}
fn main() -> Result<()> {
// Examples:
// DSN only: set DB2_ODBC="DSN=DB2DSN"
// Full: set DB2_ODBC="Driver={IBM DB2 ODBC DRIVER};Database=MYDB;Hostname=host;Port=50000;Protocol=TCPIP;Uid=user;Pwd=pw;"
let conn = std::env::var("DB2_ODBC").expect("set DB2_ODBC (ODBC conn string or DSN)");
let df = run("query.sql", &conn)?;
println!("{df}");
Ok(())
}

Cargo.toml (one manifest for all four binaries):

[package]
name = "multi_db_to_polars"
version = "0.1.0"
edition = "2021"
[dependencies]
anyhow = "1.0"
polars = { version = "0.45", default-features = false, features = ["fmt", "csv"] }
# Postgres & SQL Server via ConnectorX → Polars
connectorx = { version = "0.4", default-features = false, features = [
"src_postgres",
"src_mssql",
"dst_polars",
] }
# ODBC + CSV for Vertica & DB2
odbc-api = "6.0"
csv = "1.3"-
Postgres
RUST_LOG=info POSTGRES_URL="postgresql://user:pw@host:5432/db" cargo run --bin pg_to_polars -
SQL Server
MSSQL_URL="mssql://user:pw@host:1433/db?encrypt=DANGER_PLAINTEXT" cargo run --bin mssql_to_polars -
Vertica
# Example DSN set up via your ODBC manager VERTICA_ODBC="DSN=VerticaDSN;UID=dbuser;PWD=secret" cargo run --bin vertica_to_polars -
DB2
DB2_ODBC="Driver={IBM DB2 ODBC DRIVER};Database=MYDB;Hostname=host;Port=50000;Protocol=TCPIP;Uid=user;Pwd=pw;" \ cargo run --bin db2_to_polars
Tip: keep the query.sql file in your binary’s working directory, or pass a path instead of "query.sql" (see the small helper below).
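For example, a tiny helper that takes the SQL path from the first CLI argument and falls back to ./query.sql (purely optional; the name is up to you):

use std::{env, fs, io};

fn read_sql() -> io::Result<String> {
    // First CLI argument wins; otherwise look for query.sql in the working directory.
    let path = env::args().nth(1).unwrap_or_else(|| "query.sql".to_string());
    fs::read_to_string(path)
}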
- If you prefer typed Arrow → Polars instead of the CSV hop for ODBC, you can swap in arrow-odbc and then build Series from Arrow arrays. The CSV approach above prioritizes portability and simple code.
- For SQL Server, if you need granular control (MARS, bulk copy, DATEFIRST, etc.), consider tiberius. You’d then map rows → Series manually.
- ConnectorX is fast; if you’re pulling large result sets from Postgres/SQL Server, it’ll likely be noticeably quicker than a row-by-row client.
Want me to bundle these into a small workspace (with a shared lib.rs) so you can cargo run -p pg, -p mssql, etc.?
You:
I don't like that it's inferring types from a csv. is there a way that I can keep track of whatever types the vertica or db2 query returns and use those instead of guessing from the csv?
ChatGPT:
totally—you don’t need to round-trip through CSV. You can keep the actual column types from Vertica/DB2 and still end up with a Polars DataFrame.
Below are two solid, typed paths for Vertica and DB2 (both via ODBC). Pick the one that matches your Polars version.
Option A — older Polars with the arrow_rs compat layer

arrow-odbc gives you Arrow RecordBatches with a schema derived from the driver. You can convert each Arrow column into a Polars Series without guessing types.
use anyhow::Result;
use arrow::record_batch::RecordBatch;
use arrow_odbc::{odbc_api::{Environment, ConnectionOptions}, OdbcReaderBuilder};
use polars::prelude::*;
/// Turn an Arrow RecordBatch into a Polars DataFrame (zero-copy).
fn record_batch_to_df(batch: &RecordBatch) -> PolarsResult<DataFrame> {
let schema = batch.schema();
let mut cols = Vec::with_capacity(batch.num_columns());
for (i, col) in batch.columns().iter().enumerate() {
// Convert arrow-rs ArrayRef -> polars-arrow Array (compat layer)
let arr = Box::<dyn polars_arrow::array::Array>::from(&**col);
cols.push(Series::from_arrow(schema.field(i).name().clone().into(), arr)?);
}
Ok(DataFrame::from_iter(cols))
}
pub fn run_typed(odbc_conn_str: &str, sql: &str) -> Result<DataFrame> {
let env = Environment::new()?;
let conn = env.connect_with_connection_string(odbc_conn_str, ConnectionOptions::default())?;
let cursor = conn.execute(sql, ())?.expect("query must return rows");
let mut reader = OdbcReaderBuilder::new().build(cursor)?; // yields Arrow RecordBatches
// If multiple batches, vertically concatenate them in Polars
let mut out: Option<DataFrame> = None;
while let Some(batch) = reader.next() {
let batch = batch?;
let df = record_batch_to_df(&batch)?;
out = Some(match out {
None => df,
Some(acc) => acc.vstack(&df)?,
});
}
Ok(out.unwrap_or_else(DataFrame::default))
}

- arrow-odbc builds Arrow RecordBatches and infers Arrow types from ODBC metadata (so you keep DECIMAL, TIMESTAMP, etc.).
- The Series::from_arrow trick uses Polars’ polars-arrow compatibility layer to avoid copying.

Cargo notes (for this option):
polars = { version = "0.43", default-features = false }
polars-arrow = { version = "0.43", features = ["arrow_rs"] }
arrow-odbc = "18"
anyhow = "1"
(Pin to a Polars version that still exposes the arrow_rs conversion; later releases removed it.)
Option B — newer Polars, via the Arrow C Data Interface (FFI)

Recent Polars removed the direct arrow-rs glue, so the modern way is to export an Arrow RecordBatchReader to the C data interface and import that stream into Polars. This keeps the Arrow schema (types) intact and is still zero-copy.
Sketch (FFI idea):
use anyhow::Result;
use arrow_odbc::{odbc_api::{Environment, ConnectionOptions}, OdbcReaderBuilder};
use arrow::ffi_stream::export_reader_into_raw; // export to FFI
use polars::prelude::*;
use polars_ffi::from_arrow_array_stream; // import FFI stream into Polars
pub fn run_typed_via_ffi(odbc_conn_str: &str, sql: &str) -> Result<DataFrame> {
let env = Environment::new()?;
let conn = env.connect_with_connection_string(odbc_conn_str, ConnectionOptions::default())?;
let cursor = conn.execute(sql, ())?.expect("query must return rows");
// 1) Arrow reader from ODBC
let reader = OdbcReaderBuilder::new().build(cursor)?;
// 2) Export to Arrow C Stream
let mut c_stream = std::mem::MaybeUninit::zeroed();
unsafe { export_reader_into_raw(Box::new(reader), c_stream.as_mut_ptr())?; }
let mut c_stream = unsafe { c_stream.assume_init() };
// 3) Import C Stream into Polars (typed, zero-copy)
// polars-ffi provides the safe wrapper.
let df = unsafe { from_arrow_array_stream(&mut c_stream)? };
Ok(df)
}

- The removal of arrow-rs from Polars means FFI is the stable bridge going forward (see the Polars PR and ConnectorX, which implements the FFI path to Polars).
- FFI keeps your true DB types (Arrow schema) through to Polars.

Cargo notes (for this option):
polars = { version = "0.50", default-features = false, features = ["ipc"] }  # ipc often enables Arrow interop bits
polars-ffi = "0.1"   # the small helper crate that exposes FFI helpers
arrow = { version = "57", features = ["ffi", "ffi_stream"] }
arrow-odbc = "18"
anyhow = "1"
(Names/versions may drift; the key idea is Arrow’s ffi_stream plus a Polars FFI import. ConnectorX also demonstrates this approach in practice.)
- Vertica: install Vertica’s ODBC driver, create a DSN, then pass either a DSN or a full ODBC connection string to run_typed*. Example:
DSN=VerticaDSN;UID=dbuser;PWD=secret;
The query types are read from Vertica via ODBC and preserved through Arrow → Polars.
- DB2: install IBM’s DB2 ODBC/CLI driver, configure a DSN, then use something like:
Driver={IBM DB2 ODBC DRIVER};Database=MYDB;Hostname=host;Port=50000;Protocol=TCPIP;Uid=user;Pwd=pw;
Same flow; the Arrow types reflect DB2’s metadata.
Bottom line: yes—ditch the CSV hop. Use arrow-odbc to get typed Arrow RecordBatches, then:
- Older Polars: convert the Arrow arrays → Series via polars-arrow (no copies).
- Newer Polars: FFI bridge (Arrow C Stream) into Polars (no copies, future-proof).
If you tell me which Polars version you’re on, I’ll drop in a ready-to-build Cargo.toml + a single run(odbc_conn, query_path) helper you can reuse for both Vertica and DB2.
You:
whatever is the newest version of polars.
ChatGPT:
gotcha—let’s keep real DB types, no guessing. Below are drop-in Rust snippets for each engine using the newest Polars. For Vertica/DB2 I use ODBC → Arrow RecordBatch → Arrow IPC stream → Polars; IPC preserves the Arrow schema/types end-to-end. Postgres/SQL Server still use ConnectorX which produces a Polars DataFrame directly.
// src/pg_to_polars.rs
use anyhow::Result;
use connectorx::prelude::*;
use polars::prelude::*;
use std::fs;
pub fn run(query_path: &str, conn_str: &str) -> Result<DataFrame> {
let sql = fs::read_to_string(query_path)?;
let mut src = SourceConn::try_from(conn_str)?;
// ConnectorX returns a Polars DataFrame; types come from Postgres.
let df: DataFrame = get_polars(&src, None, &[CXQuery::from(sql)])?;
Ok(df)
}
fn main() -> Result<()> {
let url = std::env::var("POSTGRES_URL")?;
let df = run("query.sql", &url)?;
println!("{df}");
Ok(())
}

// src/mssql_to_polars.rs
use anyhow::Result;
use connectorx::prelude::*;
use polars::prelude::*;
use std::fs;
pub fn run(query_path: &str, conn_str: &str) -> Result<DataFrame> {
let sql = fs::read_to_string(query_path)?;
let mut src = SourceConn::try_from(conn_str)?;
let df: DataFrame = get_polars(&src, None, &[CXQuery::from(sql)])?;
Ok(df)
}
fn main() -> Result<()> {
// e.g. "mssql://user:pw@host:1433/db?encrypt=DANGER_PLAINTEXT"
let url = std::env::var("MSSQL_URL")?;
let df = run("query.sql", &url)?;
println!("{df}");
Ok(())
}

// src/vertica_to_polars.rs
use anyhow::{anyhow, Result};
use arrow::record_batch::RecordBatchReader; // brings reader.schema() into scope
use arrow_ipc::writer::StreamWriter;
use odbc_api::{ConnectionOptions, Environment};
use polars::prelude::*;
use polars_io::ipc::IpcStreamReader;
use polars_io::SerReader;
use std::{fs, io::Cursor};
pub fn run(query_path: &str, odbc_conn_str: &str) -> Result<DataFrame> {
let sql = fs::read_to_string(query_path)?;
let env = Environment::new()?;
let conn = env.connect_with_connection_string(odbc_conn_str, ConnectionOptions::default())?;
let cursor = conn
.execute(&sql, ())
.map_err(|e| anyhow!("ODBC execute failed: {e}"))?
.ok_or_else(|| anyhow!("Query did not return rows"))?;
// arrow-odbc gives a RecordBatch reader directly
let mut reader = arrow_odbc::OdbcReaderBuilder::new().build(cursor)?;
// Write Arrow record batches to an in-memory Arrow IPC *stream* (keeps types)
let mut buf = Vec::<u8>::new();
{
let mut writer = StreamWriter::try_new(&mut buf, reader.schema().as_ref())?;
while let Some(batch) = reader.next() {
writer.write(&batch?)?;
}
writer.finish()?;
}
// Read IPC stream into Polars DataFrame (typed; no CSV inference)
let df = IpcStreamReader::new(Cursor::new(buf)).finish()?;
Ok(df)
}
fn main() -> Result<()> {
// Example: "DSN=VerticaDSN;UID=dbuser;PWD=secret"
let conn = std::env::var("VERTICA_ODBC")?;
let df = run("query.sql", &conn)?;
println!("{df}");
Ok(())
}

// src/db2_to_polars.rs
use anyhow::{anyhow, Result};
use arrow::record_batch::RecordBatchReader; // brings reader.schema() into scope
use arrow_ipc::writer::StreamWriter;
use odbc_api::{ConnectionOptions, Environment};
use polars::prelude::*;
use polars_io::ipc::IpcStreamReader;
use polars_io::SerReader;
use std::{fs, io::Cursor};
pub fn run(query_path: &str, odbc_conn_str: &str) -> Result<DataFrame> {
let sql = fs::read_to_string(query_path)?;
let env = Environment::new()?;
let conn = env.connect_with_connection_string(odbc_conn_str, ConnectionOptions::default())?;
let cursor = conn
.execute(&sql, ())
.map_err(|e| anyhow!("ODBC execute failed: {e}"))?
.ok_or_else(|| anyhow!("Query did not return rows"))?;
let mut reader = arrow_odbc::OdbcReaderBuilder::new().build(cursor)?;
let mut buf = Vec::<u8>::new();
{
let mut writer = StreamWriter::try_new(&mut buf, reader.schema().as_ref())?;
while let Some(batch) = reader.next() {
writer.write(&batch?)?;
}
writer.finish()?;
}
let df = IpcStreamReader::new(Cursor::new(buf)).finish()?;
Ok(df)
}
fn main() -> Result<()> {
// Example full string:
// "Driver={IBM DB2 ODBC DRIVER};Database=MYDB;Hostname=host;Port=50000;Protocol=TCPIP;Uid=user;Pwd=pw;"
let conn = std::env::var("DB2_ODBC")?;
let df = run("query.sql", &conn)?;
println!("{df}");
Ok(())
}

Cargo.toml:

[package]
name = "db_to_polars"
version = "0.1.0"
edition = "2021"
[dependencies]
anyhow = "1"
# Polars + IO (IPC reader)
polars = { version = "0.50", default-features = false, features = ["fmt"] }
polars-io = { version = "0.50", features = ["ipc_streaming"] }  # IpcStreamReader lives behind the ipc_streaming feature
# Postgres & SQL Server via ConnectorX → Polars
connectorx = { version = "0.4", default-features = false, features = [
"src_postgres",
"src_mssql",
"dst_polars",
] }
# ODBC + Arrow + IPC for Vertica/DB2
odbc-api = "6"
arrow-odbc = "18"
arrow-ipc = "57"Why this is “typed”:
- arrow-odbc yields Arrow RecordBatches using driver metadata—no text casts.
- We serialize those batches to an Arrow IPC stream in memory and read them with Polars’ IPC stream reader, preserving the Arrow schema end-to-end (no schema guessing).
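If you want to confirm that the driver’s types actually made it through, print the resulting DataFrame’s schema—for example with a small helper like this (assumes you already have a DataFrame from one of the run functions above):

use polars::prelude::*;

fn show_schema(df: &DataFrame) {
    // These dtypes come from the Arrow schema reported by the ODBC driver
    // (e.g. Int32, Decimal, Datetime), not from text inference.
    println!("{:?}", df.schema());
    println!("{:?}", df.dtypes());
}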
If you later want true zero-copy (skipping the IPC bytes), we can switch to the Arrow C Data Interface (FFI) path and import the Arrow stream directly into Polars (same types, minimal overhead)—I can wire that up too.
Want me to bundle these into a tiny workspace with four binaries and a shared read_sql("query.sql") helper?