@jtanx
Created September 28, 2024 11:38
Single-pass collection of multiple Polars LazyFrames
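```python
import polars as pl

# Shared base query: build a LazyFrame and filter it.
df = pl.LazyFrame({"A": range(1, 100), "B": range(100, 1, -1)})
df = df.filter(pl.col("A") > 20, pl.col("B") < 50)

# Two derived queries that both depend on the filtered base.
df1 = df.with_columns(pl.col("A") * 2)
df2 = df.with_columns(pl.col("B") / 2)

# https://github.com/pola-rs/polars/issues/13065
# This will result in re-evaluating the initial filter twice:
# pl.collect_all([df1, df2])

# Pack each query into a single row: every row becomes a struct and all
# structs are imploded into one list column.
df1 = df1.select(pl.struct(pl.all()).implode().alias("df1"))
df2 = df2.select(pl.struct(pl.all()).implode().alias("df2"))

# Combine the two one-row frames into one query, so common subplan
# elimination can evaluate the shared filter once.
result = pl.concat([df1, df2], how="horizontal")
print(result.explain())
res = result.collect()

# Unpack: explode each list back into rows, then expand the struct fields
# into columns.
df1_res = res.select(pl.col("df1").explode().struct.field("*"))
df2_res = res.select(pl.col("df2").explode().struct.field("*"))
print(df1_res)
print(df2_res)
```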