Skip to content

Instantly share code, notes, and snippets.

@cjbj
Last active August 18, 2025 23:54
Show Gist options
  • Select an option

  • Save cjbj/46494611dac8801791c695edd2e5be5a to your computer and use it in GitHub Desktop.

Select an option

Save cjbj/46494611dac8801791c695edd2e5be5a to your computer and use it in GitHub Desktop.
Querying a DataFrame in batches and writing to Delta Lake using python-oracledb from the video "Python DataFrames with Oracle Database for Analysis and AI" https://youtu.be/0BJNlbh71LY For other examples, see https://github.com/oracle/python-oracledb/tree/main/samples
# deltalake-write.py
#
# For other python-oracledb DataFrame examples, see https://github.com/oracle/python-oracledb/tree/main/samples
import os
import oracledb
import pyarrow
from deltalake import DeltaTable, write_deltalake
DIR_NAME_DL = "/tmp/oracledb_demo_deltalake"
os.system(f"/bin/rm -rf {DIR_NAME_DL}")
#-------------------------------------------------------------------------------
un = os.environ.get('ORACLE_USERNAME')
pw = os.environ.get('ORACLE_PASSWORD')
cs = os.environ.get('ORACLE_DSN')
connection = oracledb.connect(user=un, password=pw, dsn=cs)
#-------------------------------------------------------------------------------
sql = """select employee_id, department_id, salary
from employees
where department_id < :id
order by department_id"""
for odf in connection.fetch_df_batches(
statement=sql,
parameters=[80],
size=1000
):
pat = pyarrow.table(odf)
write_deltalake(DIR_NAME_DL, pat, mode='append')
#-------------------------------------------------------------------------------
print()
# Check the written file
dt = DeltaTable(DIR_NAME_DL)
df = dt.to_pandas()
print(df)
print()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment