Skip to content

Instantly share code, notes, and snippets.

@pdet
Created September 4, 2025 12:11
Show Gist options
  • Save pdet/524ea6a0d7ddfc8931c6af48c10b65da to your computer and use it in GitHub Desktop.
Save pdet/524ea6a0d7ddfc8931c6af48c10b65da to your computer and use it in GitHub Desktop.
DuckLake Demo.ipynb
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"private_outputs": true,
"provenance": [],
"toc_visible": true,
"authorship_tag": "ABX9TyNjqlLxY024MmwYExxlMdrJ",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/pdet/524ea6a0d7ddfc8931c6af48c10b65da/ducklake-demo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Creating a DuckLake"
],
"metadata": {
"id": "gzSacyS7Dt-1"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_CFvmm_1DHrf"
},
"outputs": [],
"source": [
"import duckdb\n",
"\n",
"con = duckdb.connect()\n",
"\n",
"con.execute(\"ATTACH 'ducklake:ducklake.db' as ducklake\")\n",
"\n",
"con.execute(\"USE ducklake\")"
]
},
{
"cell_type": "markdown",
"source": [
"# Creating Tables"
],
"metadata": {
"id": "gDqcLoCwDvwk"
}
},
{
"cell_type": "code",
"source": [
"con.execute(\"CREATE TABLE human(id bigint, name string)\")\n",
"\n",
"con.execute(\"CREATE TABLE pet(human_id bigint, name string)\")\n",
"\n",
"con.execute(\"INSERT INTO human values (1, 'Pedro'), (2, 'Natanzinho'), (3, 'Ana Maria Braga')\")\n",
"\n",
"con.execute(\"INSERT INTO pet values (1, 'Oogie Boogie'), (2, 'Zeca'), (2, 'Max'), (3, 'Louro Jose')\")\n",
"\n",
"print (con.sql(\"FROM human INNER JOIN pet on (human.id = pet.human_id)\"))"
],
"metadata": {
"id": "GMrWfbd8DqUd"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Inspecting Catalog\n"
],
"metadata": {
"id": "yhxLJKAjD-xD"
}
},
{
"cell_type": "code",
"source": [
"print (con.sql(\"FROM snapshots()\"))\n",
"\n",
"print (con.sql(\"FROM __ducklake_metadata_ducklake.ducklake_data_file;\"))\n"
],
"metadata": {
"id": "vjitLwsGHVSp"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Updates"
],
"metadata": {
"id": "kmtG4MAQEO0w"
}
},
{
"cell_type": "code",
"source": [
"\n",
"con.execute(\"START TRANSACTION\")\n",
"\n",
"con.execute(\"DELETE FROM human where id = 2\")\n",
"\n",
"con.execute(\"DELETE FROM pet where human_id = 2\")\n",
"\n",
"con.execute (\"COMMIT\")\n",
"\n",
"con.execute(\"START TRANSACTION\")\n",
"\n",
"con.execute(\"INSERT INTO human values (4, 'Jack Black')\")\n",
"\n",
"con.execute(\"INSERT INTO pet values (4, 'Hope')\")\n",
"\n",
"con.execute (\"COMMIT\")\n",
"\n",
"con.execute(\"START TRANSACTION\")\n",
"\n",
"con.execute(\"INSERT INTO human values (5, 'Jon')\")\n",
"\n",
"con.execute(\"INSERT INTO pet values (5, 'Garfield')\")\n",
"\n",
"con.execute (\"COMMIT\")"
],
"metadata": {
"id": "8AacvqQ4aMKs"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print (con.sql(\"FROM snapshots()\"))\n",
"\n",
"print (con.sql(\"FROM __ducklake_metadata_ducklake.ducklake_data_file;\"))\n",
"\n",
"print (con.sql(\"FROM __ducklake_metadata_ducklake.ducklake_delete_file;\"))\n",
"\n",
"print (con.sql(\"FROM human INNER JOIN pet on (human.id = pet.human_id)\"))"
],
"metadata": {
"id": "1rQYsbIGbIZu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Time Travel"
],
"metadata": {
"id": "4efhSA_SEQ-h"
}
},
{
"cell_type": "markdown",
"source": [
"It is possible to define a snapshot_id or actual time for time travel.\n",
"This can be done per table (as in the example), or during attach for the whole database."
],
"metadata": {
"id": "BU-ZkJ3VgUhA"
}
},
{
"cell_type": "code",
"source": [
"print (con.sql(\"FROM human AT (VERSION => 4) INNER JOIN pet AT (VERSION => 4) on (human.id = pet.human_id) \"))"
],
"metadata": {
"id": "cjJJZucmb7lg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Compaction"
],
"metadata": {
"id": "hlPH-wsWEVct"
}
},
{
"cell_type": "markdown",
"source": [
"There are several compaction routines in DuckLake, this example will cover the Adjacent File Merging, and Old File Removal\n",
"Other compaction routines that exist:\n",
"* Expire Snapshot\n",
"* Flush of inlined updates\n",
"\n",
"Coming in v0.3:\n",
"* Orphan File Deletion\n",
"* Rewrite of Deletes\n",
"* A full routines with `CHECKPOINT;`\n"
],
"metadata": {
"id": "kCHQ2drRfsj7"
}
},
{
"cell_type": "code",
"source": [
"print (con.sql(\"FROM __ducklake_metadata_ducklake.ducklake_data_file;\"))\n",
"\n",
"con.execute(\"CALL ducklake.merge_adjacent_files();\")\n",
"\n",
"print(con.sql(\"FROM __ducklake_metadata_ducklake.ducklake_data_file;\"))"
],
"metadata": {
"id": "oorq-aJedsJV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(con.sql(\"CALL ducklake_cleanup_old_files('ducklake', cleanup_all => true, dry_run => true)\"))"
],
"metadata": {
"id": "sJYyWeNei_V3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"con.execute(\"CALL ducklake_cleanup_old_files('ducklake', cleanup_all => true)\")"
],
"metadata": {
"id": "PyQ8BTs4jUfQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#Inlining Updates"
],
"metadata": {
"id": "0iEwAzYoZySx"
}
},
{
"cell_type": "code",
"source": [
"con.execute(\"USE memory\")\n",
"con.execute(\"DETACH ducklake\")\n",
"\n",
"con.execute(\"ATTACH 'ducklake:ducklake.db' as ducklake_inline (DATA_INLINING_ROW_LIMIT 2)\")\n",
"\n",
"con.execute(\"USE ducklake_inline\")"
],
"metadata": {
"id": "gE1JpUNaj6m7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print (con.sql(\"FROM __ducklake_metadata_ducklake_inline.ducklake_data_file;\"))\n",
"con.execute(\"INSERT INTO human values (6, 'Super-Man')\")\n",
"con.execute(\"INSERT INTO pet values (6, 'Krypto')\")\n",
"print (con.sql(\"FROM __ducklake_metadata_ducklake_inline.ducklake_data_file;\"))\n",
"\n",
"# -- flush all inlined data in all schemas and tables\n",
"# CALL ducklake_flush_inlined_data('my_ducklake');"
],
"metadata": {
"id": "SbFFaBjPkTxq"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print (con.sql(\"FROM __ducklake_metadata_ducklake_inline.ducklake_inlined_data_tables;\"))\n",
"\n",
"print (con.sql(\"FROM __ducklake_metadata_ducklake_inline.ducklake_inlined_data_1_3;\"))\n",
"print (con.sql(\"FROM __ducklake_metadata_ducklake_inline.ducklake_inlined_data_2_4;\"))\n",
"\n",
"\n",
"print (con.sql(\"FROM human INNER JOIN pet on (human.id = pet.human_id)\"))"
],
"metadata": {
"id": "kdCEDtbZlL2d"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Not covered in the DEMO"
],
"metadata": {
"id": "CGKJXWB7gmb0"
}
},
{
"cell_type": "markdown",
"source": [
"* Encryption\n",
"* Partitioning\n",
"* Schema Changes\n",
"* Data Change Feed\n",
"* Audit\n",
"* And more..."
],
"metadata": {
"id": "1BhRVQeJgqNO"
}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment