Created March 19, 2026 04:50
# ClickHouse: `DROP TABLE SYNC` and Async Object Storage Deletion (v26.2+)

Applies to the `s3`, `azure_blob_storage`, `hdfs`, and `local_blob_storage` disk types.

## What changed in ~v26.2

A `BlobKillerThread` background thread was introduced. It is now responsible for **all actual blob deletion** from object storage. Metadata files are updated synchronously, but the blobs themselves are always removed asynchronously.
## `DROP TABLE SYNC` does NOT mean synchronous S3/Azure deletion

The deletion flow is always:

1. `DROP TABLE [SYNC]` → metadata marked as deleted (synchronous)
2. Blobs added to an **in-memory** removal queue
3. `BlobKillerThread` drains the queue in the background (default interval: 1 s)

The `SYNC` keyword only affects **when** the metadata operation happens:

- On an `Atomic` database: without `SYNC`, removal is delayed; with `SYNC`, the query waits for the background drop task to finish.
- On an `Ordinary` database: the `sync` parameter is **silently ignored** (`bool /*sync*/` in `DatabaseOnDisk::dropTable`), so behavior is identical with or without `SYNC`.

Neither engine makes blob deletion synchronous.
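The flow above can be sketched as a toy producer/consumer model. This is plain Python, not ClickHouse code; the class and method names are illustrative only, and a `dict` stands in for the object store:

```python
import queue
import threading
import time

class BlobRemovalQueue:
    """Toy model of async blob deletion: the caller's metadata update is
    synchronous, but blobs are merely enqueued and deleted later by a
    background thread from an in-memory queue."""

    def __init__(self, storage: dict, interval_sec: float = 0.01):
        self.storage = storage            # simulated object store: key -> bytes
        self.pending = queue.Queue()      # in-memory only: lost on crash
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._drain_loop, args=(interval_sec,), daemon=True)
        self._thread.start()

    def drop_table(self, blob_keys):
        # DROP TABLE [SYNC]: metadata removal (not modeled) is synchronous,
        # but blobs are only enqueued -- the query returns before deletion
        for key in blob_keys:
            self.pending.put(key)

    def _drain_loop(self, interval_sec):
        while not self._stop.is_set():
            self._drain_once()
            time.sleep(interval_sec)

    def _drain_once(self):
        while True:
            try:
                key = self.pending.get_nowait()
            except queue.Empty:
                return
            self.storage.pop(key, None)   # the actual blob deletion

    def shutdown(self):
        # graceful stop drains the remaining queue; a crash would skip
        # this step and leave orphan blobs behind in `storage`
        self._stop.set()
        self._thread.join()
        self._drain_once()

store = {"a": b"...", "b": b"..."}
q = BlobRemovalQueue(store)
q.drop_table(["a", "b"])   # returns immediately; blobs may still exist here
q.shutdown()               # graceful stop: queue drained, blobs gone
assert store == {}
```

Note how the crash scenario falls out of the model: anything still sitting in `pending` when the process dies is never deleted from `storage`.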
## `SETTINGS` on `DROP TABLE` is not supported

The parser only accepts a fixed set of keywords (`IF EXISTS`, `ON CLUSTER`, `SYNC`, `PERMANENTLY`), so there is no way to override the async deletion behavior via SQL.
## How to wait for actual blob deletion

```sql
DROP TABLE my_table SYNC;
SYSTEM WAIT BLOBS CLEANUP; -- or: SYSTEM WAIT BLOBS CLEANUP 'disk_name'
```
## Behavior on server restart

The removal queue is **in-memory only**; it is not persisted to disk or ZooKeeper.

| Scenario | Result |
|---|---|
| Graceful stop (`SIGTERM`) | `BlobKillerThread::shutdown()` drains the full queue before exit, so blobs are deleted |
| Crash / `kill -9` / OOM | Queue is lost: **orphan blobs remain in the bucket forever** |

There is no built-in mechanism to detect or clean up orphaned blobs after a crash. Use S3/Azure lifecycle policies or external tooling to periodically reconcile.
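A reconciliation tool boils down to diffing the bucket listing against the keys ClickHouse still references (for example, from a system table such as `system.remote_data_paths`, where available). A minimal sketch of the diff step, with a grace period so freshly written blobs from in-flight inserts are never flagged; the function name and sample keys are hypothetical:

```python
from datetime import datetime, timedelta, timezone

def find_orphan_blobs(bucket_objects, referenced_keys,
                      grace=timedelta(days=1)):
    """Return bucket keys not referenced by any metadata file.

    bucket_objects:  {key: last_modified} from a bucket listing
    referenced_keys: keys still referenced by ClickHouse metadata
    grace:           blobs newer than this are kept, to avoid racing
                     with inserts that happened after the listing
    """
    now = datetime.now(timezone.utc)
    referenced = set(referenced_keys)
    return sorted(
        key for key, mtime in bucket_objects.items()
        if key not in referenced and now - mtime > grace
    )

# Fabricated sample data to show the shape of the inputs:
old = datetime.now(timezone.utc) - timedelta(days=7)
new = datetime.now(timezone.utc)
bucket = {"data/abc": old, "data/def": old, "data/new": new}
referenced = ["data/def"]
print(find_orphan_blobs(bucket, referenced))  # ['data/abc']
```

Only `data/abc` is reported: `data/def` is still referenced, and `data/new` is inside the grace window. The actual deletion of the reported keys is left to your storage client of choice.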
## Same behavior for S3 and Azure Blob Storage

All object storage disk types share the same stack: `DiskObjectStorage` + `BlobKillerThread` + `MetadataStorageFromDisk`. The behavior described above is therefore identical for `s3` and `azure_blob_storage`.
## `BlobKillerThread` tuning (per-disk config)

| Parameter | Default | Description |
|---|---|---|
| `interval_sec` | `1` | Wake-up interval, in seconds |
| `metadata_request_size` | `1000` | Blobs fetched per iteration |
| `threads_count` | `16` | Parallel deletion threads |
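Per-disk settings live under `storage_configuration` in the server config. A sketch of where the parameters from the table above might go; the disk name and endpoint are placeholders, and the exact placement of these keys inside the disk block is an assumption based on the "per-disk config" note:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_main> <!-- placeholder disk name -->
                <type>s3</type>
                <endpoint>https://bucket.s3.amazonaws.com/data/</endpoint>
                <!-- BlobKillerThread tuning; placement assumed per-disk -->
                <interval_sec>1</interval_sec>
                <metadata_request_size>1000</metadata_request_size>
                <threads_count>16</threads_count>
            </s3_main>
        </disks>
    </storage_configuration>
</clickhouse>
```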