Gist by @Slach, created March 19, 2026.
# ClickHouse: `DROP TABLE SYNC` and Async Object Storage Deletion (v26.2+)
Applies to: `s3`, `azure_blob_storage`, `hdfs`, `local_blob_storage` disk types.
## What changed in ~v26.2
A `BlobKillerThread` background thread was introduced. It is now responsible for
**all actual blob deletion** from object storage. The metadata files are updated
synchronously, but the blobs themselves are always removed asynchronously.
## `DROP TABLE SYNC` does NOT mean synchronous S3/Azure deletion
The deletion flow is always:
1. `DROP TABLE [SYNC]` → metadata marked as deleted (synchronous)
2. Blobs added to an **in-memory** removal queue
3. `BlobKillerThread` drains the queue in the background (default interval: 1s)
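The three steps above can be sketched in Python. This is illustrative only: the real implementation is C++ inside ClickHouse, and apart from the in-memory queue, the 1 s default interval, and the graceful-shutdown drain described in this note, everything here is a simplification with made-up names.

```python
import queue
import threading

class BlobRemovalQueue:
    """Toy model of the BlobKillerThread flow (not ClickHouse source)."""

    def __init__(self, interval_sec=1.0, batch_size=1000):
        self._queue = queue.Queue()   # in-memory only: lost on crash/kill -9
        self._interval = interval_sec
        self._batch = batch_size
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.deleted = []             # stands in for DELETEd objects

    def start(self):
        self._thread.start()

    def enqueue(self, blob_key):
        # Step 1+2: DROP TABLE updated metadata synchronously and enqueues
        # the table's blob keys here; the query returns without waiting.
        self._queue.put(blob_key)

    def _drain_once(self):
        # Step 3: remove up to batch_size blobs per wake-up.
        for _ in range(self._batch):
            try:
                key = self._queue.get_nowait()
            except queue.Empty:
                break
            self.deleted.append(key)  # real code issues the S3/Azure DELETE

    def _run(self):
        while not self._stop.is_set():
            self._drain_once()
            self._stop.wait(self._interval)

    def shutdown(self):
        # Graceful stop (SIGTERM): drain the full queue before exit.
        self._stop.set()
        self._thread.join()
        self._drain_once()
```

A `kill -9` in this model simply discards `self._queue`, which is why orphan blobs survive a crash (see below).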
The `SYNC` keyword only affects **when** the metadata operation happens:
- On `Atomic` DB: without `SYNC`, removal is delayed; with `SYNC`, the query waits for the background task to finish.
- On `Ordinary` DB: the `sync` parameter is **silently ignored** (`bool /*sync*/` in `DatabaseOnDisk::dropTable`). Behavior is the same with or without `SYNC`.
Neither engine makes blob deletion synchronous.
## `SETTINGS` on `DROP TABLE` — not supported
The parser only accepts fixed keywords (`IF EXISTS`, `ON CLUSTER`, `SYNC`, `PERMANENTLY`).
There is no way to override async deletion behavior via SQL.
## How to wait for actual blob deletion
```sql
DROP TABLE my_table SYNC;
SYSTEM WAIT BLOBS CLEANUP; -- or: SYSTEM WAIT BLOBS CLEANUP 'disk_name'
```
## Behavior on server restart
The removal queue is **in-memory only**: it is not persisted to disk or ZooKeeper.

| Scenario | Result |
|---|---|
| Graceful stop (`SIGTERM`) | `BlobKillerThread::shutdown()` drains the full queue before exit — blobs are deleted |
| Crash / `kill -9` / OOM | Queue is lost — **orphan blobs remain in the bucket forever** |
There is no built-in mechanism to detect or clean up orphaned blobs after a crash.
Use S3/Azure Lifecycle Policies or external tooling to periodically reconcile.
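One reconciliation approach: list the bucket, subtract the keys still referenced by ClickHouse metadata, and ignore blobs younger than a grace period so the scan never races an in-flight write or the removal queue itself. A minimal sketch of the comparison step only; how you obtain the bucket listing and the referenced-key set is deployment-specific, and `find_orphan_blobs` is a hypothetical helper, not part of any tool:

```python
def find_orphan_blobs(listed_blobs, referenced_keys, now, min_age_sec=3600):
    """listed_blobs: {key: last_modified_epoch_seconds} from a bucket listing.
    referenced_keys: blob keys still referenced by ClickHouse metadata.
    Returns keys safe to treat as orphans: unreferenced AND older than
    min_age_sec, so recently written blobs are never touched."""
    referenced = set(referenced_keys)
    return sorted(
        key
        for key, mtime in listed_blobs.items()
        if key not in referenced and now - mtime >= min_age_sec
    )
```

Deleting the returned keys (or tagging them for a lifecycle rule) is then a separate, auditable step.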
## Same behavior for S3 and Azure Blob Storage
All object storage disk types share the same stack:
`DiskObjectStorage` + `BlobKillerThread` + `MetadataStorageFromDisk`.
The behavior described above is identical for `s3` and `azure_blob_storage`.
## `BlobKillerThread` tuning (per-disk config)
| Parameter | Default | Description |
|---|---|---|
| `interval_sec` | `1` | Wake-up interval |
| `metadata_request_size` | `1000` | Blobs fetched per iteration |
| `threads_count` | `16` | Parallel deletion threads |
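As a sketch, these parameters would live in the per-disk section of the server config. The parameter names come from the table above; the exact nesting shown here (directly under the disk element) and the `s3_disk`/endpoint values are assumptions for illustration:

```xml
<!-- Sketch only: parameter placement under the disk element is assumed. -->
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <endpoint>https://bucket.s3.amazonaws.com/data/</endpoint>
                <interval_sec>1</interval_sec>
                <metadata_request_size>1000</metadata_request_size>
                <threads_count>16</threads_count>
            </s3_disk>
        </disks>
    </storage_configuration>
</clickhouse>
```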