@sanchezzzhak
Created January 18, 2018 13:43
clickhouse get tables size
SELECT table,
       formatReadableSize(sum(bytes)) as size,
       min(min_date) as min_date,
       max(max_date) as max_date
FROM system.parts
WHERE active
GROUP BY table
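
A small variant (not in the gist itself) that also sorts the largest tables first:

SELECT table,
       formatReadableSize(sum(bytes)) as size,
       min(min_date) as min_date,
       max(max_date) as max_date
FROM system.parts
WHERE active
GROUP BY table
ORDER BY sum(bytes) DESC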
@nanmu42

nanmu42 commented Oct 15, 2020

Thanks, great script! Here is my variant with compression ratio:

select parts.*,
       columns.compressed_size,
       columns.uncompressed_size,
       columns.ratio
from (
         select table,
                formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
                formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
                sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
         from system.columns
         where database = currentDatabase()
         group by table
         ) columns
         right join (
    select table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active and database = currentDatabase()
    group by database, table
    ) parts on columns.table = parts.table
order by parts.bytes_size desc;

@gebi

gebi commented Dec 30, 2021

Thanks for the nice ideas.
@nanmu42 I removed the usage of currentDatabase() and am using this as a view now, really helpful, thanks!

-- CREATE VIEW meta.table_info AS
select
    parts.*,
    columns.compressed_size,
    columns.uncompressed_size,
    columns.ratio
from (
    select database,
        table,
        formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
        formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
        sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
    from system.columns
    group by database, table
) columns right join (
    select database,
           table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active
    group by database, table
) parts on ( columns.database = parts.database and columns.table = parts.table )
order by parts.bytes_size desc;
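
If you keep it as the commented-out view, usage could look like this (a sketch; assumes you want a meta database to hold it):

-- one-time setup (sketch):
CREATE DATABASE IF NOT EXISTS meta;
-- after CREATE VIEW meta.table_info AS <query above>, query it like a table:
SELECT * FROM meta.table_info;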

@sanchezzzhak
Author

sanchezzzhak commented Dec 30, 2021

Preview of the result for the last comment:

parts.database|parts.table        |rows      |latest_modification|disk_size |primary_keys_size|engine   |bytes_size |compressed_size|uncompressed_size|ratio               |
--------------+-------------------+----------+-------------------+----------+-----------------+---------+-----------+---------------+-----------------+--------------------+
XXXX_XYZ      |stat               |2785390432|2021-12-30 13:18:02|63.16 GiB |12.67 MiB        |MergeTree|67818785564|62.92 GiB      |455.69 GiB       |  0.1380777904379178|
XXXX_XYZ      |xxxx_stat          | 126229688|2021-12-30 13:18:02|4.74 GiB  |46.12 KiB        |MergeTree| 5086896141|4.72 GiB       |20.57 GiB        | 0.22955898394311353|
XXXX_XYZ      |stat_ad_rotate     |2225681893|2021-12-30 13:15:01|2.68 GiB  |5.70 MiB         |MergeTree| 2875869576|2.63 GiB       |78.76 GiB        |0.033415321783118186|
XXXX_XYZ      |xxxxx_stat         |   4185953|2021-12-30 13:19:02|227.39 MiB|20.99 KiB        |MergeTree|  238434440|222.91 MiB     |896.37 MiB       | 0.24868153479762312|
XXXX_XYZ      |xxxxx_domain_link  |  21416450|2021-12-30 13:19:02|139.04 MiB|26.41 KiB        |MergeTree|  145798523|133.79 MiB     |354.18 MiB       | 0.37773677725847915|
XXXX_XYZ      |xxxxx_stat         |   1673219|2021-12-30 08:30:07|96.19 MiB |816.00 B         |MergeTree|  100861023|95.41 MiB      |334.69 MiB       |  0.2850605846725567|
XXXX_XYZ      |ab_stat            |      5056|1970-01-01 03:00:00|132.36 KiB|15.00 B          |MergeTree|     135541|130.16 KiB     |666.56 KiB       |  0.1952736755743085|

@gebi

gebi commented Dec 30, 2021

Any idea how to remove the parts. prefix in parts.database and parts.table without writing down all the columns again?
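
One workaround, as a sketch (not an answer from the thread): list the columns once with explicit aliases instead of relying on parts.*; judging by the preview above, only the ambiguous database and table names keep the prefix, since both subqueries expose them:

select parts.database            as database,
       parts.table               as table,
       parts.rows                as rows,
       parts.latest_modification as latest_modification,
       parts.disk_size           as disk_size,
       parts.primary_keys_size   as primary_keys_size,
       parts.engine              as engine,
       parts.bytes_size          as bytes_size,
       columns.compressed_size,
       columns.uncompressed_size,
       columns.ratio
from (
    select database,
           table,
           formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
           formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
           sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
    from system.columns
    group by database, table
) columns right join (
    select database,
           table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active
    group by database, table
) parts on (columns.database = parts.database and columns.table = parts.table)
order by parts.bytes_size desc;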

@millecodex

Brilliant. I rounded the ratio for a smaller column:

round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes), 3) AS ratio

@m0n0x41d

Hey all, awesome scripts! Many thanks for these contributions ❤️

@virtualsafety

awesome scripts!

@Recodify

Recodify commented Nov 2, 2022

I added a compression percentage, as I find percentages easier than ratios at a glance:

select parts.*,
       columns.compressed_size,
       columns.uncompressed_size,
       columns.compression_ratio,
       columns.compression_percentage
from (
         select table,
                formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
                formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
                round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes), 3) AS compression_ratio,
                round((100 - (sum(data_compressed_bytes) * 100) / sum(data_uncompressed_bytes)), 3) AS compression_percentage
         from system.columns
         where database = currentDatabase()
         group by table
         ) columns
         right join (
    select table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active and database = currentDatabase()
    group by database, table
    ) parts on columns.table = parts.table
order by parts.bytes_size desc;

@sanalchandran

Thanks a lot. Very helpful scripts.

@alexeyantropov

Thanks, you guys and gals! Really useful SQL! You've just inspired me to code a little exporter to view the presented data in Prometheus or Grafana.

@alexeyantropov

Mmm... Something like this: https://github.com/alexeyantropov/clickhouse_table_exporter/blob/main/src/exporter.py

@aitudorm

@alexeyantropov do you mind adding some custom queries? Have you finished coding the little exporter?

@alexeyantropov

> @alexeyantropov do you mind adding some custom queries? Have you finished coding the little exporter?

I've finished it and use it in my production environment. But sure, I can add some extra queries and metrics. What would you suggest?

@sanchezzzhak
Author

If you have run out of disk space, you should pay attention to the system log tables.
Use any query above to identify where the abnormal amount of data is.

Find the partition id:

select * from system.parts where table = 'trace_log'
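
A narrower variant (my sketch) that shows just the partition ids and their sizes:

select partition, formatReadableSize(sum(bytes_on_disk)) as size
from system.parts
where table = 'trace_log' and active
group by partition
order by partition;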

Manually drop the partition:

ALTER TABLE system.trace_log DROP PARTITION 202504;  -- 202504 is a value from the partition column

After manual cleaning, you can follow this guide to disable trace_log and the other log tables:
https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-system-tables-eat-my-disk/
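
For reference, that guide's approach amounts to a config.d override along these lines (a sketch; the file path is arbitrary and the exact set of log tables depends on your ClickHouse version):

<!-- e.g. /etc/clickhouse-server/config.d/disable_logs.xml -->
<clickhouse>
    <!-- remove="1" drops the section, so the corresponding system table stops being populated -->
    <trace_log remove="1"/>
    <metric_log remove="1"/>
    <asynchronous_metric_log remove="1"/>
    <query_thread_log remove="1"/>
</clickhouse>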

@kislayaaakash

Hi everyone, I'm new to ClickHouse, but my use case is similar. I want to build a time-series dashboard in Grafana that shows how every table's size varies over time. Does anyone have an idea how to get that done? My query just gives the table size at the current time; I want table sizes over a period of time. Can someone please help?
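
One possible approach, as a sketch (meta.table_size_history is a hypothetical name): snapshot system.parts into a small history table on a schedule (cron, a scheduled job, or a refreshable materialized view on recent ClickHouse versions), then point Grafana at it:

CREATE TABLE IF NOT EXISTS meta.table_size_history
(
    ts       DateTime DEFAULT now(),
    database String,
    `table`  String,
    bytes    UInt64,
    rows     UInt64
)
ENGINE = MergeTree
ORDER BY (database, `table`, ts);

-- run this periodically; each run appends one row per active table
INSERT INTO meta.table_size_history (database, `table`, bytes, rows)
SELECT database, `table`, sum(bytes), sum(rows)
FROM system.parts
WHERE active
GROUP BY database, `table`;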
