@sanchezzzhak
Created January 18, 2018 13:43

clickhouse get tables size
SELECT table,
       formatReadableSize(sum(bytes)) AS size,
       min(min_date) AS min_date,
       max(max_date) AS max_date
FROM system.parts
WHERE active
GROUP BY table
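
Note that system.parts contains parts for every database, so two databases holding a same-named table get merged into one row here. A minimal variant that keeps them apart and sorts by size:

SELECT database, table,
       formatReadableSize(sum(bytes)) AS size,
       min(min_date) AS min_date,
       max(max_date) AS max_date
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes) DESC
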
@nezed commented Aug 16, 2018

Nice)

A more detailed version, with row count and average per-day size:

SELECT table, formatReadableSize(size) as size, rows, days, formatReadableSize(avgDaySize) as avgDaySize FROM (
    SELECT
        table,
        sum(bytes) AS size,
        sum(rows) AS rows,
        min(min_date) AS min_date,
        max(max_date) AS max_date,
        (max_date - min_date) AS days,
        size / (max_date - min_date) AS avgDaySize
    FROM system.parts
    WHERE active
    GROUP BY table
    ORDER BY rows DESC
)

@rafiqrahim commented Apr 16, 2019

If you are using min_time and max_time instead:

SELECT table, formatReadableSize(size) as size, rows, days, formatReadableSize(avgDaySize) as avgDaySize FROM (
    SELECT
        table,
        sum(bytes) AS size,
        sum(rows) AS rows,
        min(min_time) AS min_time,
        max(max_time) AS max_time,
        toUInt32((max_time - min_time) / 86400) AS days,
        size / ((max_time - min_time) / 86400) AS avgDaySize
    FROM system.parts
    WHERE active
    GROUP BY table
    ORDER BY rows DESC
)
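
The day arithmetic can also be written with dateDiff instead of the hand-rolled division by 86400; a sketch (with a +1 so tables whose data spans a single day do not divide by zero):

SELECT table, formatReadableSize(size) AS size, rows, days, formatReadableSize(avgDaySize) AS avgDaySize FROM (
    SELECT
        table,
        sum(bytes) AS size,
        sum(rows) AS rows,
        dateDiff('day', min(min_time), max(max_time)) + 1 AS days,
        size / days AS avgDaySize
    FROM system.parts
    WHERE active
    GROUP BY table
    ORDER BY rows DESC
)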

@maganiss, replying to @nezed's query above:

I think (max_date - min_date) should be (max_date - min_date + 1); otherwise a table whose data all falls on a single day gets days = 0 and avgDaySize divides by zero.
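
A minimal sketch of that fix applied to @nezed's query; only the two affected expressions change:

SELECT table, formatReadableSize(size) AS size, rows, days, formatReadableSize(avgDaySize) AS avgDaySize FROM (
    SELECT
        table,
        sum(bytes) AS size,
        sum(rows) AS rows,
        min(min_date) AS min_date,
        max(max_date) AS max_date,
        (max_date - min_date + 1) AS days,  -- +1: a single-day table counts as one day
        size / days AS avgDaySize
    FROM system.parts
    WHERE active
    GROUP BY table
    ORDER BY rows DESC
)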

@uburuntu

With some useful data:

-- create view meta.tables_info as
select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Uncomment the first line to save the query as a view.
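
One assumption worth calling out: the target database has to exist before the view can be created. A minimal sketch, keeping the meta database name from the snippet:

CREATE DATABASE IF NOT EXISTS meta;
-- run the CREATE VIEW above, then:
SELECT * FROM meta.tables_info LIMIT 10;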

@nanmu42 commented Oct 15, 2020

Thanks, great script! Here is my variant with compression ratio:

select parts.*,
       columns.compressed_size,
       columns.uncompressed_size,
       columns.ratio
from (
         select table,
                formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
                formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
                sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
         from system.columns
         where database = currentDatabase()
         group by table
         ) columns
         right join (
    select table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active and database = currentDatabase()
    group by database, table
    ) parts on columns.table = parts.table
order by parts.bytes_size desc;

@gebi commented Dec 30, 2021

Thx for the nice ideas.
@nanmu42 I removed the usage of currentDatabase() and am using this as a view now, really helpful, thx!

-- CREATE VIEW meta.table_info AS
select
    parts.*,
    columns.compressed_size,
    columns.uncompressed_size,
    columns.ratio
from (
    select database,
        table,
        formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
        formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
        sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
    from system.columns
    group by database, table
) columns right join (
    select database,
           table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active
    group by database, table
) parts on ( columns.database = parts.database and columns.table = parts.table )
order by parts.bytes_size desc;

@sanchezzzhak (Author) commented Dec 30, 2021

Preview of the result for the last comment:

parts.database|parts.table        |rows      |latest_modification|disk_size |primary_keys_size|engine   |bytes_size |compressed_size|uncompressed_size|ratio               |
--------------+-------------------+----------+-------------------+----------+-----------------+---------+-----------+---------------+-----------------+--------------------+
XXXX_XYZ      |stat               |2785390432|2021-12-30 13:18:02|63.16 GiB |12.67 MiB        |MergeTree|67818785564|62.92 GiB      |455.69 GiB       |  0.1380777904379178|
XXXX_XYZ      |xxxx_stat          | 126229688|2021-12-30 13:18:02|4.74 GiB  |46.12 KiB        |MergeTree| 5086896141|4.72 GiB       |20.57 GiB        | 0.22955898394311353|
XXXX_XYZ      |stat_ad_rotate     |2225681893|2021-12-30 13:15:01|2.68 GiB  |5.70 MiB         |MergeTree| 2875869576|2.63 GiB       |78.76 GiB        |0.033415321783118186|
XXXX_XYZ      |xxxxx_stat         |   4185953|2021-12-30 13:19:02|227.39 MiB|20.99 KiB        |MergeTree|  238434440|222.91 MiB     |896.37 MiB       | 0.24868153479762312|
XXXX_XYZ      |xxxxx_domain_link  |  21416450|2021-12-30 13:19:02|139.04 MiB|26.41 KiB        |MergeTree|  145798523|133.79 MiB     |354.18 MiB       | 0.37773677725847915|
XXXX_XYZ      |xxxxx_stat         |   1673219|2021-12-30 08:30:07|96.19 MiB |816.00 B         |MergeTree|  100861023|95.41 MiB      |334.69 MiB       |  0.2850605846725567|
XXXX_XYZ      |ab_stat            |      5056|1970-01-01 03:00:00|132.36 KiB|15.00 B          |MergeTree|     135541|130.16 KiB     |666.56 KiB       |  0.1952736755743085|

@gebi commented Dec 30, 2021

Any idea how to remove the parts. prefix in parts.database and parts.table without writing down all the columns again?
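
One hedged option: the prefix appears only because both subqueries export columns named database and table, so ClickHouse has to disambiguate them in the output. Renaming the join keys inside the columns subquery removes the ambiguity, and parts.* should then expand without the prefix. A sketch (columns_database / columns_table are made-up names for illustration):

select parts.*,
       columns.compressed_size,
       columns.uncompressed_size,
       columns.ratio
from (
    select database AS columns_database,  -- renamed so parts.* stays unambiguous
           table    AS columns_table,
           formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
           formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
           sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
    from system.columns
    group by database, table
) columns right join (
    select database,
           table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active
    group by database, table
) parts on ( columns.columns_database = parts.database and columns.columns_table = parts.table )
order by parts.bytes_size desc;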

@millecodex

Brilliant. I rounded the ratio for a narrower column:

round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes), 3) AS ratio

@m0n0x41d

Hey all, awesome scripts! Many thanks for these contributions ❤️

@virtualsafety

awesome scripts!

@Recodify commented Nov 2, 2022

I added a compression percentage, as I find it easier to read than a ratio at a glance (e.g. a 0.138 ratio reads as 86.2% compression).

select parts.*,
       columns.compressed_size,
       columns.uncompressed_size,
       columns.compression_ratio,
       columns.compression_percentage
from (
         select table,
                formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
                formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
                round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes), 3) AS compression_ratio,
                round((100 - (sum(data_compressed_bytes) * 100) / sum(data_uncompressed_bytes)), 3) AS compression_percentage
         from system.columns
         group by table
         ) columns
         right join (
    select table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active 
    group by database, table
    ) parts on columns.table = parts.table
order by parts.bytes_size desc;

@sanalchandran

Thanks a lot. Very helpful scripts.

@alexeyantropov

Thx, guys and gals! Really useful SQLs! You've just inspired me to code a little exporter to view the presented data in Prometheus or Grafana.

Mmm... Something like this: https://github.com/alexeyantropov/clickhouse_table_exporter/blob/main/src/exporter.py

@aitudorm

@alexeyantropov do you mind adding some custom queries? Have you finished coding the little exporter?

@alexeyantropov

> @alexeyantropov do you mind adding some custom queries? Have you finished coding the little exporter?

I've finished it and use it in my production environment. But sure, I can add some extra queries and metrics. What would you suggest?

@sanchezzzhak (Author)

If you have run out of disk space, pay attention to the system log tables.
Use any query above to identify an abnormal amount of data.

Find the partition id:

select * from system.parts where table = 'trace_log'
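
To see which partitions are the largest before dropping anything, a hedged refinement of the same query:

SELECT partition, formatReadableSize(sum(bytes)) AS size, count() AS parts
FROM system.parts
WHERE table = 'trace_log' AND active
GROUP BY partition
ORDER BY partition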

Manually drop a log partition:

ALTER TABLE system.trace_log DROP PARTITION 202504;  -- 202504 is a value from the partition column

After manual cleaning, you can follow this guide to disable trace_log and the other log tables:
https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-system-tables-eat-my-disk/
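
A SQL-side alternative from that guide is to put a TTL on the log table so old partitions expire on their own. A minimal sketch; the 7-day window is an assumption, and a config-side override is still needed for the setting to survive the table being recreated on server restart:

ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 7 DAY;  -- assumed retention window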
