SELECT table,
    formatReadableSize(sum(bytes)) as size,
    min(min_date) as min_date,
    max(max_date) as max_date
FROM system.parts
WHERE active
GROUP BY table
If you are using min_time and max_time:
SELECT table, formatReadableSize(size) as size, rows, days, formatReadableSize(avgDaySize) as avgDaySize FROM (
SELECT
table,
sum(bytes) AS size,
sum(rows) AS rows,
min(min_time) AS min_time,
max(max_time) AS max_time,
toUInt32((max_time - min_time) / 86400) AS days,
size / ((max_time - min_time) / 86400) AS avgDaySize
FROM system.parts
WHERE active
GROUP BY table
ORDER BY rows DESC
)
Nice)
More detailed, with row count and average day size:
SELECT table, formatReadableSize(size) as size, rows, days, formatReadableSize(avgDaySize) as avgDaySize
FROM (
    SELECT
        table,
        sum(bytes) AS size,
        sum(rows) AS rows,
        min(min_date) AS min_date,
        max(max_date) AS max_date,
        (max_date - min_date) AS days,
        size / (max_date - min_date) AS avgDaySize
    FROM system.parts
    WHERE active
    GROUP BY table
    ORDER BY rows DESC
)
I think (max_date - min_date) should be (max_date - min_date + 1), so that data spanning a single day counts as one day instead of zero.
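For reference, here is the query above with the suggested +1 applied:
SELECT table, formatReadableSize(size) AS size, rows, days, formatReadableSize(avgDaySize) AS avgDaySize
FROM (
    SELECT
        table,
        sum(bytes) AS size,
        sum(rows) AS rows,
        min(min_date) AS min_date,
        max(max_date) AS max_date,
        (max_date - min_date + 1) AS days,
        size / (max_date - min_date + 1) AS avgDaySize
    FROM system.parts
    WHERE active
    GROUP BY table
    ORDER BY rows DESC
)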
With some useful data:
--create view meta.tables_info as
select concat(database, '.', table) as table,
formatReadableSize(sum(bytes)) as size,
sum(rows) as rows,
max(modification_time) as latest_modification,
sum(bytes) as bytes_size,
any(engine) as engine,
formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;
Uncomment the first line to save the query as a view.
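A small usage sketch, assuming the CREATE VIEW line above was uncommented so the view is named meta.tables_info:
-- hedged example: top 10 tables by on-disk size, read from the saved view
SELECT table, size, rows, engine
FROM meta.tables_info
ORDER BY bytes_size DESC
LIMIT 10;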
Thanks, great script. Here is my variant with compression ratio:
select parts.*,
columns.compressed_size,
columns.uncompressed_size,
columns.ratio
from (
select table,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
from system.columns
where database = currentDatabase()
group by table
) columns
right join (
select table,
sum(rows) as rows,
max(modification_time) as latest_modification,
formatReadableSize(sum(bytes)) as disk_size,
formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
any(engine) as engine,
sum(bytes) as bytes_size
from system.parts
where active and database = currentDatabase()
group by database, table
) parts on columns.table = parts.table
order by parts.bytes_size desc;
Thanks for the nice ideas.
@nanmu42 I removed the usage of currentDatabase() and am using this as a view now. Really helpful, thanks!
-- CREATE VIEW meta.table_info AS
select
parts.*,
columns.compressed_size,
columns.uncompressed_size,
columns.ratio
from (
select database,
table,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
from system.columns
group by database, table
) columns right join (
select database,
table,
sum(rows) as rows,
max(modification_time) as latest_modification,
formatReadableSize(sum(bytes)) as disk_size,
formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
any(engine) as engine,
sum(bytes) as bytes_size
from system.parts
where active
group by database, table
) parts on ( columns.database = parts.database and columns.table = parts.table )
order by parts.bytes_size desc;
Preview of the result for the last comment:
parts.database|parts.table |rows |latest_modification|disk_size |primary_keys_size|engine |bytes_size |compressed_size|uncompressed_size|ratio |
--------------+-------------------+----------+-------------------+----------+-----------------+---------+-----------+---------------+-----------------+--------------------+
XXXX_XYZ |stat |2785390432|2021-12-30 13:18:02|63.16 GiB |12.67 MiB |MergeTree|67818785564|62.92 GiB |455.69 GiB | 0.1380777904379178|
XXXX_XYZ |xxxx_stat | 126229688|2021-12-30 13:18:02|4.74 GiB |46.12 KiB |MergeTree| 5086896141|4.72 GiB |20.57 GiB | 0.22955898394311353|
XXXX_XYZ |stat_ad_rotate |2225681893|2021-12-30 13:15:01|2.68 GiB |5.70 MiB |MergeTree| 2875869576|2.63 GiB |78.76 GiB |0.033415321783118186|
XXXX_XYZ |xxxxx_stat | 4185953|2021-12-30 13:19:02|227.39 MiB|20.99 KiB |MergeTree| 238434440|222.91 MiB |896.37 MiB | 0.24868153479762312|
XXXX_XYZ |xxxxx_domain_link | 21416450|2021-12-30 13:19:02|139.04 MiB|26.41 KiB |MergeTree| 145798523|133.79 MiB |354.18 MiB | 0.37773677725847915|
XXXX_XYZ |xxxxx_stat | 1673219|2021-12-30 08:30:07|96.19 MiB |816.00 B |MergeTree| 100861023|95.41 MiB |334.69 MiB | 0.2850605846725567|
XXXX_XYZ |ab_stat | 5056|1970-01-01 03:00:00|132.36 KiB|15.00 B |MergeTree| 135541|130.16 KiB |666.56 KiB | 0.1952736755743085|
Any idea how to remove the parts. prefix in parts.database and parts.table without writing down all the columns again?
Brilliant. I rounded the ratio for a smaller column:
round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes), 3) AS ratio
Hey all, awesome scripts! Many thanks for these contributions ❤️
Awesome scripts!
I added a compression percentage, as I find percentages easier than ratios at a glance:
select parts.*,
columns.compressed_size,
columns.uncompressed_size,
columns.compression_ratio,
columns.compression_percentage
from (
select table,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes), 3) AS compression_ratio,
round((100 - (sum(data_compressed_bytes) * 100) / sum(data_uncompressed_bytes)), 3) AS compression_percentage
from system.columns
group by table
) columns
right join (
select table,
sum(rows) as rows,
max(modification_time) as latest_modification,
formatReadableSize(sum(bytes)) as disk_size,
formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
any(engine) as engine,
sum(bytes) as bytes_size
from system.parts
where active
group by database, table
) parts on columns.table = parts.table
order by parts.bytes_size desc;
Thanks a lot. Very helpful scripts.
Thx you guys and gals! Real useful sqls! You've just inspired me to code a little exporter to view the presented data in prometheus or grafana.
Mmm... Something like this: https://github.com/alexeyantropov/clickhouse_table_exporter/blob/main/src/exporter.py
@alexeyantropov would you mind adding some custom queries? Have you finished coding the little exporter?
I've finished it and use it in my production environment. But sure, I can add some extra queries and metrics. What would you suggest?
If you have run out of disk space, pay attention to the system log tables.
Use any query above to identify tables with an abnormal amount of data.
Find the partition id:
select * from system.parts where table = 'trace_log'
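A narrower variant of that lookup (my own sketch, not from the original comment) that groups the same system.parts data by partition, so the heaviest partition stands out:
-- hypothetical helper query: size of each active trace_log partition, biggest first
SELECT
    partition,
    formatReadableSize(sum(bytes)) AS size,
    count() AS parts_count
FROM system.parts
WHERE active AND database = 'system' AND table = 'trace_log'
GROUP BY partition
ORDER BY sum(bytes) DESC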
Manually drop the partition:
ALTER TABLE system.trace_log DROP PARTITION 202504; -- 202504 is a value from the partition column
After manual cleaning, you can use this guide to disable trace_log and the other log tables:
https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-system-tables-eat-my-disk/
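As an alternative to dropping partitions by hand, a retention TTL can keep these tables small. This is only a sketch of that idea, not from the original comment; the interval is arbitrary, and config-level log settings may override the table definition on restart:
-- hedged sketch: keep only the last 7 days of trace_log, letting ClickHouse drop older data
ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 7 DAY;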