-
-
Save MichaelLeeHobbs/67980d165fc68880eb2ab283c673244b to your computer and use it in GitHub Desktop.
-- 20201125T1403 EST - Fix bug in xpath not correctly matching values to correct channel | |
-- 20201125T1430 EST - Updated formatting only using: http://poorsql.com/ with minor hand edits | |
-- 20201125T1432 EST - Removed CID from final output as it was only there for debugging xpath issues | |
-- 20210321T1140 EST - Postgres 13+ XPATH works differently see: https://gist.github.com/MichaelLeeHobbs/40b4b7cf70ecbe30b73eed763367e626 | |
-- | |
-- Jon Bartels | |
-- Similar query for SQL Server is here - https://stackoverflow.com/questions/58942371/get-space-used-by-mirth-per-channel-in-sql-server | |
-- | |
SELECT NAME AS CHANNEL_NAME | |
, TOTAL_BYTES | |
, CHANNEL_ID | |
, PRUNE_ENABLED | |
, PRUNE_DAYS | |
, ARCHIVE_ENABLED | |
, TABLE_SCHEMA | |
, TABLE_NAME | |
, MIRTH_ID | |
, MIRTH_TYPE | |
, ROW_ESTIMATE | |
, INDEX_BYTES | |
, TOAST_BYTES | |
, TABLE_BYTES | |
FROM ( | |
WITH RECURSIVE PG_INHERIT(INHRELID, INHPARENT) AS ( | |
SELECT INHRELID, INHPARENT | |
FROM PG_INHERITS | |
UNION | |
SELECT CHILD.INHRELID, PARENT.INHPARENT | |
FROM PG_INHERIT CHILD, PG_INHERITS PARENT | |
WHERE CHILD.INHPARENT = PARENT.INHRELID | |
) | |
, PG_INHERIT_SHORT AS ( | |
SELECT * | |
FROM PG_INHERIT | |
WHERE INHPARENT NOT IN (SELECT INHRELID FROM PG_INHERIT) | |
) | |
SELECT TABLE_SCHEMA | |
, TABLE_NAME | |
, NULLIF(REGEXP_REPLACE(TABLE_NAME, '\D', '', 'g'), '')::NUMERIC AS MIRTH_ID | |
, SUBSTRING(TABLE_NAME FROM 'd_(\D+)\d+') AS MIRTH_TYPE | |
, ROW_ESTIMATE | |
, PG_SIZE_PRETTY(TOTAL_BYTES) AS TOTAL_BYTES | |
, PG_SIZE_PRETTY(INDEX_BYTES) AS INDEX_BYTES | |
, PG_SIZE_PRETTY(TOAST_BYTES) AS TOAST_BYTES | |
, PG_SIZE_PRETTY(TABLE_BYTES) AS TABLE_BYTES | |
FROM ( | |
SELECT *, TOTAL_BYTES - INDEX_BYTES - COALESCE(TOAST_BYTES, 0) AS TABLE_BYTES | |
FROM ( | |
SELECT C.OID | |
, NSPNAME AS TABLE_SCHEMA | |
, RELNAME AS TABLE_NAME | |
, SUM(C.RELTUPLES) OVER (PARTITION BY PARENT) AS ROW_ESTIMATE | |
, SUM(PG_TOTAL_RELATION_SIZE(C.OID)) OVER (PARTITION BY PARENT) AS TOTAL_BYTES | |
, SUM(PG_INDEXES_SIZE(C.OID)) OVER (PARTITION BY PARENT) AS INDEX_BYTES | |
, SUM(PG_TOTAL_RELATION_SIZE(RELTOASTRELID)) OVER (PARTITION BY PARENT) AS TOAST_BYTES | |
, PARENT | |
FROM ( | |
SELECT PG_CLASS.OID | |
, RELTUPLES | |
, RELNAME | |
, RELNAMESPACE | |
, PG_CLASS.RELTOASTRELID | |
, COALESCE(INHPARENT, PG_CLASS.OID) PARENT | |
FROM PG_CLASS | |
LEFT JOIN PG_INHERIT_SHORT ON INHRELID = OID | |
WHERE RELKIND IN ('r', 'p') | |
) C | |
LEFT JOIN PG_NAMESPACE N ON N.OID = C.RELNAMESPACE | |
ORDER BY TOTAL_BYTES DESC | |
) A | |
WHERE OID = PARENT | |
) A | |
) TABLE_SIZES | |
LEFT JOIN D_CHANNELS ON D_CHANNELS.LOCAL_CHANNEL_ID = TABLE_SIZES.MIRTH_ID | |
LEFT JOIN CHANNEL ON CHANNEL.ID = D_CHANNELS.CHANNEL_ID | |
LEFT JOIN ( | |
SELECT (XPATH('string/text()', ENTRY)) [1]::TEXT AS CID | |
, (XPATH('com.mirth.connect.model.ChannelMetadata/enabled/text()', ENTRY)) [1]::TEXT::boolean AS PRUNE_ENABLED | |
, (XPATH('com.mirth.connect.model.ChannelMetadata/pruningSettings/pruneMetaDataDays/text()', ENTRY)) [1]::TEXT::INT AS PRUNE_DAYS | |
, (XPATH('com.mirth.connect.model.ChannelMetadata/pruningSettings/archiveEnabled/text()', ENTRY)) [1]::TEXT::boolean AS ARCHIVE_ENABLED | |
FROM ( | |
SELECT UNNEST(XPATH('/map/entry', VALUE::XML)) AS ENTRY | |
FROM CONFIGURATION | |
WHERE CATEGORY = 'core' AND NAME = 'channelMetadata' | |
) X | |
) AS M ON M.CID = D_CHANNELS.CHANNEL_ID; |
Reordering your SELECT clause may be useful, most readers will want to know "which channels big?" so do this:
SELECT name as channel_name , total_bytes , channel_id , table_schema , TABLE_NAME , mirth_id , cast (row_estimate as bigint) as row_estimate , index_bytes , toast_bytes , table_bytes
You can also add a table type like this:
, NULLIF(regexp_replace(TABLE_NAME, '\D','','g'), '')::numeric as mirth_id , substring(TABLE_NAME FROM 'd_(\D+)\d+') as mirth_type
that gives mm, mc, ma, mcm etc as a new column. This will help users know if say they're storing too much crap in maps versus dealing with big messages. A case
statement might be used to translate the ma, mc, mcm to their plain-language names
I've reordered the columns and added mirth_type. Going to leave the pg_inherit for now incase someone is storing data other than mirth data in the DB that happens to use autopartitioning. Very rare if anyone does but just in case.
Posting an idea here - I'll play with it as I have time.
A slack user was facing high memory usage issues in MC. One common cause of this is having large objects in map data. This causes high memory usage because ALL of those maps are in memory.
This is relevant to the size query here because the d_mc
table size could be computed by content_type
. This would allow an engineer analysing space usage to determine if:
- they have large messages (USE ATTACHMENTS)
- they have excessive data in maps (store less stuff in maps or use attachments)
- there are large error objects (rare, but possible)
The enum for content_type
is:
RAW(1), PROCESSED_RAW(2), TRANSFORMED(3), ENCODED(4), SENT(5), RESPONSE(
6), RESPONSE_TRANSFORMED(7), PROCESSED_RESPONSE(8), CONNECTOR_MAP(9), CHANNEL_MAP(
10), RESPONSE_MAP(11), PROCESSING_ERROR(
12), POSTPROCESSOR_ERROR(13), RESPONSE_ERROR(14), SOURCE_MAP(15)
I think this would be best implemented as a stored procedure that takes a local_channel_id
and a date range as an argument and returns a JSON object with the size breakdown by content_type
. I worry this query would be inefficient since it has to actually look at table data and not just PG internal data to calculate size.
I implemented this:
EXPLAIN SELECT content_type, pg_size_pretty(sum(length(content)))
FROM d_mc3
WHERE content_type IN (9,10,11,15)
GROUP BY content_type;
The content type is part of a composite primary key, so it's indexed.
The query plan still has to do a parallel sequence scan though. I don't think it can be avoided
QUERY PLAN
Finalize GroupAggregate (cost=66892.54..66895.61 rows=12 width=36)
Group Key: content_type
-> Gather Merge (cost=66892.54..66895.34 rows=24 width=12)
Workers Planned: 2
-> Sort (cost=65892.52..65892.55 rows=12 width=12)
Sort Key: content_type
-> Partial HashAggregate (cost=65892.18..65892.30 rows=12 width=12)
Group Key: content_type
-> Parallel Seq Scan on d_mc3 (cost=0.00..65387.76 rows=67256 width=785)
" Filter: (content_type = ANY ('{9,10,11,15}'::integer[]))"
For my stated use case, perhaps computing the average map content length would be more useful to identify which channels are storing more data in maps on average than in total:
EXPLAIN SELECT content_type, pg_size_pretty(avg(length(content))), pg_size_pretty(min(length(content)::BIGINT)), pg_size_pretty(max(length(content))::BIGINT)
FROM d_mc3
WHERE content_type IN (9,10,11,15)
GROUP BY content_type;
Adding order by pg_size_bytes(total_bytes) DESC
makes this query show the largest tables first.
Similar query for SQL Server is here - https://stackoverflow.com/questions/58942371/get-space-used-by-mirth-per-channel-in-sql-server
Do you really need pg_inherit in MC? AFAICT that handles table inheritance, most often used with autopartitioning in recent PG releases. To the best of my knowledge MC doesn't do that. Frankly if you have MC tables that are so large as to need partitioning you're using MC for long term storage and thats not its ideal use case.
The overall query plan is a little chunky, if you definitely do not need the inheritance counts removing them would eliminate the need for a CTE and some of the subqueries and the partitioning.