All tests were performed in a docker container, using the image postgres:15.4
.
All tests used the same test data, but with scrambled hash values.
Only deviation from postgresql default startup parameters were:
-c work_mem=2048MB
-c maintenance_work_mem=2048MB
-c max_wal_size=8GB
First test was with straight up 60M records, split into 3 different reciever
's, using index lookups:
CREATE TABLE IF NOT EXISTS messages (
id BIGSERIAL,
parent VARCHAR(128),
sent INT NOT NULL,
sender VARCHAR(128) NOT NULL,
reciever VARCHAR(128) NOT NULL,
hash VARCHAR(128) NOT NULL
);
CREATE INDEX IF NOT EXISTS messages_reciever_idx ON messages ((reciever));
Using a crude COUNT(*)
query:
SELECT COUNT(*) FROM messages WHERE reciever=%s
This took 2.49
seconds, and used up 27 GB
of disk space.
This test utilized the table partitioning.
CREATE TABLE IF NOT EXISTS messages (
id BIGSERIAL,
parent VARCHAR(128),
sent INT NOT NULL,
sender VARCHAR(128) NOT NULL,
reciever VARCHAR(128) NOT NULL,
hash VARCHAR(128) NOT NULL
) PARTITION BY list(reciever);
And each type of reciever
got their own partition:
CREATE TABLE IF NOT EXISTS {reciever} PARTITION OF messages FOR VALUES IN ('reciever')
Same query getting COUNT(*)
for one particular reciever.
This took 0.95
seconds, and used up 27 GB
of space.
Last run was using the inheritance capabilities of postgresql tables. This one gets a little muddier as you'll need a TRIGGER on insert, that routes the incoming data to the appropriate table.
CREATE TABLE IF NOT EXISTS messages (
id BIGSERIAL,
parent VARCHAR(128),
sent INT NOT NULL,
pubkey VARCHAR(128) NOT NULL,
reciever VARCHAR(128) NOT NULL,
hash VARCHAR(128) NOT NULL
);
CREATE TABLE IF NOT EXISTS messages_zzzzzzz (
CHECK ( reciever = 'zzzzzzz' )
) INHERITS (messages);
CREATE INDEX IF NOT EXISTS messages_reciever_zzzzzzz ON messages_zzzzzzz (reciever);
CREATE TABLE IF NOT EXISTS messages_yyyyyyy (
CHECK ( reciever = 'yyyyyyy' )
) INHERITS (messages);
CREATE INDEX IF NOT EXISTS messages_reciever_yyyyyyy ON messages_yyyyyyy (reciever);
CREATE TABLE IF NOT EXISTS messages_ggggggg (
CHECK ( reciever = 'ggggggg' )
) INHERITS (messages);
CREATE INDEX IF NOT EXISTS messages_reciever_ggggggg ON messages_ggggggg (reciever);
CREATE OR REPLACE FUNCTION messages_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.reciever = 'zzzzzzz' ) THEN INSERT INTO messages_zzzzzzz VALUES (NEW.*);
ELSIF ( NEW.reciever = 'yyyyyyy' ) THEN INSERT INTO messages_yyyyyyy VALUES (NEW.*);
ELSIF ( NEW.reciever = 'ggggggg' ) THEN INSERT INTO messages_ggggggg VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'No room with this name. Fix the messages_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER insert_messages_trigger
BEFORE INSERT ON messages
FOR EACH ROW EXECUTE FUNCTION messages_insert_trigger();
All it does is really create 4 tables:
messages
messages_zzzzzzz
messages_yyyyyyy
messages_ggggggg
With a hook that calls messages_insert_trigger()
on each insert into messages
.
Then we do our COUNT(*)
call again.
This took 0.45
seconds and used up 27 GB
of disk space.
Method | Query speed | Space used |
---|---|---|
Index | 2.49 sec | 27 GB |
Partition | 0.9 sec | 27 GB |
Inheritance | 0.45 sec | 27 GB |