Everything I knew to do here, I learned from this pluralsight course:https://app.pluralsight.com/library/courses/postgresql-playbook/table-of-contents and this blog post: http://www.craigkerstiens.com/2012/10/01/understanding-postgres-performance/
I used cg run -e <env> bash
and then psql $DATABASE_URL
to connect to the DB
I used the postgres query here to find big tables with low rates of using indexes, see “Understanding Index Usage” : http://www.craigkerstiens.com/2012/10/01/understanding-postgres-performance/
googledrivelti=> SELECT
googledrivelti-> relname,
googledrivelti-> 100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used,
googledrivelti-> n_live_tup rows_in_table
googledrivelti-> FROM
googledrivelti-> pg_stat_user_tables
googledrivelti-> WHERE
googledrivelti-> seq_scan + idx_scan > 0
googledrivelti-> ORDER BY
googledrivelti-> n_live_tup DESC;
relname | percent_of_times_index_used | rows_in_table
----------------------------+-----------------------------+---------------
google_permissions | 99 | 5576237
credentials | 99 | 2994397
canvas_users_credentials | 99 | 2970155
share_items | 93 | 2850149
canvas_users | 99 | 2597312
cloud_assignment_instances | 99 | 2148947
collaborators | 99 | 44248
collaborations | 99 | 20144
lti_credentials | 100 | 10173
schema_migrations | 0 | 41
delayed_jobs | 12 | 2
failed_jobs | 0 | 0
(12 rows)
I went to kibana, and searched the logs for the name of that table, “share_items”.
I excluded anything with a method of “GET” or a @stream of “stdout” to exclude request logs, leaving just db related logs.
I scroll through the list, and within the first 50 spotted many of these two queries.
A Load running for over 10ms is a problem.
CloudAssignmentTemplate Load (344.6ms) SELECT "share_items".* FROM "share_items" WHERE "share_items"."type" IN ('CloudAssignmentTemplate') AND "share_items"."resource_id" = $1 AND "share_items"."context_id" = $2 LIMIT 1 [["resource_id", "9ee978b4-be4e-45bb-8bbf-2de75334a922"], ["context_id", "00c4d39b049b685fa3237d071a9721ea2f80f2ee”]]
ShareItem Load (226.6ms) SELECT "share_items".* FROM "share_items" WHERE "share_items"."resource_id" = $1 AND "share_items"."context_id" = $2 LIMIT 1 [["resource_id", "4c7cc5f2-805f-472f-acc6-991f92585550"], ["context_id", "4aeda0c17e715f21bcbfade079ea53fe4e46c8e2”]]
An EXPLAIN ANALYZE SELECT * FROM "share_items" WHERE "share_items"."type" IN ('CloudAssignmentTemplate') AND "share_items"."resource_id" = '9ee978b4-be4e-45bb-8bbf-2de75334a922' AND "share_items"."context_id" = '00c4d39b049b685fa3237d071a9721ea2f80f2ee' LIMIT 1;
Limit (cost=0.00..105985.88 rows=1 width=147) (actual time=290.635..290.636 rows=1 loops=1)
-> Seq Scan on share_items (cost=0.00..105985.88 rows=1 width=147) (actual time=290.631..290.631 rows=1 loops=1)
Filter: (((type)::text = 'CloudAssignmentTemplate'::text) AND (resource_id = '9ee978b4-be4e-45bb-8bbf-2de75334a922'::uuid) AND ((context_id)::text = '00c4d39b049b685fa3237d071a9721ea2f80f2ee'::text))
Rows Removed by Filter: 2826120
Planning time: 0.117 ms
Execution time: 291.073 ms
This suggests the need for an index on type, resoure_id, and context_id. Or perhaps a conditional index on resource_id and context_id where type in ('CloudAssignmentTemplate')? only 3.5% of rows have type='CloudAssignmentTemplate'. only 0.007% have a non-null type that is something else. All rows have a non-null resource_id and context_id. I asked about this in #devops, we'll see what the response is.
An EXPLAIN ANALYZE SELECT * FROM "share_items" WHERE "share_items"."resource_id" = '4c7cc5f2-805f-472f-acc6-991f92585550' AND "share_items"."context_id" = '4aeda0c17e715f21bcbfade079ea53fe4e46c8e2' LIMIT 1;
returns
Limit (cost=0.00..98827.18 rows=1 width=147) (actual time=186.026..186.027 rows=1 loops=1)
-> Seq Scan on share_items (cost=0.00..98827.18 rows=1 width=147) (actual time=186.022..186.022 rows=1 loops=1)
Filter: ((resource_id = '4c7cc5f2-805f-472f-acc6-991f92585550'::uuid) AND ((context_id)::text = '4aeda0c17e715f21bcbfade079ea53fe4e46c8e2'::text))
Rows Removed by Filter: 1042038
Planning time: 0.104 ms
Execution time: 186.053 ms
(6 rows)
This suggests the need for an index on resource_id & context_id.
I did a describe on the table, just to get my bearings
googledrivelti=> \d share_items
Table "public.share_items"
Column | Type | Modifiers
----------------+-----------------------------+----------------------------------------------------------
id | integer | not null default nextval('share_items_id_seq'::regclass)
canvas_user_id | integer | not null
context_id | character varying | not null
google_id | character varying | not null
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
type | character varying |
resource_id | uuid | default gen_random_uuid()
Indexes:
"share_items_pkey" PRIMARY KEY, btree (id)
"index_share_items_on_canvas_user_id" btree (canvas_user_id)
Foreign-key constraints:
"fk_rails_1a94f3144b" FOREIGN KEY (canvas_user_id) REFERENCES canvas_users(id)
Referenced by:
TABLE "cloud_assignment_instances" CONSTRAINT "fk_rails_75cd0bb701" FOREIGN KEY (share_item_id) REFERENCES share_items(id)
TABLE "google_permissions" CONSTRAINT "fk_rails_bf348d4eae" FOREIGN KEY (share_item_id) REFERENCES share_items(id)
TABLE "cloud_assignment_instances" CONSTRAINT "fk_rails_fa5578b35a" FOREIGN KEY (template_id) REFERENCES share_items(id)
```