Last active
February 16, 2026 09:56
-
-
Save lordmatt/31f4f0b508b2dcf353c36ab23f907eea to your computer and use it in GitHub Desktop.
Search WordPress comments for the umber of WebMentions per domain excluding anything that came from Twitter., bluesky via brid.gy or Mastodon via brid.gy. Change REFIX for your table prefix. This version uses the source URL in the comment meta for more accurate data.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| SELECT | |
| SUBSTRING_INDEX(REPLACE(SUBSTRING_INDEX(`PREFIX_commentmeta`.`meta_value`,'//',-1),'www.',''),'/',1) AS domain, | |
| COUNT(`d8_comments`.`comment_ID`) AS counted | |
| FROM | |
| `REFIX_comments` LEFT OUTER JOIN `REFIX_commentmeta` ON `REFIX_comments`.`comment_ID` = `REFIX_commentmeta`.`comment_id` | |
| WHERE | |
| `PREFIX_comments`.`comment_type`='Mention' | |
| AND `REFIX_comments`.`comment_author_url` NOT LIKE '%twitter.com%' | |
| AND `REFIX_comments`.`comment_author_url` NOT LIKE '%brid.gy%' | |
| AND `REFIX_commentmeta`.`meta_value` NOT LIKE '%brid.gy%' | |
| AND `REFIX_commentmeta`.`meta_key` = 'webmention_source_url' | |
| GROUP BY domain | |
| ORDER BY counted DESC | |
| LIMIT 500; |
Author
Your instinct about needing to join the meta table twice is correct. Since version 5.0.0, the plugin no longer stores all webmentions with a single comment_type. Instead, each webmention gets a semantic type (mention, like, repost, bookmark, comment, etc.), and the protocol meta key (set to webmention) is what identifies them as webmentions.
So comment_type = 'Mention' in your query won't work — it would miss most webmentions on a current install.
Here's a corrected version:
SELECT
SUBSTRING_INDEX(
REPLACE(SUBSTRING_INDEX(src.meta_value, '//', -1), 'www.', ''),
'/', 1
) AS domain,
COUNT(c.comment_ID) AS counted
FROM wp_comments c
INNER JOIN wp_commentmeta proto
ON c.comment_ID = proto.comment_id
AND proto.meta_key = 'protocol'
AND proto.meta_value = 'webmention'
INNER JOIN wp_commentmeta src
ON c.comment_ID = src.comment_id
AND src.meta_key = 'webmention_source_url'
WHERE
c.comment_author_url NOT LIKE '%twitter.com%'
AND c.comment_author_url NOT LIKE '%brid.gy%'
AND c.comment_author_url NOT LIKE '%bsky.app%'
GROUP BY domain
ORDER BY counted DESC
LIMIT 500;The main changes:
- Uses
protocol = 'webmention'to find webmentions (the stable, canonical way) - Two meta joins: one for
protocol, one forwebmention_source_url— this is the right approach - Filters on
comment_author_urlrather than source URL for excluding bridged content — brid.gy sets the author URL to the original platform (twitter.com, etc.), so this catches the actual origin INNER JOINinstead ofLEFT OUTER JOINsince we only want comments that have these meta values
The protocol and webmention_source_url meta keys are stable and the intended way to query webmentions — no deprecation concerns there.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The substring bits were inspired by this thread: https://stackoverflow.com/questions/9280336/mysql-query-to-extract-domains-from-urls