Skip to content

Instantly share code, notes, and snippets.

@DavidMikeSimon
Created October 25, 2011 03:24
Show Gist options
  • Select an option

  • Save DavidMikeSimon/1311207 to your computer and use it in GitHub Desktop.

Select an option

Save DavidMikeSimon/1311207 to your computer and use it in GitHub Desktop.
MySQL trigram-search magic
mysql> describe preloaded_trigrams;
+----------------------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+-----------+------+-----+---------+-------+
| token | binary(3) | NO | PRI | NULL | |
| fuzzy_search_type_id | int(11) | NO | PRI | NULL | |
| rec_id | int(11) | NO | PRI | NULL | |
+----------------------+-----------+------+-----+---------+-------+
3 rows in set (0.00 sec)
mysql> show create table preloaded_trigrams;
+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| preloaded_trigrams | CREATE TABLE `preloaded_trigrams` (
`token` binary(3) NOT NULL,
`fuzzy_search_type_id` int(11) NOT NULL,
`rec_id` int(11) NOT NULL,
PRIMARY KEY (`fuzzy_search_type_id`,`token`,`rec_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT count(*), rec_id FROM `preloaded_trigrams`
WHERE (
`preloaded_trigrams`.token IN ('loc','ock','ck ','blo',' bl', ' bu', 'bur', 'urn')
AND `preloaded_trigrams`.fuzzy_search_type_id = 1
)
GROUP by rec_id
ORDER BY count(*) DESC
limit 20;
+----------+--------+
| count(*) | rec_id |
+----------+--------+
| 8 | 663171 |
| 8 | 11880 |
| 7 | 816005 |
| 6 | 952912 |
| 6 | 4386 |
| 6 | 27571 |
| 6 | 229238 |
| 5 | 321422 |
| 5 | 325732 |
| 5 | 689548 |
| 5 | 692608 |
| 5 | 355328 |
| 5 | 33508 |
| 5 | 709801 |
| 5 | 41529 |
| 5 | 44614 |
| 5 | 734761 |
| 5 | 408453 |
| 5 | 746625 |
| 5 | 411458 |
+----------+--------+
20 rows in set (0.08 sec)
mysql> explain SELECT count(*), rec_id FROM `preloaded_trigrams` WHERE (`preloaded_trigrams`.token IN ('loc','ock','ck ','blo',' bl', ' bu', 'bur', 'urn') AND `preloaded_trigrams`.fuzzy_search_type_id = 1) GROUP by rec_id ORDER BY count(*) DESC limit 20;
+----+-------------+--------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
| 1 | SIMPLE | preloaded_trigrams | range | PRIMARY | PRIMARY | 7 | NULL | 67554 |
Using where; Using index; Using temporary; Using filesort |
+----+-------------+--------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
1 row in set (0.01 sec)
#### Ruby code to generate query
search_result = connection.select_rows(
"SELECT rec_id, count(*) FROM #{i(table_name)} " +
"WHERE token IN (#{trigrams.map{|t| v(t)}.join(',')}) " +
"AND fuzzy_search_type_id = #{type.send(:fuzzy_type_id)} " +
"GROUP by rec_id " +
"ORDER BY count(*) DESC " +
"LIMIT #{type.send(:fuzzy_search_limit)}"
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment