Created
October 25, 2011 03:24
-
-
Save DavidMikeSimon/1311207 to your computer and use it in GitHub Desktop.
MySQL trigram-search magic
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| mysql> describe preloaded_trigrams; | |
| +----------------------+-----------+------+-----+---------+-------+ | |
| | Field | Type | Null | Key | Default | Extra | | |
| +----------------------+-----------+------+-----+---------+-------+ | |
| | token | binary(3) | NO | PRI | NULL | | | |
| | fuzzy_search_type_id | int(11) | NO | PRI | NULL | | | |
| | rec_id | int(11) | NO | PRI | NULL | | | |
| +----------------------+-----------+------+-----+---------+-------+ | |
| 3 rows in set (0.00 sec) | |
| mysql> show create table preloaded_trigrams; | |
| +--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | |
| | Table | Create Table | | |
| +--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | |
| | preloaded_trigrams | CREATE TABLE `preloaded_trigrams` ( | |
| `token` binary(3) NOT NULL, | |
| `fuzzy_search_type_id` int(11) NOT NULL, | |
| `rec_id` int(11) NOT NULL, | |
| PRIMARY KEY (`fuzzy_search_type_id`,`token`,`rec_id`) | |
| ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | | |
| +--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | |
| 1 row in set (0.00 sec) | |
| mysql> SELECT count(*), rec_id FROM `preloaded_trigrams` | |
| WHERE ( | |
| `preloaded_trigrams`.token IN ('loc','ock','ck ','blo',' bl', ' bu', 'bur', 'urn') | |
| AND `preloaded_trigrams`.fuzzy_search_type_id = 1 | |
| ) | |
| GROUP by rec_id | |
| ORDER BY count(*) DESC | |
| limit 20; | |
| +----------+--------+ | |
| | count(*) | rec_id | | |
| +----------+--------+ | |
| | 8 | 663171 | | |
| | 8 | 11880 | | |
| | 7 | 816005 | | |
| | 6 | 952912 | | |
| | 6 | 4386 | | |
| | 6 | 27571 | | |
| | 6 | 229238 | | |
| | 5 | 321422 | | |
| | 5 | 325732 | | |
| | 5 | 689548 | | |
| | 5 | 692608 | | |
| | 5 | 355328 | | |
| | 5 | 33508 | | |
| | 5 | 709801 | | |
| | 5 | 41529 | | |
| | 5 | 44614 | | |
| | 5 | 734761 | | |
| | 5 | 408453 | | |
| | 5 | 746625 | | |
| | 5 | 411458 | | |
| +----------+--------+ | |
| 20 rows in set (0.08 sec) | |
| mysql> explain SELECT count(*), rec_id FROM `preloaded_trigrams` WHERE (`preloaded_trigrams`.token IN ('loc','ock','ck ','blo',' bl', ' bu', 'bur', 'urn') AND `preloaded_trigrams`.fuzzy_search_type_id = 1) GROUP by rec_id ORDER BY count(*) DESC limit 20; | |
| +----+-------------+--------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+ | |
| | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | | |
| +----+-------------+--------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+ | |
| | 1 | SIMPLE | preloaded_trigrams | range | PRIMARY | PRIMARY | 7 | NULL | 67554 | | |
| Using where; Using index; Using temporary; Using filesort | | |
| +----+-------------+--------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+ | |
| 1 row in set (0.01 sec) | |
| #### Ruby code to generate query | |
| search_result = connection.select_rows( | |
| "SELECT rec_id, count(*) FROM #{i(table_name)} " + | |
| "WHERE token IN (#{trigrams.map{|t| v(t)}.join(',')}) " + | |
| "AND fuzzy_search_type_id = #{type.send(:fuzzy_type_id)} " + | |
| "GROUP by rec_id " + | |
| "ORDER BY count(*) DESC " + | |
| "LIMIT #{type.send(:fuzzy_search_limit)}" | |
| ) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment