SELECT *, metadataDB.page_metadata.title AS metadata_title
FROM
(
    SELECT *, id AS historyID,
        /* SQLite quirk: with a bare max(), the other selected columns come from the row with the longest URL in each group */
        max(length(url)),
        (SELECT count(1) FROM visits WHERE visits.siteID = history.id) AS visitCount
    FROM history WHERE id IN (
        /* Sites visited after the older cutoff, but with no visits after the newer one */
        SELECT DISTINCT siteID AS historyID
        FROM visits WHERE date > 1491243277100158
        AND NOT EXISTS (SELECT * FROM visits WHERE siteID = historyID AND date > 1493055877100158)
    )
    /* Skip domains already surfaced in cached_top_sites */
    AND domain_id NOT IN (SELECT cached_top_sites.domain_id FROM cached_top_sites)
    AND title NOT NULL AND title != ''
    /* Skip domains supplied as bound parameters */
    AND domain_id NOT IN (SELECT domains.id FROM domains WHERE domains.domain IN (?,?))
    GROUP BY domain_id
    ORDER BY visitCount
    LIMIT 100
)
LEFT JOIN view_history_id_favicon ON view_history_id_favicon.id = historyID
LEFT OUTER JOIN metadataDB.page_metadata ON metadataDB.page_metadata.site_url = url
ORDER BY COALESCE(iconURL, media_url) NOT NULL DESC /* Favour sites with some metadata */
LIMIT 8
I'm actually suggesting doing the opposite of what desktop does. Desktop grabs everything into memory and does the work there. We aren't well-placed to do that — the query is more expensive for us, and we have less memory.
But yes, figure out what you can do that's better than the current query, and measure against large real-world data sets. If it's quicker and has lower peak memory use, and the query is clearer, there's no reason not to ship it.
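For whoever does that comparison, here is a minimal sketch of one way to do it in the sqlite3 shell against a copy of a large real profile. The file names (browser.db, metadata.db) are assumptions about the profile layout, not the actual paths, and the EXPLAIN example uses a trivial query just to show the pattern.

/* Assumed file names; point these at a copy of a large real profile. */
.open browser.db
ATTACH DATABASE 'metadata.db' AS metadataDB;
.timer on

/* Check which steps scan whole tables versus use indexes; do the same
   with the full highlights query from this gist prefixed the same way. */
EXPLAIN QUERY PLAN
SELECT DISTINCT siteID
FROM visits
WHERE date > 1491243277100158;

/* Then run each full candidate query with the timer on and compare
   wall-clock times; peak memory is easier to compare from Instruments
   while the app runs the same queries. */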
Totally agree. I think trying to grab as much data as possible before deciding what to drop is better. That's how desktop does it: it fetches 500 history items, ranks them all on metadata/URL/title in one step, and then prunes the list.
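Not speaking for what desktop actually runs, but as a rough sketch of that broad-fetch shape, reusing the table and column names from the query above (the recency cutoff and the 500 limit are placeholders):

/* Grab a broad candidate set cheaply; rank on metadata/url/title and prune in application code. */
SELECT history.id AS historyID, url, title,
       max(visits.date) AS latestVisit,
       count(1) AS visitCount
FROM history
JOIN visits ON visits.siteID = history.id
WHERE visits.date > ?              /* recency cutoff bound by the app */
  AND title NOT NULL AND title != ''
GROUP BY history.id
ORDER BY latestVisit DESC
LIMIT 500;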
But (and I know this is a cop-out) I wanted this PR to just make things slightly better than before. The role of highlights and how it's used is still being tested, so for the MVP I don't want to spend too much more time on this.
At this point it serves as a baseline, and it'll be helpful to compare against whatever we come up with next.