Skip to content

Instantly share code, notes, and snippets.

@farhanpatel
Last active April 24, 2017 22:36
Show Gist options
  • Save farhanpatel/eae0d901c539f3dd816fc11b9368f227 to your computer and use it in GitHub Desktop.
Save farhanpatel/eae0d901c539f3dd816fc11b9368f227 to your computer and use it in GitHub Desktop.
SELECT *, metadataDB.page_metadata.title AS metadata_title
FROM
(
SELECT * , id as historyID, max(length(url)),
(SELECT count(1) FROM visits WHERE visits.siteID = history.id) as visitCount
FROM history WHERE id IN (
SELECT DISTINCT siteID AS historyID
FROM visits WHERE date > 1491243277100158
AND NOT EXISTS (SELECT * FROM visits WHERE siteID = historyID AND date > 1493055877100158)
)
AND domain_id NOT IN (SELECT cached_top_sites.domain_id FROM cached_top_sites)
AND title NOT NULL AND title != ''
AND domain_id NOT IN (SELECT domains.id FROM domains WHERE domains.domain IN (?,?))
group by domain_id
order by visitCount
limit 100
)
LEFT JOIN view_history_id_favicon ON view_history_id_favicon.id = historyID
LEFT OUTER JOIN metadataDB.page_metadata ON metadataDB.page_metadata.site_url = url
ORDER BY COALESCE(iconURL, media_url) NOT NULL DESC /* Favour sites with some metadata */
LIMIT 8
@farhanpatel
Copy link
Author

Totally agree. I think trying to grab as much data as possible before making a decision on what to drop is better. This is how desktop does it. It fetches 500 history items and then ranks them all based on metadata/url/title as one step before pruning the list.

But (and I know this is a cop out) I wanted this PR to just make things slightly better than before. The role of highlights and its use is still being tested so for the MVP I don't want to spend too much more time on this.

At this point as a baseline, it'll be helpful to compare against whatever we come up next.

@rnewman
Copy link

rnewman commented Apr 24, 2017

I'm actually suggesting doing the opposite of what desktop does. Desktop grabs everything into memory and does the work there. We aren't well-placed to do that — the query is more expensive for us, and we have less memory.

But yes, figure out what you can do that's better than the current query, and measure against large real-world data sets. If it's quicker and has lower peak memory use, and the query is clearer, there's no reason not to ship it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment