@dvliman
Last active August 29, 2015 14:17
interleave data stream
Say you have paginated data sources that you iterate over with page and per_page:
http://host/articles?topic=action (page = [0,1,2,...] per_page = 10 | 20 | 30)
http://host/articles?topic=puzzle (page = [0,1,2,...] per_page = 10 | 20 | 30)
http://host/articles?topic=shooter (page = [0,1,2,...] per_page = 10 | 20 | 30)
http://host/articles?topic=rpg (page = [0,1,2,...] per_page = 10 | 20 | 30)
Each article is tagged with one and only one 'topic'. Then you
realize that some topics (action, puzzle, shooter, rpg) are actually 'games',
so you define 'games' as the set of topics [action, puzzle, shooter, rpg].
Now how would you interleave the specific topics (e.g. action, puzzle, ...)
to serve the less specific topic (e.g. games)?
You might think this is a mistagging issue. It is not.
The problem is still valid: what if you have data from twitter,
facebook, google, etc. and you want to aggregate all of them
under one 'coarse' topic?
How do I get a 'fair share' from each data source while
honoring the main 'page' and 'per_page' contract?
From the client's perspective, it should look like a 'single' data source.
How do I interleave in such a way that I don't show
too much or too little of any one data source?
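
One possible way to frame the 'fair share' question, offered as a sketch and not as the answer: treat the merged stream as a strict round-robin (a0, b0, c0, a1, b1, c1, ...) and map the client's page / per_page window back to a per-source offset and count. Everything named here is an assumption for illustration: the fetch(offset, limit) signature, list_source, and interleaved_page are hypothetical, and translating offset / limit into each source's own page / per_page is left out. It assumes at least one source.

```python
from typing import Callable, List, Sequence

# Hypothetical fetcher: fetch(offset, limit) returns up to `limit` items
# starting at `offset` within one source (e.g. one topic).
Fetch = Callable[[int, int], List[str]]


def list_source(items: List[str]) -> Fetch:
    """Wrap an in-memory list as a fetcher, standing in for an HTTP call."""
    return lambda offset, limit: items[offset:offset + limit]


def interleaved_page(sources: Sequence[Fetch], page: int, per_page: int) -> List[str]:
    """Return one page of the merged stream a0, b0, c0, a1, b1, c1, ..."""
    k = len(sources)
    start, end = page * per_page, (page + 1) * per_page
    chunks, offsets = [], []
    for s, fetch in enumerate(sources):
        # Source s owns merged positions s, s + k, s + 2k, ...
        # `lo` counts how many of those fall before `start`, `hi` before `end`.
        lo = (start - s + k - 1) // k
        hi = (end - s + k - 1) // k
        offsets.append(lo)
        chunks.append(fetch(lo, hi - lo))
    merged = []
    for i in range(start, end):
        s = i % k
        local = i // k - offsets[s]
        if local < len(chunks[s]):  # a short source leaves a gap in the page
            merged.append(chunks[s][local])
    return merged
```

With three sources and per_page = 4, page 0 asks the first source for two items and the others for one each; page 1 shifts the shares so that no source gets ahead by more than one item. The catch is that a source returning fewer items than requested makes the page come back short.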
What if a data source cannot guarantee to satisfy your request in full?
For example, if you ask for 20 items, it may return only 18.
Would you fulfill the missing two from other data sources? If so, how?
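
One hedged option for the shortfall case: draw one item per source in turn and, when a source runs dry, let the remaining sources absorb its turns. The sketch below models each source as an in-memory iterable, which is an assumption; round_robin and backfilled_page are hypothetical names, and a real version would page through HTTP responses instead.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def round_robin(streams: Iterable[Iterable[str]]) -> Iterator[str]:
    """Take one item per source in turn, dropping sources once they run dry."""
    active = [iter(s) for s in streams]
    while active:
        still_active = []
        for it in active:
            try:
                item = next(it)
            except StopIteration:
                continue  # exhausted: its turns pass to the remaining sources
            still_active.append(it)
            yield item
        active = still_active


def backfilled_page(streams: Iterable[Iterable[str]], page: int, per_page: int) -> List[str]:
    """Slice one page out of the merged stream."""
    start = page * per_page
    return list(islice(round_robin(streams), start, start + per_page))
```

This keeps every page full until all sources are exhausted, at the cost of re-walking the merged stream from the start on each call. Caching a cursor per source between requests would avoid that, but then the server has to hold state per client.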