Today we're building an Instagram image scraper.
Instagram has an API that you can poll for recent media and their attributes, including URL and location. Your program will send that information to a ready-made web UI for display.
Please log in to Instagram, go to https://instagram.com/developer/ and create an app (Manage Client > Register a New Client) to obtain the Client ID that is required to call Instagram's API. You may put http://rxdisplay.neueda.lv/ into Website and http://rxdisplay.neueda.lv/oauth into OAuth redirect URI. Any other URLs will do too.
Join the workshop chat at https://gitter.im/arkadijs/reactive-workshop to receive updates and code snippets, and to brag about your accomplishments. After each step a solution will be posted to get you back on track, just in case.
Use an HTTP client to request JSON from the Tags API at https://api.instagram.com/v1/tags/$tag/media/recent?client_id=$client_id&count=10, then parse it and start an Observable stream. The stream should contain the image URL and location coordinates, if any. Search the web for popular tags. Use Observable's interval(), from(), just(), create() methods and an HTTP client of your choice. There is a good chance you will enjoy the from(Future|Promise) call.
Use subscribe() to print the data.
Instagram's limit is 5000 API requests per hour per client ID or access token.
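The request URL and the rate limit above can be sketched in plain JDK Java (11 or later). This is only an illustration under assumptions: the class and method names (TagsApi, recentMediaUri, minPollInterval, fetchRecent) are hypothetical, the JSON is not parsed, and no Rx library is used.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

class TagsApi {
    // Fills in the endpoint template quoted in the text.
    // Tags are single words, so form-style encoding is adequate for this sketch.
    static URI recentMediaUri(String tag, String clientId, int count) {
        String t = URLEncoder.encode(tag, StandardCharsets.UTF_8);
        String id = URLEncoder.encode(clientId, StandardCharsets.UTF_8);
        return URI.create("https://api.instagram.com/v1/tags/" + t
                + "/media/recent?client_id=" + id + "&count=" + count);
    }

    // Shortest polling period that stays within a requests-per-hour budget.
    static Duration minPollInterval(int requestsPerHour) {
        return Duration.ofMillis(3_600_000L / requestsPerHour);
    }

    // Asynchronous GET; the future completes with the raw JSON body.
    static CompletableFuture<String> fetchRecent(HttpClient client, URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }
}
```

The future returned by fetchRecent is the kind of value a from(Future) call would wrap. With a budget of 5000 requests per hour, minPollInterval(5000) is 720 ms, so an interval() period at or above that keeps a single client within the limit.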
Depending on your approach (from() or just(), with or without Future/Promise), you may end up with an Observable of Media or an Observable of List of Media. You need an Observable of Media for the next step.
Try flatMap() instead of map(). Try Observable.merge() (flatten).
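The difference between map() and flatMap() can be shown with java.util.stream as a stand-in for Observable; the nesting behaviour is analogous, although streams are synchronous and pull-based. Flatten and fetchPage are hypothetical, and fetchPage fakes an API response instead of calling Instagram.

```java
import java.util.List;
import java.util.stream.Collectors;

class Flatten {
    // Stand-in for one API response: a page with two image URLs for the tag.
    static List<String> fetchPage(String tag) {
        return List.of(tag + "-1.jpg", tag + "-2.jpg");
    }

    // map() keeps the nesting: one list per tag.
    static List<List<String>> withMap(List<String> tags) {
        return tags.stream()
                .map(Flatten::fetchPage)
                .collect(Collectors.toList());
    }

    // flatMap() unwraps each page, giving one element per image.
    static List<String> withFlatMap(List<String> tags) {
        return tags.stream()
                .flatMap(tag -> fetchPage(tag).stream())
                .collect(Collectors.toList());
    }
}
```

For the tags tbt and cat, withMap returns two lists of two URLs each, while withFlatMap returns four URLs in a single list.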
Optional sidetrack: It is essential to understand how to construct Observables from scratch. Having an Observable of List of Media to play with is a perfect opportunity. Try Observable.create() and/or (Replay)Subject to bridge the list into a single-item Observable.
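To make the "from scratch" idea concrete, here is a hand-rolled sketch that uses no Rx library at all. The Observer and Observable interfaces below are hypothetical miniatures that only mimic the shape of the real ones; there is no unsubscription, scheduling, or backpressure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class FromScratch {
    // Hypothetical observer contract: items, then either an error or completion.
    interface Observer<T> {
        void onNext(T item);
        void onError(Throwable error);
        void onCompleted();
    }

    // Hypothetical cold observable: nothing runs until subscribe() is called.
    interface Observable<T> {
        void subscribe(Observer<T> observer);
    }

    // create(): the supplied function is the subscription logic.
    static <T> Observable<T> create(Consumer<Observer<T>> onSubscribe) {
        return onSubscribe::accept;
    }

    // Bridges an observable of lists into an observable of single items.
    static <T> Observable<T> flatten(Observable<List<T>> lists) {
        return create(downstream -> lists.subscribe(new Observer<List<T>>() {
            public void onNext(List<T> batch) { batch.forEach(downstream::onNext); }
            public void onError(Throwable e) { downstream.onError(e); }
            public void onCompleted() { downstream.onCompleted(); }
        }));
    }

    // Collects what a synchronous observable emits; for demonstration only.
    static <T> List<T> toList(Observable<T> source) {
        List<T> out = new ArrayList<>();
        source.subscribe(new Observer<T>() {
            public void onNext(T item) { out.add(item); }
            public void onError(Throwable e) { throw new RuntimeException(e); }
            public void onCompleted() { }
        });
        return out;
    }
}
```

A source that emits the lists [a, b] and [c] and then completes comes out of flatten as the three items a, b, c followed by completion.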
We have a UI ready for you at http://rxdisplay.neueda.lv/. POST a JSON document like the following to http://rxdisplay.neueda.lv/in:
{
  "tag": "tbt",
  "url": "https://scontent.cdninstagram.com/....jpg",
  "location": {
    "latitude": 51.504976275,
    "longitude": -0.087847965,
    "id": 225481160,
    "name": "The Shard London"
  },
  "participant": "change-me"
}
Send the 150x150px thumbnail URL. The location field is optional. The participant field distinguishes your feed on the projector's screen. You can debug your personal feed by opening http://rxdisplay.neueda.lv/?participant=change-me.
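Building and posting such a payload might look like the sketch below, using only the JDK (11 or later). Several things are assumptions: DisplayPost, quote, payload and postRequest are hypothetical names, the Content-Type header is a guess at what the UI expects, and the location is reduced to latitude and longitude, which the UI may or may not accept without id and name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DisplayPost {
    // Minimal JSON string escaping (backslash and double quote only);
    // a JSON library would be the usual choice in real code.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the payload; lat and lon may be null because location is optional.
    static String payload(String tag, String url, Double lat, Double lon, String participant) {
        StringBuilder json = new StringBuilder("{");
        json.append("\"tag\":").append(quote(tag));
        json.append(",\"url\":").append(quote(url));
        if (lat != null && lon != null) {
            json.append(",\"location\":{\"latitude\":").append(lat)
                .append(",\"longitude\":").append(lon).append("}");
        }
        json.append(",\"participant\":").append(quote(participant)).append("}");
        return json.toString();
    }

    // Prepares the POST to the display endpoint named in the text.
    static HttpRequest postRequest(String body) {
        return HttpRequest.newBuilder(URI.create("http://rxdisplay.neueda.lv/in"))
                .header("Content-Type", "application/json") // assumed header
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().sendAsync(request, HttpResponse.BodyHandlers.ofString()), typically from inside the subscribe() callback of the media stream.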
Instagram's /media/recent query may return images already pulled in a previous run. Also, an image may have multiple tags. Apply Observable.distinct() to filter out duplicates. Verify that the filtering works: split the stream by issuing multiple subscriptions, then count the elements of the deduplicated and unfiltered streams.
Use count() (size). If the numbers don't match, check replay() and observeOn().
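The bookkeeping behind this check can be sketched without Rx: a stateful filter that remembers every URL it has seen, plus two counters to compare. Dedup and its methods are hypothetical names; the set grows without bound, which is also the price a distinct() stage pays on a long-running stream.

```java
import java.util.HashSet;
import java.util.Set;

class Dedup {
    private final Set<String> seen = new HashSet<>();
    private long total;
    private long unique;

    // Returns true only the first time a URL is offered,
    // i.e. for the items a distinct stage would pass downstream.
    boolean offer(String url) {
        total++;
        boolean first = seen.add(url);
        if (first) {
            unique++;
        }
        return first;
    }

    // Number of items seen before filtering.
    long total() {
        return total;
    }

    // Number of items that survived filtering.
    long unique() {
        return unique;
    }
}
```

Offering a.jpg, b.jpg and a.jpg again yields true, true, false, with total() at 3 and unique() at 2. A gap between the two counters is expected; a deduplicated count that changes between runs over the same input is what points at a subscription or threading issue.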
Create a Smooth operator, like Sample and Debounce, but with (1) no event loss and (2) an internal adaptive trigger that tracks the incoming rate and smoothly adapts the outgoing rate to keep it steady:
123........45......6790123.......4 =>
1...2...3...4..5...6.7.8.9.0.1.2.3.4
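One possible pacing policy, shown here as an offline simulation rather than a real Rx operator: estimate the incoming rate with an exponential moving average of the gaps between arrivals, and never emit an item closer than that estimate to the previous emission. This is not the workshop's solution; Smooth, emissionTimes, alpha and initialGapMs are hypothetical, and a real operator would need a queue and a timer instead of arrays.

```java
class Smooth {
    // Maps ascending arrival timestamps (ms) to emission timestamps (ms).
    // Every arrival gets exactly one emission, so no event is lost.
    static long[] emissionTimes(long[] arrivals, double alpha, long initialGapMs) {
        long[] emits = new long[arrivals.length];
        double avgGap = initialGapMs;
        for (int i = 0; i < arrivals.length; i++) {
            if (i == 0) {
                // The first item has nothing to be spaced from.
                emits[i] = arrivals[i];
                continue;
            }
            // Track the incoming rate: moving average of inter-arrival gaps.
            avgGap = alpha * (arrivals[i] - arrivals[i - 1]) + (1 - alpha) * avgGap;
            long spacing = Math.max(1, Math.round(avgGap));
            // Never emit before arrival, never closer than `spacing`
            // to the previous emission.
            emits[i] = Math.max(arrivals[i], emits[i - 1] + spacing);
        }
        return emits;
    }
}
```

With arrivals at 0, 10, 20 and 1000 ms, alpha 0.5 and an initial gap of 100 ms, the burst of three is spread out to 0, 55 and 88 ms, and the late item passes through at 1000 ms. A smaller alpha reacts more slowly to rate changes and gives steadier output.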
- Main.scala - the origin of starters. Features ReplaySubject and observeOn().
- Starter.scala - stripped-down version.
- instarx.groovy - simpler language, async http fetch with Gpars.
- starter.groovy
- JavaMain.java
- ratpack.groovy - Instagram real-time Tags and Geographies subscription via Groovy and Ratpack https://instagram.com/developer/realtime/
- Instagram API
- Observable
- Operators
- Subject
- Scheduler
- Supported languages
In case your development environment of choice is not with you today, or you want to try ReactiveX on an unfamiliar platform, it may be beneficial to use a Cloud IDE. Here is the list.