Created December 5, 2018 03:54
Separate the scraper-step definitions from the download structure for readability. The config still compiles down to the inline structure: https://gist.github.com/andykais/04a02b61bb3b6d92aa3388c45ea816bd
input: 'username'
defs:
  - name: 'home'
    download: 'https://ifunny.co/user/{{ username }}'
    parse:
      name: 'batch-id'
      selector: '.stream__item:first-child'
      attribute: 'data-next'
  - name: 'gallery'
    download: 'https://ifunny.co/user/{{ username }}/timeline/{{ value }}?batch=2?mode=grid'
  - name: 'next-batch-id'
    parse:
      selector: '.stream__item:first-child'
      attribute: 'data-next'
  - name: 'batch-page'
    parse:
      selector: '.post a'
      attribute: 'href'
  - name: 'image-page'
    download: 'https://ifunny.co{{ value }}'
    parse:
      selector: '.post .media__image'
      attribute: 'src'
  - name: 'image'
    download: '{{ value }}'
structure:
  def: 'home'
  scrapeEach:
    def: 'batch-id'
    scrapeEach:
      def: 'gallery'
      scrapeNext:
        def: 'next-batch-id'
      scrapeEach:
        def: 'image-page'
        scrapeEach:
          def: 'image'
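
The description says the named definitions still compile down to an inline structure. Below is a minimal Python sketch of that resolution step. It assumes each `def` reference is simply replaced by the fields of the definition with that name. The helper names `index_defs` and `inline_structure` are hypothetical, and the real inline format is the one in the linked gist, which may differ.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def index_defs(defs: List[Node]) -> Dict[str, Node]:
    """Map each top-level def name to its fields, dropping the name itself."""
    return {
        d["name"]: {k: v for k, v in d.items() if k != "name"}
        for d in defs
    }


def inline_structure(node: Node, by_name: Dict[str, Node]) -> Node:
    """Replace a {'def': name} node with the named fields, recursing into children.

    Assumption: 'scrapeEach' and 'scrapeNext' are the only keys that hold
    nested structure nodes. An unknown name raises KeyError.
    """
    resolved = dict(by_name[node["def"]])
    for key in ("scrapeEach", "scrapeNext"):
        if key in node:
            resolved[key] = inline_structure(node[key], by_name)
    return resolved


# Two of the definitions from the config, written as Python literals.
defs = [
    {
        "name": "image-page",
        "download": "https://ifunny.co{{ value }}",
        "parse": {"selector": ".post .media__image", "attribute": "src"},
    },
    {"name": "image", "download": "{{ value }}"},
]
structure = {"def": "image-page", "scrapeEach": {"def": "image"}}

inlined = inline_structure(structure, index_defs(defs))
```

Only two definitions are used here to keep the example short. Note that in the config `batch-id` is declared as the name of the `parse` step inside `home`, not as a top-level entry, so a lookup over top-level names alone (as in this sketch) would not resolve it.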