Skip to content

Instantly share code, notes, and snippets.

@vladbatushkov
Created May 16, 2019 17:26
Show Gist options
  • Save vladbatushkov/b5bc4e84b55996177d90b48d98cfb67a to your computer and use it in GitHub Desktop.
Save vladbatushkov/b5bc4e84b55996177d90b48d98cfb67a to your computer and use it in GitHub Desktop.
Parse Oscar movies.
WITH "https://www.hollywoodreporter.com/lists/oscar-nominations-2019-complete-list-nominees-1172407/item/best-director-1172474" as url
CALL apoc.load.html(url, { movie: "ul.list--unordered__items li div.list-item__body a" }) YIELD value
WITH collect(value) as data
WITH { movies: data[0].movie } as root
WITH [x IN range(0, length(root.movies) - 1) | { title: root.movies[x].text, url: root.movies[x].attributes.href }] as result
WITH filter(x IN result WHERE x.title <> "" AND apoc.text.indexOf(x.url, "/review/") > -1) as titled
UNWIND titled as item
MERGE (m:Movie { title: item.title })
ON CREATE SET m.url = apoc.text.replace(item.url, "edit.", "www.")
ON MATCH SET m.url = apoc.text.replace(item.url, "edit.", "www.")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment