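import json
import mediacloud

# Supply your Media Cloud API key here (left blank in the original)
mc = mediacloud.api.MediaCloud('')
# Query sentences that mention civic hacking or civic hackathons
query = '( hacking AND civic ) OR ( hackathon AND civic)'
date_filter = '+publish_date:[2013-01-01T00:00:00Z TO 2014-04-19T00:00:00Z] AND +media_sets_id:1'
res = mc.sentenceList(query, date_filter)
# print("found " + str(res['response']['numFound']) + " sentences")
# Collect the unique story ids, keeping first-seen order
story_ids = []
for doc in res["response"]["docs"]:
    if doc["stories_id"] not in story_ids:
        story_ids.append(doc["stories_id"])
# Fetch the full record for each story and dump everything as JSON
stories = [mc.story(i) for i in story_ids]
print(json.dumps(stories))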
@rahulbot

# This grabs 100 stories at a time
mc.storyList('( hacking AND civic ) OR ( hackathon AND civic)', '+publish_date:[2013-01-01T00:00:00Z TO 2014-04-19T00:00:00Z] AND +media_sets_id:1', 0, 100)
# To page through the results, replace the 0 (the third argument) with the largest "processed_stories_id" already in your local list/db, then repeat the call
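
The paging approach described above could be sketched roughly as follows. This is only an illustration, not part of the mediacloud client: `fetch_all_stories`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and the fake page source stands in for the real API so the loop can run without a key. It assumes each returned story is a dict with a "processed_stories_id" key and that an empty page means there are no more results.

```python
def fetch_all_stories(fetch_page, page_size=100):
    """Collect every story by paging on processed_stories_id.

    fetch_page(last_id, rows) is a hypothetical callable that returns a
    list of story dicts, each carrying a "processed_stories_id" key.
    """
    stories = []
    last_id = 0  # start from the beginning, like the 0 in the call above
    while True:
        page = fetch_page(last_id, page_size)
        if not page:
            break  # assumption: an empty page means we have everything
        stories.extend(page)
        # The next request starts after the largest id seen so far
        last_id = max(s["processed_stories_id"] for s in page)
    return stories


# Offline stand-in for the API (hypothetical data: 250 fake stories)
def fake_fetch_page(last_id, rows):
    data = [{"processed_stories_id": n, "stories_id": 1000 + n}
            for n in range(1, 251)]
    return [s for s in data if s["processed_stories_id"] > last_id][:rows]


all_stories = fetch_all_stories(fake_fetch_page)
print(len(all_stories))
```

With the real client, `fetch_page` would presumably be a small wrapper such as `lambda last_id, rows: mc.storyList(solr_query, solr_filter, last_id, rows)`, where `solr_query` and `solr_filter` are your own query and filter strings, mirroring the positional arguments in the call above.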
