
@scottrice10
Created September 10, 2013 00:57
This simple script extracts JSON from npiapi.com's JSON web service. Npiapi.com has a limit of 100 documents per request, and requests are divided into separate "offsets," i.e. pages. So to retrieve 74,000 documents, it is necessary to make 740 requests covering offsets 0-739. While this script is localized to a particular web service, the generic…
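The pagination arithmetic above can be sketched directly (the 74,000 total is taken from the description; the 100-document limit is the API maximum it states):

```python
import math

total_docs = 74000  # approximate document count from the description
limit = 100         # documents per request (API maximum)

# Number of offset pages needed: one request per page, offsets 0..pages-1.
pages = math.ceil(total_docs / limit)
print(pages)  # 740
```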
#!/usr/bin/env python
import requests
import json

limit = 100   # documents per request (API maximum)
offset = 740  # total number of offset pages to fetch
chunks = 74   # number of bulk batches to send to Elasticsearch

def es_bulk():
    if offset % chunks != 0:
        print("Error: chunks must divide evenly into offset with no remainder.")
        return
    per_chunk = offset // chunks
    for chunk in range(chunks):
        data = []
        # Fetch each offset page exactly once, walking through all 0..offset-1.
        for o in range(chunk * per_chunk, (chunk + 1) * per_chunk):
            url = ("https://www.npiapi.com/api/providers"
                   "?token=4a74e1440aaa5609f44d97d829c03c8e0b76fef9"
                   "&format=json&state=MO&limit=%(l)s&offset=%(o)s"
                   % {"l": limit, "o": o})
            r = requests.get(url)
            j = json.loads(r.content)
            # Pair each document with a bulk "index" action line.
            for source in j['providers']:
                docs1 = '%s\n' % '{ "index" : { "_index" : "doctors_index", "_type" : "doctors"} }'
                docs2 = '%s\n' % json.dumps(source)
                data.append(docs1 + docs2)
        bulk = ''.join(data)
        # requests needs an explicit scheme on the URL.
        requests.post("http://localhost:9200/_bulk", data=bulk)

if __name__ == '__main__':
    es_bulk()
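The body the script POSTs follows Elasticsearch's bulk format: newline-delimited JSON, with an action line preceding each document's source and a trailing newline. A minimal sketch of building such a body in isolation (the helper name `build_bulk_body` and the sample document are illustrative, not part of the gist):

```python
import json

def build_bulk_body(docs, index="doctors_index", doc_type="doctors"):
    """Build an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body([{"npi": "1234567890", "state": "MO"}])
```

Separating body construction from the HTTP calls this way also makes the NDJSON framing easy to inspect or unit-test before anything is sent to the cluster.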