
@ryancutter
Created May 15, 2016 21:18
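# Assumes the pyspark shell (Spark 1.x), where sc and sqlContext are predefined,
# started with the elasticsearch-hadoop jar on the classpath
from pyspark.sql import Row

# issue movies query: URI-style search for name:bourne against index "movies", type "logs"
conf = {"es.resource": "movies/logs", "es.query": "?q=name:bourne"}
movies = sc.newAPIHadoopRDD(
    "org.elasticsearch.hadoop.mr.EsInputFormat",
    "org.apache.hadoop.io.NullWritable",
    "org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=conf)

# place results in table: each element is a (document ID, document dict) pair
moviesRows = movies.map(lambda p: Row(id=int(p[1]['id']), name=p[1]['name']))
# build the DataFrame from the RDD directly instead of collect()ing every row to the driver
schemaMovies = sqlContext.createDataFrame(moviesRows)
schemaMovies.registerTempTable("movies")
sqlContext.cacheTable("movies")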
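The `map` step relies on each RDD element being a `(document ID, document dict)` pair. This is how PySpark hands back `EsInputFormat` results that use `LinkedMapWritable` values. The following sketch replays that transformation in plain Python so you can see the data shape without a cluster. It assumes made-up sample documents and uses `collections.namedtuple` as a stand-in for `pyspark.sql.Row`. `to_movie_row` is a hypothetical helper name, not part of the gist or of elasticsearch-hadoop.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row with the two fields the gist selects
# (an assumption for illustration; in Spark, Row objects come from pyspark.sql).
MovieRow = namedtuple("MovieRow", ["id", "name"])


def to_movie_row(pair):
    """Hypothetical helper mirroring the gist's lambda.

    pair[0] is the Elasticsearch document ID (ignored, as in the gist);
    pair[1] is the document source as a dict.
    """
    doc = pair[1]
    # int() normalizes an id field that may be stored as a string
    return MovieRow(id=int(doc["id"]), name=doc["name"])


# Made-up sample records shaped like the (ID, dict) pairs the RDD yields.
sample = [
    ("doc-1", {"id": "1", "name": "The Bourne Identity"}),
    ("doc-2", {"id": 2, "name": "The Bourne Supremacy"}),
]

rows = [to_movie_row(p) for p in sample]
print(rows)
```

In the gist itself, the same logic runs inside `movies.map(...)`, with `Row` in place of the namedtuple.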