
@helxsz
Last active August 29, 2015 14:14
Micals, 31
Jimy,21
Convolution, 53
issue, 25
in, 52
Caffe, 76
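The script reads these rows as people.txt, one "name, age" pair per line, and turns each into a `Row(name=..., age=...)` with two `map` calls. That split-and-convert step can be checked without a Spark cluster. Below is a minimal plain-Python sketch of it. The `Person` tuple and the `parse_person` helper are hypothetical stand-ins for `pyspark.sql.Row` and the lambdas, not part of the gist.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """Hypothetical plain-Python stand-in for pyspark.sql.Row(name=..., age=...)."""
    name: str
    age: int


def parse_person(line: str) -> Person:
    """Hypothetical helper: split a 'name, age' line on the comma, like the Spark lambdas.

    int() already ignores surrounding whitespace, so ' 31' parses fine;
    strip() on the name only guards against stray spaces before the comma.
    """
    name, age = line.split(",")
    return Person(name=name.strip(), age=int(age))


# The rows from the data file above, spacing kept as-is.
sample = """Micals, 31
Jimy,21
Convolution, 53
issue, 25
in, 52
Caffe, 76"""

people = [parse_person(line) for line in sample.splitlines()]
for p in people:
    print(p.name, p.age)
```

Splitting with a plain `split(",")` assumes no name ever contains a comma. A real CSV reader would be the safer choice if that could happen.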
# https://github.com/apache/spark/blob/master/examples/src/main/python/sql.py
from pyspark.sql import Row, SparkSession

# SparkSession is the current entry point; it wraps the SparkContext
# and replaces the older SQLContext.
spark = SparkSession.builder.appName("max_temperature").getOrCreate()
sc = spark.sparkContext

# Each line of people.txt is "name, age".
lines = sc.textFile("file:///home/sizhexi/people.txt")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0].strip(), age=int(p[1])))

# Infer the schema from the Row objects and expose the data as a SQL table.
peopleTable = spark.createDataFrame(people)
peopleTable.createOrReplaceTempView("people")

# Aggregate over the registered "people" table.
max_age = spark.sql("SELECT MAX(age) AS max_age FROM people")
max_age.show()

# JSON input: one JSON object per line; the schema is inferred from the data.
path = "file:///home/sizhexi/people.json"  # hypothetical location of a JSON Lines file
people = spark.read.json(path)
people.printSchema()
people.createOrReplaceTempView("people")

spark.stop()
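By default, the JSON loader at the end of the script expects JSON Lines input. Each line must hold one complete JSON object, so a single pretty-printed array spread over many lines is not what it expects. The column set is then inferred from the records. The sketch below illustrates that idea in plain Python. It is a deliberately simplified stand-in, not Spark's inference algorithm. The `json_lines` content and the `infer_schema` helper are hypothetical. The names and ages mirror the data file, but the last record's missing `age` is made up to show how absent fields behave.

```python
import json

# Hypothetical JSON Lines content: one complete JSON object per line.
# The third record deliberately omits "age" (made-up example data).
json_lines = """{"name": "Micals", "age": 31}
{"name": "Jimy", "age": 21}
{"name": "Caffe"}"""


def infer_schema(lines):
    """Hypothetical, very simplified schema inference.

    Each line is parsed on its own, and the field set is the union of keys
    across all records, mapped to the Python type name of the first value
    seen. Fields are returned sorted by name. This is NOT Spark's algorithm.
    """
    schema = {}
    for line in lines:
        record = json.loads(line)
        for key, value in record.items():
            schema.setdefault(key, type(value).__name__)
    return dict(sorted(schema.items()))


records = [json.loads(line) for line in json_lines.splitlines()]
schema = infer_schema(json_lines.splitlines())
print(schema)  # {'age': 'int', 'name': 'str'}

# A record that lacks a field yields None here, much as Spark fills in null.
print([r.get("age") for r in records])  # [31, 21, None]
```

Parsing line by line is why the format matters. A JSON document split across several lines cannot be decoded one line at a time.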