@btashton
Last active September 26, 2017 13:01
pyspark csv
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import IntegerType, StringType, StructField, StructType
from IPython.display import display

sc = SparkContext(appName="CarCSV")
sqlContext = SQLContext(sc)

schema = StructType([StructField("year", IntegerType(), False),  # every field is declared non-nullable
                     StructField("make", StringType(), False),
                     StructField("model", StringType(), False),
                     StructField("comment", StringType(), False),
                     StructField("blank", StringType(), False)])

df = sqlContext.load(source="com.databricks.spark.csv", header="true", path="cars.csv", schema=schema)
summary = df.describe().collect()  # run the summary job and pull the rows to the driver as a list
sc.stop()  # summary is already a local list, so the context can be stopped before displaying it
display(summary)

bashton@localhost ~/ihme/csvtest $ IPYTHON=1 ../spark/bin/pyspark carcsv.py --packages com.databricks:spark-csv_2.10:1.0.3
WARNING: Running python applications through ./bin/pyspark is deprecated as of Spark 1.0.
Use ./bin/spark-submit <python file>
Ivy Default Cache set to: /home/bashton/.ivy2/cache
The jars for the packages stored in: /home/bashton/.ivy2/jars
:: loading settings :: url = jar:file:/home/bashton/ihme/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-csv_2.10;1.0.3 in central
found org.apache.commons#commons-csv;1.1 in central
:: resolution report :: resolve 239ms :: artifacts dl 13ms
:: modules in use:
com.databricks#spark-csv_2.10;1.0.3 from central in [default]
org.apache.commons#commons-csv;1.1 from central in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 2 already retrieved (0kB/18ms)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/26 14:30:40 INFO SparkContext: Running Spark version 1.3.1
15/06/26 14:30:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/26 14:30:40 INFO SecurityManager: Changing view acls to: bashton
15/06/26 14:30:40 INFO SecurityManager: Changing modify acls to: bashton
15/06/26 14:30:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bashton); users with modify permissions: Set(bashton)
15/06/26 14:30:41 INFO Slf4jLogger: Slf4jLogger started
15/06/26 14:30:41 INFO Remoting: Starting remoting
15/06/26 14:30:41 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:58039]
15/06/26 14:30:41 INFO Utils: Successfully started service 'sparkDriver' on port 58039.
15/06/26 14:30:41 INFO SparkEnv: Registering MapOutputTracker
15/06/26 14:30:41 INFO SparkEnv: Registering BlockManagerMaster
15/06/26 14:30:41 INFO DiskBlockManager: Created local directory at /tmp/spark-bda9fa72-7b41-4aae-998a-ecaa6ff20849/blockmgr-a7447e52-3a58-4116-8ae6-6fb65cd1c0b7
15/06/26 14:30:41 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/06/26 14:30:41 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d6b055fc-b599-496f-b333-c8a9286e1550/httpd-03d46ced-5598-4d63-a234-5028cad9524b
15/06/26 14:30:41 INFO HttpServer: Starting HTTP Server
15/06/26 14:30:41 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/26 14:30:41 INFO AbstractConnector: Started [email protected]:55627
15/06/26 14:30:41 INFO Utils: Successfully started service 'HTTP file server' on port 55627.
15/06/26 14:30:41 INFO SparkEnv: Registering OutputCommitCoordinator
15/06/26 14:30:41 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/26 14:30:41 INFO AbstractConnector: Started [email protected]:4040
15/06/26 14:30:41 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/06/26 14:30:41 INFO SparkUI: Started SparkUI at http://localhost.localdomain:4040
15/06/26 14:30:41 INFO SparkContext: Added JAR file:/home/bashton/.ivy2/jars/spark-csv_2.10.jar at http://127.0.0.1:55627/jars/spark-csv_2.10.jar with timestamp 1435354241950
15/06/26 14:30:41 INFO SparkContext: Added JAR file:/home/bashton/.ivy2/jars/commons-csv.jar at http://127.0.0.1:55627/jars/commons-csv.jar with timestamp 1435354241951
15/06/26 14:30:42 INFO Utils: Copying /home/bashton/ihme/csvtest/carcsv.py to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/carcsv.py
15/06/26 14:30:42 INFO SparkContext: Added file file:/home/bashton/ihme/csvtest/carcsv.py at file:/home/bashton/ihme/csvtest/carcsv.py with timestamp 1435354242064
15/06/26 14:30:42 INFO Utils: Copying /home/bashton/.ivy2/jars/spark-csv_2.10.jar to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/spark-csv_2.10.jar
15/06/26 14:30:42 INFO SparkContext: Added file file:/home/bashton/.ivy2/jars/spark-csv_2.10.jar at file:/home/bashton/.ivy2/jars/spark-csv_2.10.jar with timestamp 1435354242071
15/06/26 14:30:42 INFO Utils: Copying /home/bashton/.ivy2/jars/commons-csv.jar to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/commons-csv.jar
15/06/26 14:30:42 INFO SparkContext: Added file file:/home/bashton/.ivy2/jars/commons-csv.jar at file:/home/bashton/.ivy2/jars/commons-csv.jar with timestamp 1435354242073
15/06/26 14:30:42 INFO Executor: Starting executor ID <driver> on host localhost
15/06/26 14:30:42 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://[email protected]:58039/user/HeartbeatReceiver
15/06/26 14:30:42 INFO NettyBlockTransferService: Server created on 48414
15/06/26 14:30:42 INFO BlockManagerMaster: Trying to register BlockManager
15/06/26 14:30:42 INFO BlockManagerMasterActor: Registering block manager localhost:48414 with 265.1 MB RAM, BlockManagerId(<driver>, localhost, 48414)
15/06/26 14:30:42 INFO BlockManagerMaster: Registered BlockManager
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(243853) called with curMem=0, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 238.1 KB, free 264.9 MB)
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(36168) called with curMem=243853, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 35.3 KB, free 264.9 MB)
15/06/26 14:30:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:48414 (size: 35.3 KB, free: 265.1 MB)
15/06/26 14:30:43 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/06/26 14:30:43 INFO SparkContext: Created broadcast 0 from textFile at CsvRelation.scala:57
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(243901) called with curMem=280021, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 238.2 KB, free 264.6 MB)
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(36168) called with curMem=523922, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 35.3 KB, free 264.6 MB)
15/06/26 14:30:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:48414 (size: 35.3 KB, free: 265.1 MB)
15/06/26 14:30:43 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/06/26 14:30:43 INFO SparkContext: Created broadcast 1 from textFile at CsvRelation.scala:114
15/06/26 14:30:43 INFO FileInputFormat: Total input paths to process : 1
15/06/26 14:30:43 INFO SparkContext: Starting job: first at CsvRelation.scala:114
15/06/26 14:30:43 INFO DAGScheduler: Got job 0 (first at CsvRelation.scala:114) with 1 output partitions (allowLocal=true)
15/06/26 14:30:43 INFO DAGScheduler: Final stage: Stage 0(first at CsvRelation.scala:114)
15/06/26 14:30:43 INFO DAGScheduler: Parents of final stage: List()
15/06/26 14:30:43 INFO DAGScheduler: Missing parents: List()
15/06/26 14:30:43 INFO DAGScheduler: Submitting Stage 0 (cars.csv MapPartitionsRDD[3] at textFile at CsvRelation.scala:114), which has no missing parents
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(2656) called with curMem=560090, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.6 KB, free 264.6 MB)
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(1945) called with curMem=562746, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1945.0 B, free 264.6 MB)
15/06/26 14:30:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:48414 (size: 1945.0 B, free: 265.1 MB)
15/06/26 14:30:43 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/06/26 14:30:43 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:839
15/06/26 14:30:43 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (cars.csv MapPartitionsRDD[3] at textFile at CsvRelation.scala:114)
15/06/26 14:30:43 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/06/26 14:30:44 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1577 bytes)
15/06/26 14:30:44 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/06/26 14:30:44 INFO Executor: Fetching file:/home/bashton/.ivy2/jars/spark-csv_2.10.jar with timestamp 1435354242071
15/06/26 14:30:44 INFO Utils: /home/bashton/.ivy2/jars/spark-csv_2.10.jar has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/spark-csv_2.10.jar
15/06/26 14:30:44 INFO Executor: Fetching file:/home/bashton/ihme/csvtest/carcsv.py with timestamp 1435354242064
15/06/26 14:30:44 INFO Utils: /home/bashton/ihme/csvtest/carcsv.py has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/carcsv.py
15/06/26 14:30:44 INFO Executor: Fetching file:/home/bashton/.ivy2/jars/commons-csv.jar with timestamp 1435354242073
15/06/26 14:30:44 INFO Utils: /home/bashton/.ivy2/jars/commons-csv.jar has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/commons-csv.jar
15/06/26 14:30:44 INFO Executor: Fetching http://127.0.0.1:55627/jars/commons-csv.jar with timestamp 1435354241951
15/06/26 14:30:44 INFO Utils: Fetching http://127.0.0.1:55627/jars/commons-csv.jar to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/fetchFileTemp442241263986286587.tmp
15/06/26 14:30:44 INFO Utils: /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/fetchFileTemp442241263986286587.tmp has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/commons-csv.jar
15/06/26 14:30:44 INFO Executor: Adding file:/tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/commons-csv.jar to class loader
15/06/26 14:30:44 INFO Executor: Fetching http://127.0.0.1:55627/jars/spark-csv_2.10.jar with timestamp 1435354241950
15/06/26 14:30:44 INFO Utils: Fetching http://127.0.0.1:55627/jars/spark-csv_2.10.jar to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/fetchFileTemp2119703671790968073.tmp
15/06/26 14:30:44 INFO Utils: /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/fetchFileTemp2119703671790968073.tmp has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/spark-csv_2.10.jar
15/06/26 14:30:44 INFO Executor: Adding file:/tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/spark-csv_2.10.jar to class loader
15/06/26 14:30:44 INFO HadoopRDD: Input split: file:/home/bashton/ihme/csvtest/cars.csv:0+67
15/06/26 14:30:44 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/06/26 14:30:44 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/06/26 14:30:44 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/06/26 14:30:44 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/06/26 14:30:44 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/06/26 14:30:44 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1824 bytes result sent to driver
15/06/26 14:30:44 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 358 ms on localhost (1/1)
15/06/26 14:30:44 INFO DAGScheduler: Stage 0 (first at CsvRelation.scala:114) finished in 0.384 s
15/06/26 14:30:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/06/26 14:30:44 INFO DAGScheduler: Job 0 finished: first at CsvRelation.scala:114, took 0.455945 s
15/06/26 14:30:44 INFO SparkContext: Starting job: runJob at SparkPlan.scala:122
15/06/26 14:30:44 INFO FileInputFormat: Total input paths to process : 1
15/06/26 14:30:44 INFO DAGScheduler: Registering RDD 7 (mapPartitions at Exchange.scala:101)
15/06/26 14:30:44 INFO DAGScheduler: Got job 1 (runJob at SparkPlan.scala:122) with 1 output partitions (allowLocal=false)
15/06/26 14:30:44 INFO DAGScheduler: Final stage: Stage 2(runJob at SparkPlan.scala:122)
15/06/26 14:30:44 INFO DAGScheduler: Parents of final stage: List(Stage 1)
15/06/26 14:30:44 INFO DAGScheduler: Missing parents: List(Stage 1)
15/06/26 14:30:44 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[7] at mapPartitions at Exchange.scala:101), which has no missing parents
15/06/26 14:30:44 INFO MemoryStore: ensureFreeSpace(13032) called with curMem=564691, maxMem=278019440
15/06/26 14:30:44 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 12.7 KB, free 264.6 MB)
15/06/26 14:30:44 INFO MemoryStore: ensureFreeSpace(8119) called with curMem=577723, maxMem=278019440
15/06/26 14:30:44 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 7.9 KB, free 264.6 MB)
15/06/26 14:30:44 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:48414 (size: 7.9 KB, free: 265.1 MB)
15/06/26 14:30:44 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/06/26 14:30:44 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:839
15/06/26 14:30:44 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[7] at mapPartitions at Exchange.scala:101)
15/06/26 14:30:44 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/06/26 14:30:44 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1566 bytes)
15/06/26 14:30:44 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, localhost, PROCESS_LOCAL, 1566 bytes)
15/06/26 14:30:44 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/06/26 14:30:44 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
15/06/26 14:30:44 INFO HadoopRDD: Input split: file:/home/bashton/ihme/csvtest/cars.csv:0+67
15/06/26 14:30:44 INFO HadoopRDD: Input split: file:/home/bashton/ihme/csvtest/cars.csv:67+67
15/06/26 14:30:44 WARN CsvRelation$: Ignoring empty line:
15/06/26 14:30:44 WARN CsvRelation$: Ignoring empty line:
15/06/26 14:30:44 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 2003 bytes result sent to driver
15/06/26 14:30:44 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 2003 bytes result sent to driver
15/06/26 14:30:44 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 254 ms on localhost (1/2)
15/06/26 14:30:44 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 259 ms on localhost (2/2)
15/06/26 14:30:44 INFO DAGScheduler: Stage 1 (mapPartitions at Exchange.scala:101) finished in 0.259 s
15/06/26 14:30:44 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/06/26 14:30:44 INFO DAGScheduler: looking for newly runnable stages
15/06/26 14:30:44 INFO DAGScheduler: running: Set()
15/06/26 14:30:44 INFO DAGScheduler: waiting: Set(Stage 2)
15/06/26 14:30:44 INFO DAGScheduler: failed: Set()
15/06/26 14:30:44 INFO DAGScheduler: Missing parents for Stage 2: List()
15/06/26 14:30:44 INFO DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[11] at map at SparkPlan.scala:97), which is now runnable
15/06/26 14:30:44 INFO MemoryStore: ensureFreeSpace(16888) called with curMem=585842, maxMem=278019440
15/06/26 14:30:44 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 16.5 KB, free 264.6 MB)
15/06/26 14:30:44 INFO MemoryStore: ensureFreeSpace(10193) called with curMem=602730, maxMem=278019440
15/06/26 14:30:44 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 10.0 KB, free 264.6 MB)
15/06/26 14:30:44 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:48414 (size: 10.0 KB, free: 265.1 MB)
15/06/26 14:30:44 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
15/06/26 14:30:44 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:839
15/06/26 14:30:44 INFO DAGScheduler: Submitting 1 missing tasks from Stage 2 (MapPartitionsRDD[11] at map at SparkPlan.scala:97)
15/06/26 14:30:44 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/06/26 14:30:44 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 3, localhost, PROCESS_LOCAL, 1329 bytes)
15/06/26 14:30:44 INFO Executor: Running task 0.0 in stage 2.0 (TID 3)
15/06/26 14:30:44 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/06/26 14:30:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
15/06/26 14:30:44 INFO Executor: Finished task 0.0 in stage 2.0 (TID 3). 1242 bytes result sent to driver
15/06/26 14:30:44 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 3) in 89 ms on localhost (1/1)
15/06/26 14:30:44 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/06/26 14:30:44 INFO DAGScheduler: Stage 2 (runJob at SparkPlan.scala:122) finished in 0.091 s
15/06/26 14:30:44 INFO DAGScheduler: Job 1 finished: runJob at SparkPlan.scala:122, took 0.420159 s
15/06/26 14:30:45 INFO SparkContext: Starting job: collect at /home/bashton/ihme/csvtest/carcsv.py:16
15/06/26 14:30:45 INFO DAGScheduler: Got job 2 (collect at /home/bashton/ihme/csvtest/carcsv.py:16) with 4 output partitions (allowLocal=false)
15/06/26 14:30:45 INFO DAGScheduler: Final stage: Stage 3(collect at /home/bashton/ihme/csvtest/carcsv.py:16)
15/06/26 14:30:45 INFO DAGScheduler: Parents of final stage: List()
15/06/26 14:30:45 INFO DAGScheduler: Missing parents: List()
15/06/26 14:30:45 INFO DAGScheduler: Submitting Stage 3 (MapPartitionsRDD[15] at collect at /home/bashton/ihme/csvtest/carcsv.py:16), which has no missing parents
15/06/26 14:30:45 INFO MemoryStore: ensureFreeSpace(2992) called with curMem=612923, maxMem=278019440
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 2.9 KB, free 264.6 MB)
15/06/26 14:30:45 INFO MemoryStore: ensureFreeSpace(2036) called with curMem=615915, maxMem=278019440
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2036.0 B, free 264.6 MB)
15/06/26 14:30:45 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:48414 (size: 2036.0 B, free: 265.0 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_5_piece0
15/06/26 14:30:45 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:839
15/06/26 14:30:45 INFO DAGScheduler: Submitting 4 missing tasks from Stage 3 (MapPartitionsRDD[15] at collect at /home/bashton/ihme/csvtest/carcsv.py:16)
15/06/26 14:30:45 INFO TaskSchedulerImpl: Adding task set 3.0 with 4 tasks
15/06/26 14:30:45 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, localhost, PROCESS_LOCAL, 1752 bytes)
15/06/26 14:30:45 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 5, localhost, PROCESS_LOCAL, 1753 bytes)
15/06/26 14:30:45 INFO TaskSetManager: Starting task 2.0 in stage 3.0 (TID 6, localhost, PROCESS_LOCAL, 1755 bytes)
15/06/26 14:30:45 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID 7, localhost, PROCESS_LOCAL, 1781 bytes)
15/06/26 14:30:45 INFO Executor: Running task 0.0 in stage 3.0 (TID 4)
15/06/26 14:30:45 INFO Executor: Running task 1.0 in stage 3.0 (TID 5)
15/06/26 14:30:45 INFO Executor: Running task 2.0 in stage 3.0 (TID 6)
15/06/26 14:30:45 INFO Executor: Running task 3.0 in stage 3.0 (TID 7)
15/06/26 14:30:45 INFO Executor: Finished task 1.0 in stage 3.0 (TID 5). 656 bytes result sent to driver
15/06/26 14:30:45 INFO Executor: Finished task 0.0 in stage 3.0 (TID 4). 650 bytes result sent to driver
15/06/26 14:30:45 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 5) in 45 ms on localhost (1/4)
15/06/26 14:30:45 INFO Executor: Finished task 2.0 in stage 3.0 (TID 6). 658 bytes result sent to driver
15/06/26 14:30:45 INFO Executor: Finished task 3.0 in stage 3.0 (TID 7). 681 bytes result sent to driver
15/06/26 14:30:45 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID 6) in 45 ms on localhost (2/4)
15/06/26 14:30:45 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 49 ms on localhost (3/4)
15/06/26 14:30:45 INFO TaskSetManager: Finished task 3.0 in stage 3.0 (TID 7) in 45 ms on localhost (4/4)
15/06/26 14:30:45 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
15/06/26 14:30:45 INFO DAGScheduler: Stage 3 (collect at /home/bashton/ihme/csvtest/carcsv.py:16) finished in 0.050 s
15/06/26 14:30:45 INFO DAGScheduler: Job 2 finished: collect at /home/bashton/ihme/csvtest/carcsv.py:16, took 0.066277 s
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 0
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_0_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_0_piece0 of size 36168 dropped from memory (free 277437657)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:48414 in memory (size: 35.3 KB, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_0 of size 243853 dropped from memory (free 277681510)
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 0
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 5
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_5_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_5_piece0 of size 2036 dropped from memory (free 277683546)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_5_piece0 on localhost:48414 in memory (size: 2036.0 B, free: 265.1 MB)
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_5_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_5
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_5 of size 2992 dropped from memory (free 277686538)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 5
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 4
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_4_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_4_piece0 of size 10193 dropped from memory (free 277696731)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_4_piece0 on localhost:48414 in memory (size: 10.0 KB, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_4
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_4 of size 16888 dropped from memory (free 277713619)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 4
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 3
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_3_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_3_piece0 of size 8119 dropped from memory (free 277721738)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_3_piece0 on localhost:48414 in memory (size: 7.9 KB, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_3
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_3 of size 13032 dropped from memory (free 277734770)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 3
15/06/26 14:30:45 INFO ContextCleaner: Cleaned shuffle 0
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 2
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_2_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_2_piece0 of size 1945 dropped from memory (free 277736715)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:48414 in memory (size: 1945.0 B, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_2
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_2 of size 2656 dropped from memory (free 277739371)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 2
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 1
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_1_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_1_piece0 of size 36168 dropped from memory (free 277775539)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:48414 in memory (size: 35.3 KB, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_1
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_1 of size 243901 dropped from memory (free 278019440)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 1
15/06/26 14:30:45 INFO SparkUI: Stopped Spark web UI at http://localhost.localdomain:4040
15/06/26 14:30:45 INFO DAGScheduler: Stopping DAGScheduler
15/06/26 14:30:45 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/06/26 14:30:45 INFO MemoryStore: MemoryStore cleared
15/06/26 14:30:45 INFO BlockManager: BlockManager stopped
15/06/26 14:30:45 INFO BlockManagerMaster: BlockManagerMaster stopped
15/06/26 14:30:45 INFO SparkContext: Successfully stopped SparkContext
15/06/26 14:30:45 INFO OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped!
15/06/26 14:30:45 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/06/26 14:30:45 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/06/26 14:30:45 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
[Row(summary=u'count', year=3),
Row(summary=u'mean', year=2008.0),
Row(summary=u'stddev', year=7.874007874011811),
Row(summary=u'min', year=1997),
Row(summary=u'max', year=2015)]
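
The summary rows above can be sanity-checked without Spark. The sketch below is not part of the original script; it uses only the Python standard library and the five numbers printed by describe(). With a count of 3, the one year that is neither the minimum nor the maximum follows from count * mean - min - max. The order of the rows in cars.csv is not recoverable from the output, so the list order is an assumption. Comparing the two standard deviation formulas suggests that the stddev reported in this Spark 1.3.1 run is the population form (divide by n), not the sample form (divide by n - 1).

```python
import math
import statistics

# Values copied from the describe() output above.
count, mean, year_min, year_max = 3, 2008.0, 1997, 2015
reported_stddev = 7.874007874011811

# With exactly three rows, the remaining value is fixed by the total.
middle = int(count * mean) - year_min - year_max
years = [year_min, middle, year_max]  # row order in the file is unknown

population_sd = statistics.pstdev(years)  # divides by n
sample_sd = statistics.stdev(years)       # divides by n - 1

print("years:", years)
print("population stddev:", population_sd)
print("sample stddev:", sample_sd)
print("matches reported value:", math.isclose(population_sd, reported_stddev))
```

If the comparison holds, the reported figure is sqrt(62), where 62 is the mean of the squared deviations from 2008.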