Last active
September 26, 2017 13:01
pyspark csv
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
from IPython.display import display

sc = SparkContext(appName="CarCSV")
sqlContext = SQLContext(sc)

schema = StructType([StructField("year", IntegerType(), False),
                     StructField("make", StringType(), False),
                     StructField("model", StringType(), False),
                     StructField("comment", StringType(), False),
                     StructField("blank", StringType(), False)])

df = sqlContext.load(source="com.databricks.spark.csv", header="true", path="cars.csv", schema=schema)
summary = df.describe().collect()
sc.stop()
display(summary)
bashton@localhost ~/ihme/csvtest $ IPYTHON=1 ../spark/bin/pyspark carcsv.py --packages com.databricks:spark-csv_2.10:1.0.3
WARNING: Running python applications through ./bin/pyspark is deprecated as of Spark 1.0.
  Use ./bin/spark-submit <python file>
Ivy Default Cache set to: /home/bashton/.ivy2/cache
The jars for the packages stored in: /home/bashton/.ivy2/jars
:: loading settings :: url = jar:file:/home/bashton/ihme/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found com.databricks#spark-csv_2.10;1.0.3 in central
	found org.apache.commons#commons-csv;1.1 in central
:: resolution report :: resolve 239ms :: artifacts dl 13ms
	:: modules in use:
	com.databricks#spark-csv_2.10;1.0.3 from central in [default]
	org.apache.commons#commons-csv;1.1 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 2 already retrieved (0kB/18ms)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/26 14:30:40 INFO SparkContext: Running Spark version 1.3.1
15/06/26 14:30:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/26 14:30:40 INFO SecurityManager: Changing view acls to: bashton
15/06/26 14:30:40 INFO SecurityManager: Changing modify acls to: bashton
15/06/26 14:30:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bashton); users with modify permissions: Set(bashton)
15/06/26 14:30:41 INFO Slf4jLogger: Slf4jLogger started
15/06/26 14:30:41 INFO Remoting: Starting remoting
15/06/26 14:30:41 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:58039]
15/06/26 14:30:41 INFO Utils: Successfully started service 'sparkDriver' on port 58039.
15/06/26 14:30:41 INFO SparkEnv: Registering MapOutputTracker
15/06/26 14:30:41 INFO SparkEnv: Registering BlockManagerMaster
15/06/26 14:30:41 INFO DiskBlockManager: Created local directory at /tmp/spark-bda9fa72-7b41-4aae-998a-ecaa6ff20849/blockmgr-a7447e52-3a58-4116-8ae6-6fb65cd1c0b7
15/06/26 14:30:41 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/06/26 14:30:41 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d6b055fc-b599-496f-b333-c8a9286e1550/httpd-03d46ced-5598-4d63-a234-5028cad9524b
15/06/26 14:30:41 INFO HttpServer: Starting HTTP Server
15/06/26 14:30:41 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/26 14:30:41 INFO AbstractConnector: Started [email protected]:55627
15/06/26 14:30:41 INFO Utils: Successfully started service 'HTTP file server' on port 55627.
15/06/26 14:30:41 INFO SparkEnv: Registering OutputCommitCoordinator
15/06/26 14:30:41 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/26 14:30:41 INFO AbstractConnector: Started [email protected]:4040
15/06/26 14:30:41 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/06/26 14:30:41 INFO SparkUI: Started SparkUI at http://localhost.localdomain:4040
15/06/26 14:30:41 INFO SparkContext: Added JAR file:/home/bashton/.ivy2/jars/spark-csv_2.10.jar at http://127.0.0.1:55627/jars/spark-csv_2.10.jar with timestamp 1435354241950
15/06/26 14:30:41 INFO SparkContext: Added JAR file:/home/bashton/.ivy2/jars/commons-csv.jar at http://127.0.0.1:55627/jars/commons-csv.jar with timestamp 1435354241951
15/06/26 14:30:42 INFO Utils: Copying /home/bashton/ihme/csvtest/carcsv.py to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/carcsv.py
15/06/26 14:30:42 INFO SparkContext: Added file file:/home/bashton/ihme/csvtest/carcsv.py at file:/home/bashton/ihme/csvtest/carcsv.py with timestamp 1435354242064
15/06/26 14:30:42 INFO Utils: Copying /home/bashton/.ivy2/jars/spark-csv_2.10.jar to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/spark-csv_2.10.jar
15/06/26 14:30:42 INFO SparkContext: Added file file:/home/bashton/.ivy2/jars/spark-csv_2.10.jar at file:/home/bashton/.ivy2/jars/spark-csv_2.10.jar with timestamp 1435354242071
15/06/26 14:30:42 INFO Utils: Copying /home/bashton/.ivy2/jars/commons-csv.jar to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/commons-csv.jar
15/06/26 14:30:42 INFO SparkContext: Added file file:/home/bashton/.ivy2/jars/commons-csv.jar at file:/home/bashton/.ivy2/jars/commons-csv.jar with timestamp 1435354242073
15/06/26 14:30:42 INFO Executor: Starting executor ID <driver> on host localhost
15/06/26 14:30:42 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://[email protected]:58039/user/HeartbeatReceiver
15/06/26 14:30:42 INFO NettyBlockTransferService: Server created on 48414
15/06/26 14:30:42 INFO BlockManagerMaster: Trying to register BlockManager
15/06/26 14:30:42 INFO BlockManagerMasterActor: Registering block manager localhost:48414 with 265.1 MB RAM, BlockManagerId(<driver>, localhost, 48414)
15/06/26 14:30:42 INFO BlockManagerMaster: Registered BlockManager
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(243853) called with curMem=0, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 238.1 KB, free 264.9 MB)
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(36168) called with curMem=243853, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 35.3 KB, free 264.9 MB)
15/06/26 14:30:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:48414 (size: 35.3 KB, free: 265.1 MB)
15/06/26 14:30:43 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/06/26 14:30:43 INFO SparkContext: Created broadcast 0 from textFile at CsvRelation.scala:57
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(243901) called with curMem=280021, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 238.2 KB, free 264.6 MB)
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(36168) called with curMem=523922, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 35.3 KB, free 264.6 MB)
15/06/26 14:30:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:48414 (size: 35.3 KB, free: 265.1 MB)
15/06/26 14:30:43 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/06/26 14:30:43 INFO SparkContext: Created broadcast 1 from textFile at CsvRelation.scala:114
15/06/26 14:30:43 INFO FileInputFormat: Total input paths to process : 1
15/06/26 14:30:43 INFO SparkContext: Starting job: first at CsvRelation.scala:114
15/06/26 14:30:43 INFO DAGScheduler: Got job 0 (first at CsvRelation.scala:114) with 1 output partitions (allowLocal=true)
15/06/26 14:30:43 INFO DAGScheduler: Final stage: Stage 0(first at CsvRelation.scala:114)
15/06/26 14:30:43 INFO DAGScheduler: Parents of final stage: List()
15/06/26 14:30:43 INFO DAGScheduler: Missing parents: List()
15/06/26 14:30:43 INFO DAGScheduler: Submitting Stage 0 (cars.csv MapPartitionsRDD[3] at textFile at CsvRelation.scala:114), which has no missing parents
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(2656) called with curMem=560090, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.6 KB, free 264.6 MB)
15/06/26 14:30:43 INFO MemoryStore: ensureFreeSpace(1945) called with curMem=562746, maxMem=278019440
15/06/26 14:30:43 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1945.0 B, free 264.6 MB)
15/06/26 14:30:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:48414 (size: 1945.0 B, free: 265.1 MB)
15/06/26 14:30:43 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/06/26 14:30:43 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:839
15/06/26 14:30:43 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (cars.csv MapPartitionsRDD[3] at textFile at CsvRelation.scala:114)
15/06/26 14:30:43 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/06/26 14:30:44 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1577 bytes)
15/06/26 14:30:44 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/06/26 14:30:44 INFO Executor: Fetching file:/home/bashton/.ivy2/jars/spark-csv_2.10.jar with timestamp 1435354242071
15/06/26 14:30:44 INFO Utils: /home/bashton/.ivy2/jars/spark-csv_2.10.jar has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/spark-csv_2.10.jar
15/06/26 14:30:44 INFO Executor: Fetching file:/home/bashton/ihme/csvtest/carcsv.py with timestamp 1435354242064
15/06/26 14:30:44 INFO Utils: /home/bashton/ihme/csvtest/carcsv.py has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/carcsv.py
15/06/26 14:30:44 INFO Executor: Fetching file:/home/bashton/.ivy2/jars/commons-csv.jar with timestamp 1435354242073
15/06/26 14:30:44 INFO Utils: /home/bashton/.ivy2/jars/commons-csv.jar has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/commons-csv.jar
15/06/26 14:30:44 INFO Executor: Fetching http://127.0.0.1:55627/jars/commons-csv.jar with timestamp 1435354241951
15/06/26 14:30:44 INFO Utils: Fetching http://127.0.0.1:55627/jars/commons-csv.jar to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/fetchFileTemp442241263986286587.tmp
15/06/26 14:30:44 INFO Utils: /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/fetchFileTemp442241263986286587.tmp has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/commons-csv.jar
15/06/26 14:30:44 INFO Executor: Adding file:/tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/commons-csv.jar to class loader
15/06/26 14:30:44 INFO Executor: Fetching http://127.0.0.1:55627/jars/spark-csv_2.10.jar with timestamp 1435354241950
15/06/26 14:30:44 INFO Utils: Fetching http://127.0.0.1:55627/jars/spark-csv_2.10.jar to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/fetchFileTemp2119703671790968073.tmp
15/06/26 14:30:44 INFO Utils: /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/fetchFileTemp2119703671790968073.tmp has been previously copied to /tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/spark-csv_2.10.jar
15/06/26 14:30:44 INFO Executor: Adding file:/tmp/spark-89b3aaeb-660a-4cce-b662-86e009dc98c8/userFiles-a1bbaba7-b895-40e4-9911-1be7028284c2/spark-csv_2.10.jar to class loader
15/06/26 14:30:44 INFO HadoopRDD: Input split: file:/home/bashton/ihme/csvtest/cars.csv:0+67
15/06/26 14:30:44 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/06/26 14:30:44 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/06/26 14:30:44 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/06/26 14:30:44 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/06/26 14:30:44 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/06/26 14:30:44 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1824 bytes result sent to driver
15/06/26 14:30:44 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 358 ms on localhost (1/1)
15/06/26 14:30:44 INFO DAGScheduler: Stage 0 (first at CsvRelation.scala:114) finished in 0.384 s
15/06/26 14:30:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/06/26 14:30:44 INFO DAGScheduler: Job 0 finished: first at CsvRelation.scala:114, took 0.455945 s
15/06/26 14:30:44 INFO SparkContext: Starting job: runJob at SparkPlan.scala:122
15/06/26 14:30:44 INFO FileInputFormat: Total input paths to process : 1
15/06/26 14:30:44 INFO DAGScheduler: Registering RDD 7 (mapPartitions at Exchange.scala:101)
15/06/26 14:30:44 INFO DAGScheduler: Got job 1 (runJob at SparkPlan.scala:122) with 1 output partitions (allowLocal=false)
15/06/26 14:30:44 INFO DAGScheduler: Final stage: Stage 2(runJob at SparkPlan.scala:122)
15/06/26 14:30:44 INFO DAGScheduler: Parents of final stage: List(Stage 1)
15/06/26 14:30:44 INFO DAGScheduler: Missing parents: List(Stage 1)
15/06/26 14:30:44 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[7] at mapPartitions at Exchange.scala:101), which has no missing parents
15/06/26 14:30:44 INFO MemoryStore: ensureFreeSpace(13032) called with curMem=564691, maxMem=278019440
15/06/26 14:30:44 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 12.7 KB, free 264.6 MB)
15/06/26 14:30:44 INFO MemoryStore: ensureFreeSpace(8119) called with curMem=577723, maxMem=278019440
15/06/26 14:30:44 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 7.9 KB, free 264.6 MB)
15/06/26 14:30:44 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:48414 (size: 7.9 KB, free: 265.1 MB)
15/06/26 14:30:44 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/06/26 14:30:44 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:839
15/06/26 14:30:44 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[7] at mapPartitions at Exchange.scala:101)
15/06/26 14:30:44 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/06/26 14:30:44 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1566 bytes)
15/06/26 14:30:44 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, localhost, PROCESS_LOCAL, 1566 bytes)
15/06/26 14:30:44 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/06/26 14:30:44 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
15/06/26 14:30:44 INFO HadoopRDD: Input split: file:/home/bashton/ihme/csvtest/cars.csv:0+67
15/06/26 14:30:44 INFO HadoopRDD: Input split: file:/home/bashton/ihme/csvtest/cars.csv:67+67
15/06/26 14:30:44 WARN CsvRelation$: Ignoring empty line:
15/06/26 14:30:44 WARN CsvRelation$: Ignoring empty line:
15/06/26 14:30:44 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 2003 bytes result sent to driver
15/06/26 14:30:44 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 2003 bytes result sent to driver
15/06/26 14:30:44 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 254 ms on localhost (1/2)
15/06/26 14:30:44 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 259 ms on localhost (2/2)
15/06/26 14:30:44 INFO DAGScheduler: Stage 1 (mapPartitions at Exchange.scala:101) finished in 0.259 s
15/06/26 14:30:44 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/06/26 14:30:44 INFO DAGScheduler: looking for newly runnable stages
15/06/26 14:30:44 INFO DAGScheduler: running: Set()
15/06/26 14:30:44 INFO DAGScheduler: waiting: Set(Stage 2)
15/06/26 14:30:44 INFO DAGScheduler: failed: Set()
15/06/26 14:30:44 INFO DAGScheduler: Missing parents for Stage 2: List()
15/06/26 14:30:44 INFO DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[11] at map at SparkPlan.scala:97), which is now runnable
15/06/26 14:30:44 INFO MemoryStore: ensureFreeSpace(16888) called with curMem=585842, maxMem=278019440
15/06/26 14:30:44 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 16.5 KB, free 264.6 MB)
15/06/26 14:30:44 INFO MemoryStore: ensureFreeSpace(10193) called with curMem=602730, maxMem=278019440
15/06/26 14:30:44 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 10.0 KB, free 264.6 MB)
15/06/26 14:30:44 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:48414 (size: 10.0 KB, free: 265.1 MB)
15/06/26 14:30:44 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
15/06/26 14:30:44 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:839
15/06/26 14:30:44 INFO DAGScheduler: Submitting 1 missing tasks from Stage 2 (MapPartitionsRDD[11] at map at SparkPlan.scala:97)
15/06/26 14:30:44 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/06/26 14:30:44 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 3, localhost, PROCESS_LOCAL, 1329 bytes)
15/06/26 14:30:44 INFO Executor: Running task 0.0 in stage 2.0 (TID 3)
15/06/26 14:30:44 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/06/26 14:30:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
15/06/26 14:30:44 INFO Executor: Finished task 0.0 in stage 2.0 (TID 3). 1242 bytes result sent to driver
15/06/26 14:30:44 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 3) in 89 ms on localhost (1/1)
15/06/26 14:30:44 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/06/26 14:30:44 INFO DAGScheduler: Stage 2 (runJob at SparkPlan.scala:122) finished in 0.091 s
15/06/26 14:30:44 INFO DAGScheduler: Job 1 finished: runJob at SparkPlan.scala:122, took 0.420159 s
15/06/26 14:30:45 INFO SparkContext: Starting job: collect at /home/bashton/ihme/csvtest/carcsv.py:16
15/06/26 14:30:45 INFO DAGScheduler: Got job 2 (collect at /home/bashton/ihme/csvtest/carcsv.py:16) with 4 output partitions (allowLocal=false)
15/06/26 14:30:45 INFO DAGScheduler: Final stage: Stage 3(collect at /home/bashton/ihme/csvtest/carcsv.py:16)
15/06/26 14:30:45 INFO DAGScheduler: Parents of final stage: List()
15/06/26 14:30:45 INFO DAGScheduler: Missing parents: List()
15/06/26 14:30:45 INFO DAGScheduler: Submitting Stage 3 (MapPartitionsRDD[15] at collect at /home/bashton/ihme/csvtest/carcsv.py:16), which has no missing parents
15/06/26 14:30:45 INFO MemoryStore: ensureFreeSpace(2992) called with curMem=612923, maxMem=278019440
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 2.9 KB, free 264.6 MB)
15/06/26 14:30:45 INFO MemoryStore: ensureFreeSpace(2036) called with curMem=615915, maxMem=278019440
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2036.0 B, free 264.6 MB)
15/06/26 14:30:45 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:48414 (size: 2036.0 B, free: 265.0 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_5_piece0
15/06/26 14:30:45 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:839
15/06/26 14:30:45 INFO DAGScheduler: Submitting 4 missing tasks from Stage 3 (MapPartitionsRDD[15] at collect at /home/bashton/ihme/csvtest/carcsv.py:16)
15/06/26 14:30:45 INFO TaskSchedulerImpl: Adding task set 3.0 with 4 tasks
15/06/26 14:30:45 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, localhost, PROCESS_LOCAL, 1752 bytes)
15/06/26 14:30:45 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 5, localhost, PROCESS_LOCAL, 1753 bytes)
15/06/26 14:30:45 INFO TaskSetManager: Starting task 2.0 in stage 3.0 (TID 6, localhost, PROCESS_LOCAL, 1755 bytes)
15/06/26 14:30:45 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID 7, localhost, PROCESS_LOCAL, 1781 bytes)
15/06/26 14:30:45 INFO Executor: Running task 0.0 in stage 3.0 (TID 4)
15/06/26 14:30:45 INFO Executor: Running task 1.0 in stage 3.0 (TID 5)
15/06/26 14:30:45 INFO Executor: Running task 2.0 in stage 3.0 (TID 6)
15/06/26 14:30:45 INFO Executor: Running task 3.0 in stage 3.0 (TID 7)
15/06/26 14:30:45 INFO Executor: Finished task 1.0 in stage 3.0 (TID 5). 656 bytes result sent to driver
15/06/26 14:30:45 INFO Executor: Finished task 0.0 in stage 3.0 (TID 4). 650 bytes result sent to driver
15/06/26 14:30:45 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 5) in 45 ms on localhost (1/4)
15/06/26 14:30:45 INFO Executor: Finished task 2.0 in stage 3.0 (TID 6). 658 bytes result sent to driver
15/06/26 14:30:45 INFO Executor: Finished task 3.0 in stage 3.0 (TID 7). 681 bytes result sent to driver
15/06/26 14:30:45 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID 6) in 45 ms on localhost (2/4)
15/06/26 14:30:45 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 49 ms on localhost (3/4)
15/06/26 14:30:45 INFO TaskSetManager: Finished task 3.0 in stage 3.0 (TID 7) in 45 ms on localhost (4/4)
15/06/26 14:30:45 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
15/06/26 14:30:45 INFO DAGScheduler: Stage 3 (collect at /home/bashton/ihme/csvtest/carcsv.py:16) finished in 0.050 s
15/06/26 14:30:45 INFO DAGScheduler: Job 2 finished: collect at /home/bashton/ihme/csvtest/carcsv.py:16, took 0.066277 s
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 0
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_0_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_0_piece0 of size 36168 dropped from memory (free 277437657)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:48414 in memory (size: 35.3 KB, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_0 of size 243853 dropped from memory (free 277681510)
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 0
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 5
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_5_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_5_piece0 of size 2036 dropped from memory (free 277683546)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_5_piece0 on localhost:48414 in memory (size: 2036.0 B, free: 265.1 MB)
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/06/26 14:30:45 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_5_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_5
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_5 of size 2992 dropped from memory (free 277686538)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 5
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 4
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_4_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_4_piece0 of size 10193 dropped from memory (free 277696731)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_4_piece0 on localhost:48414 in memory (size: 10.0 KB, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_4
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_4 of size 16888 dropped from memory (free 277713619)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 4
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 3
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_3_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_3_piece0 of size 8119 dropped from memory (free 277721738)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_3_piece0 on localhost:48414 in memory (size: 7.9 KB, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_3
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_3 of size 13032 dropped from memory (free 277734770)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 3
15/06/26 14:30:45 INFO ContextCleaner: Cleaned shuffle 0
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 2
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_2_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_2_piece0 of size 1945 dropped from memory (free 277736715)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:48414 in memory (size: 1945.0 B, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_2
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_2 of size 2656 dropped from memory (free 277739371)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 2
15/06/26 14:30:45 INFO BlockManager: Removing broadcast 1
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_1_piece0
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_1_piece0 of size 36168 dropped from memory (free 277775539)
15/06/26 14:30:45 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:48414 in memory (size: 35.3 KB, free: 265.1 MB)
15/06/26 14:30:45 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/06/26 14:30:45 INFO BlockManager: Removing block broadcast_1
15/06/26 14:30:45 INFO MemoryStore: Block broadcast_1 of size 243901 dropped from memory (free 278019440)
15/06/26 14:30:45 INFO ContextCleaner: Cleaned broadcast 1
15/06/26 14:30:45 INFO SparkUI: Stopped Spark web UI at http://localhost.localdomain:4040
15/06/26 14:30:45 INFO DAGScheduler: Stopping DAGScheduler
15/06/26 14:30:45 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/06/26 14:30:45 INFO MemoryStore: MemoryStore cleared
15/06/26 14:30:45 INFO BlockManager: BlockManager stopped
15/06/26 14:30:45 INFO BlockManagerMaster: BlockManagerMaster stopped
15/06/26 14:30:45 INFO SparkContext: Successfully stopped SparkContext
15/06/26 14:30:45 INFO OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped!
15/06/26 14:30:45 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/06/26 14:30:45 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/06/26 14:30:45 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
[Row(summary=u'count', year=3),
 Row(summary=u'mean', year=2008.0),
 Row(summary=u'stddev', year=7.874007874011811),
 Row(summary=u'min', year=1997),
 Row(summary=u'max', year=2015)]
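
The summary rows can be sanity-checked with plain Python. This is only a sketch under stated assumptions: `min` and `max` are read straight from the output, while the middle year is inferred from `count` and `mean` rather than read from cars.csv. Under those assumptions, the reported `stddev` appears to be the population standard deviation (dividing by n), not the sample one (dividing by n - 1).

```python
import math
import statistics

# Values copied from the describe() output above.
count, mean, year_min, year_max = 3, 2008.0, 1997, 2015
reported_std = 7.874007874011811

# Assumption: the third year is inferred from count * mean - min - max;
# it is not read from cars.csv itself.
middle = int(count * mean) - year_min - year_max
years = [year_min, middle, year_max]

population_std = statistics.pstdev(years)  # divides by n
sample_std = statistics.stdev(years)       # divides by n - 1

matches_population = math.isclose(population_std, reported_std, rel_tol=1e-9)
matches_sample = math.isclose(sample_std, reported_std, rel_tol=1e-9)

print(years)
print("population:", population_std, "matches reported:", matches_population)
print("sample:    ", sample_std, "matches reported:", matches_sample)
```

Only `year` shows up in the summary, presumably because it is the only numeric column in the schema; the four string columns are left out.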