Spark without and with .persist
# Java 1.7
# Scala 2.10.4
# Spark 1.0.2
UPDATE (2014-08-19 01:00 am):
➜ conf git:(master) head -10 spark-env.sh
#!/usr/bin/env bash
# https://spark-project.atlassian.net/browse/SPARK-1264
export SPARK_MASTER_MEMORY=4g
export SPARK_WORKER_MEMORY=4g
export SPARK_EXECUTOR_MEMORY=4g
export SPARK_DRIVER_MEMORY=4g
# This file is sourced when running various Spark programs.
...
Now the process no longer throws OutOfMemory.
After bw_1G.persist, it runs at full force (700% CPU) "forever" (I waited 15 minutes).
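For readability, the whole spark-shell session below condenses to these five REPL lines (a summary of what is actually run, with the timings taken from the logs):

val bw_1G = sc.textFile("hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G")
bw_1G.count    // cold run, reads ~1 GB from HDFS: ~36.5 s
bw_1G.count    // repeated run, still uncached in Spark: ~4.5 s
bw_1G.persist  // mark for caching (defaults to MEMORY_ONLY in Spark 1.0.x)
bw_1G.count    // first count after persist: spins at 700% CPU "forever"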
The log in spark-shell:
➜ spark-1.0.2-bin-hadoop2 ./bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/08/19 00:42:04 INFO spark.SecurityManager: Changing view acls to: peter_v
14/08/19 00:42:04 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(peter_v)
14/08/19 00:42:04 INFO spark.HttpServer: Starting HTTP Server
14/08/19 00:42:04 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/19 00:42:04 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57067
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.2
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
14/08/19 00:42:08 INFO spark.SecurityManager: Changing view acls to: peter_v
14/08/19 00:42:08 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(peter_v)
14/08/19 00:42:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/08/19 00:42:08 INFO Remoting: Starting remoting
14/08/19 00:42:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.0.191:57069]
14/08/19 00:42:08 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.0.191:57069]
14/08/19 00:42:08 INFO spark.SparkEnv: Registering MapOutputTracker
14/08/19 00:42:08 INFO spark.SparkEnv: Registering BlockManagerMaster
14/08/19 00:42:08 INFO storage.DiskBlockManager: Created local directory at /var/folders/1q/3_rsfwqd4b93sj7m6rnbzj8h0000gn/T/spark-local-20140819004208-4b9a
14/08/19 00:42:08 INFO storage.MemoryStore: MemoryStore started with capacity 2.3 GB.
14/08/19 00:42:08 INFO network.ConnectionManager: Bound socket to port 57070 with id = ConnectionManagerId(192.168.0.191,57070)
14/08/19 00:42:08 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/08/19 00:42:08 INFO storage.BlockManagerInfo: Registering block manager 192.168.0.191:57070 with 2.3 GB RAM
14/08/19 00:42:08 INFO storage.BlockManagerMaster: Registered BlockManager
14/08/19 00:42:08 INFO spark.HttpServer: Starting HTTP Server
14/08/19 00:42:08 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/19 00:42:08 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57071
14/08/19 00:42:08 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.0.191:57071
14/08/19 00:42:08 INFO spark.HttpFileServer: HTTP File server directory is /var/folders/1q/3_rsfwqd4b93sj7m6rnbzj8h0000gn/T/spark-8ba51b76-1898-4c48-86b8-b95445c3a1a3
14/08/19 00:42:08 INFO spark.HttpServer: Starting HTTP Server
14/08/19 00:42:08 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/19 00:42:08 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57072
14/08/19 00:42:08 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/19 00:42:08 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/08/19 00:42:08 INFO ui.SparkUI: Started SparkUI at http://192.168.0.191:4040
2014-08-19 00:42:09.106 java[7249:1903] Unable to load realm info from SCDynamicStore
14/08/19 00:42:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/19 00:42:09 INFO executor.Executor: Using REPL class URI: http://192.168.0.191:57067
14/08/19 00:42:09 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
scala> val bw_1G = sc.textFile("hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G")
14/08/19 00:42:17 INFO storage.MemoryStore: ensureFreeSpace(138763) called with curMem=0, maxMem=2470025625
14/08/19 00:42:17 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 135.5 KB, free 2.3 GB)
bw_1G: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
scala> bw_1G.count
14/08/19 00:42:24 INFO mapred.FileInputFormat: Total input paths to process : 1
14/08/19 00:42:25 INFO spark.SparkContext: Starting job: count at <console>:15
14/08/19 00:42:25 INFO scheduler.DAGScheduler: Got job 0 (count at <console>:15) with 8 output partitions (allowLocal=false)
14/08/19 00:42:25 INFO scheduler.DAGScheduler: Final stage: Stage 0(count at <console>:15)
14/08/19 00:42:25 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/08/19 00:42:25 INFO scheduler.DAGScheduler: Missing parents: List()
14/08/19 00:42:25 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at textFile at <console>:12), which has no missing parents
14/08/19 00:42:25 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 0 (MappedRDD[1] at textFile at <console>:12)
14/08/19 00:42:25 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 8 tasks
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 1773 bytes in 2 ms
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 1773 bytes in 1 ms
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID 2 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as 1773 bytes in 0 ms
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 3 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 1773 bytes in 0 ms
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Starting task 0.0:4 as TID 4 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Serialized task 0.0:4 as 1773 bytes in 1 ms
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Starting task 0.0:5 as TID 5 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Serialized task 0.0:5 as 1773 bytes in 0 ms
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Starting task 0.0:6 as TID 6 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Serialized task 0.0:6 as 1773 bytes in 0 ms
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Starting task 0.0:7 as TID 7 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:42:25 INFO scheduler.TaskSetManager: Serialized task 0.0:7 as 1773 bytes in 0 ms
14/08/19 00:42:25 INFO executor.Executor: Running task ID 2
14/08/19 00:42:25 INFO executor.Executor: Running task ID 0
14/08/19 00:42:25 INFO executor.Executor: Running task ID 3
14/08/19 00:42:25 INFO executor.Executor: Running task ID 1
14/08/19 00:42:25 INFO executor.Executor: Running task ID 4
14/08/19 00:42:25 INFO executor.Executor: Running task ID 5
14/08/19 00:42:25 INFO executor.Executor: Running task ID 6
14/08/19 00:42:25 INFO executor.Executor: Running task ID 7
14/08/19 00:42:25 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:42:25 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:42:25 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:42:25 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:42:25 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:42:25 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:42:25 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:42:25 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:42:25 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:536870912+134217728
14/08/19 00:42:25 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:671088640+134217728
14/08/19 00:42:25 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:268435456+134217728
14/08/19 00:42:25 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:134217728+134217728
14/08/19 00:42:25 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:402653184+134217728
14/08/19 00:42:25 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:0+134217728
14/08/19 00:42:25 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:805306368+134217728
14/08/19 00:42:25 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:939524096+63925754
14/08/19 00:42:25 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/08/19 00:42:25 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/08/19 00:42:25 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/08/19 00:42:25 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/08/19 00:42:25 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/08/19 00:42:25 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/08/19 00:42:25 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/08/19 00:42:25 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/08/19 00:42:25 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/08/19 00:42:47 INFO executor.Executor: Serialized size of result for 7 is 597
14/08/19 00:42:47 INFO executor.Executor: Sending result for 7 directly to driver
14/08/19 00:42:47 INFO executor.Executor: Finished task ID 7
14/08/19 00:42:47 INFO scheduler.TaskSetManager: Finished TID 7 in 22604 ms on localhost (progress: 1/8)
14/08/19 00:42:47 INFO scheduler.DAGScheduler: Completed ResultTask(0, 7)
14/08/19 00:43:00 INFO executor.Executor: Serialized size of result for 6 is 597
14/08/19 00:43:00 INFO executor.Executor: Sending result for 6 directly to driver
14/08/19 00:43:00 INFO executor.Executor: Finished task ID 6
14/08/19 00:43:00 INFO scheduler.DAGScheduler: Completed ResultTask(0, 6)
14/08/19 00:43:00 INFO scheduler.TaskSetManager: Finished TID 6 in 35518 ms on localhost (progress: 2/8)
14/08/19 00:43:00 INFO executor.Executor: Serialized size of result for 5 is 597
14/08/19 00:43:00 INFO executor.Executor: Sending result for 5 directly to driver
14/08/19 00:43:00 INFO executor.Executor: Finished task ID 5
14/08/19 00:43:00 INFO scheduler.DAGScheduler: Completed ResultTask(0, 5)
14/08/19 00:43:00 INFO scheduler.TaskSetManager: Finished TID 5 in 35894 ms on localhost (progress: 3/8)
14/08/19 00:43:01 INFO executor.Executor: Serialized size of result for 1 is 597
14/08/19 00:43:01 INFO executor.Executor: Sending result for 1 directly to driver
14/08/19 00:43:01 INFO executor.Executor: Finished task ID 1
14/08/19 00:43:01 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
14/08/19 00:43:01 INFO scheduler.TaskSetManager: Finished TID 1 in 36071 ms on localhost (progress: 4/8)
14/08/19 00:43:01 INFO executor.Executor: Serialized size of result for 4 is 597
14/08/19 00:43:01 INFO executor.Executor: Sending result for 4 directly to driver
14/08/19 00:43:01 INFO executor.Executor: Finished task ID 4
14/08/19 00:43:01 INFO scheduler.DAGScheduler: Completed ResultTask(0, 4)
14/08/19 00:43:01 INFO scheduler.TaskSetManager: Finished TID 4 in 36279 ms on localhost (progress: 5/8)
14/08/19 00:43:01 INFO executor.Executor: Serialized size of result for 2 is 597
14/08/19 00:43:01 INFO executor.Executor: Sending result for 2 directly to driver
14/08/19 00:43:01 INFO executor.Executor: Finished task ID 2
14/08/19 00:43:01 INFO scheduler.DAGScheduler: Completed ResultTask(0, 2)
14/08/19 00:43:01 INFO scheduler.TaskSetManager: Finished TID 2 in 36302 ms on localhost (progress: 6/8)
14/08/19 00:43:01 INFO executor.Executor: Serialized size of result for 3 is 597
14/08/19 00:43:01 INFO executor.Executor: Sending result for 3 directly to driver
14/08/19 00:43:01 INFO executor.Executor: Finished task ID 3
14/08/19 00:43:01 INFO scheduler.DAGScheduler: Completed ResultTask(0, 3)
14/08/19 00:43:01 INFO scheduler.TaskSetManager: Finished TID 3 in 36335 ms on localhost (progress: 7/8)
14/08/19 00:43:01 INFO executor.Executor: Serialized size of result for 0 is 597
14/08/19 00:43:01 INFO executor.Executor: Sending result for 0 directly to driver
14/08/19 00:43:01 INFO executor.Executor: Finished task ID 0
14/08/19 00:43:01 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
14/08/19 00:43:01 INFO scheduler.TaskSetManager: Finished TID 0 in 36427 ms on localhost (progress: 8/8)
14/08/19 00:43:01 INFO scheduler.DAGScheduler: Stage 0 (count at <console>:15) finished in 36.434 s
14/08/19 00:43:01 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/08/19 00:43:01 INFO spark.SparkContext: Job finished: count at <console>:15, took 36.518399 s
res0: Long = 94943296
scala> bw_1G.count
14/08/19 00:43:04 INFO spark.SparkContext: Starting job: count at <console>:15
14/08/19 00:43:04 INFO scheduler.DAGScheduler: Got job 1 (count at <console>:15) with 8 output partitions (allowLocal=false)
14/08/19 00:43:04 INFO scheduler.DAGScheduler: Final stage: Stage 1(count at <console>:15)
14/08/19 00:43:04 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/08/19 00:43:04 INFO scheduler.DAGScheduler: Missing parents: List()
14/08/19 00:43:04 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[1] at textFile at <console>:12), which has no missing parents
14/08/19 00:43:04 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 1 (MappedRDD[1] at textFile at <console>:12)
14/08/19 00:43:04 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 8 tasks
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 8 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 1773 bytes in 0 ms
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 9 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 1773 bytes in 0 ms
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Starting task 1.0:2 as TID 10 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Serialized task 1.0:2 as 1773 bytes in 0 ms
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Starting task 1.0:3 as TID 11 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Serialized task 1.0:3 as 1773 bytes in 1 ms
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Starting task 1.0:4 as TID 12 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Serialized task 1.0:4 as 1773 bytes in 0 ms
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Starting task 1.0:5 as TID 13 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Serialized task 1.0:5 as 1773 bytes in 1 ms
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Starting task 1.0:6 as TID 14 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Serialized task 1.0:6 as 1773 bytes in 0 ms
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Starting task 1.0:7 as TID 15 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:04 INFO scheduler.TaskSetManager: Serialized task 1.0:7 as 1773 bytes in 0 ms
14/08/19 00:43:04 INFO executor.Executor: Running task ID 8
14/08/19 00:43:04 INFO executor.Executor: Running task ID 11
14/08/19 00:43:04 INFO executor.Executor: Running task ID 13
14/08/19 00:43:04 INFO executor.Executor: Running task ID 12
14/08/19 00:43:04 INFO executor.Executor: Running task ID 9
14/08/19 00:43:04 INFO executor.Executor: Running task ID 10
14/08/19 00:43:04 INFO executor.Executor: Running task ID 15
14/08/19 00:43:04 INFO executor.Executor: Running task ID 14
14/08/19 00:43:04 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:04 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:04 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:04 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:04 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:04 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:04 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:04 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:04 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:134217728+134217728
14/08/19 00:43:04 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:0+134217728
14/08/19 00:43:04 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:402653184+134217728
14/08/19 00:43:04 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:536870912+134217728
14/08/19 00:43:04 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:671088640+134217728
14/08/19 00:43:04 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:805306368+134217728
14/08/19 00:43:04 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:939524096+63925754
14/08/19 00:43:04 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:268435456+134217728
14/08/19 00:43:06 INFO executor.Executor: Serialized size of result for 15 is 597
14/08/19 00:43:06 INFO executor.Executor: Sending result for 15 directly to driver
14/08/19 00:43:06 INFO executor.Executor: Finished task ID 15
14/08/19 00:43:06 INFO scheduler.DAGScheduler: Completed ResultTask(1, 7)
14/08/19 00:43:06 INFO scheduler.TaskSetManager: Finished TID 15 in 2252 ms on localhost (progress: 1/8)
14/08/19 00:43:08 INFO executor.Executor: Serialized size of result for 14 is 597
14/08/19 00:43:08 INFO executor.Executor: Sending result for 14 directly to driver
14/08/19 00:43:08 INFO executor.Executor: Finished task ID 14
14/08/19 00:43:08 INFO scheduler.DAGScheduler: Completed ResultTask(1, 6)
14/08/19 00:43:08 INFO scheduler.TaskSetManager: Finished TID 14 in 4146 ms on localhost (progress: 2/8)
14/08/19 00:43:08 INFO executor.Executor: Serialized size of result for 8 is 597
14/08/19 00:43:08 INFO executor.Executor: Sending result for 8 directly to driver
14/08/19 00:43:08 INFO executor.Executor: Finished task ID 8
14/08/19 00:43:08 INFO scheduler.DAGScheduler: Completed ResultTask(1, 0)
14/08/19 00:43:08 INFO scheduler.TaskSetManager: Finished TID 8 in 4287 ms on localhost (progress: 3/8)
14/08/19 00:43:08 INFO executor.Executor: Serialized size of result for 10 is 597
14/08/19 00:43:08 INFO executor.Executor: Sending result for 10 directly to driver
14/08/19 00:43:08 INFO executor.Executor: Finished task ID 10
14/08/19 00:43:08 INFO scheduler.DAGScheduler: Completed ResultTask(1, 2)
14/08/19 00:43:08 INFO scheduler.TaskSetManager: Finished TID 10 in 4400 ms on localhost (progress: 4/8)
14/08/19 00:43:08 INFO executor.Executor: Serialized size of result for 13 is 597
14/08/19 00:43:08 INFO executor.Executor: Sending result for 13 directly to driver
14/08/19 00:43:08 INFO executor.Executor: Finished task ID 13
14/08/19 00:43:08 INFO scheduler.DAGScheduler: Completed ResultTask(1, 5)
14/08/19 00:43:08 INFO scheduler.TaskSetManager: Finished TID 13 in 4454 ms on localhost (progress: 5/8)
14/08/19 00:43:08 INFO executor.Executor: Serialized size of result for 9 is 597
14/08/19 00:43:08 INFO executor.Executor: Sending result for 9 directly to driver
14/08/19 00:43:08 INFO executor.Executor: Finished task ID 9
14/08/19 00:43:08 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
14/08/19 00:43:08 INFO scheduler.TaskSetManager: Finished TID 9 in 4461 ms on localhost (progress: 6/8)
14/08/19 00:43:08 INFO executor.Executor: Serialized size of result for 12 is 597
14/08/19 00:43:08 INFO executor.Executor: Sending result for 12 directly to driver
14/08/19 00:43:08 INFO executor.Executor: Finished task ID 12
14/08/19 00:43:08 INFO scheduler.DAGScheduler: Completed ResultTask(1, 4)
14/08/19 00:43:08 INFO scheduler.TaskSetManager: Finished TID 12 in 4472 ms on localhost (progress: 7/8)
14/08/19 00:43:08 INFO executor.Executor: Serialized size of result for 11 is 597
14/08/19 00:43:08 INFO executor.Executor: Sending result for 11 directly to driver
14/08/19 00:43:08 INFO executor.Executor: Finished task ID 11
14/08/19 00:43:08 INFO scheduler.DAGScheduler: Completed ResultTask(1, 3)
14/08/19 00:43:08 INFO scheduler.TaskSetManager: Finished TID 11 in 4488 ms on localhost (progress: 8/8)
14/08/19 00:43:08 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/08/19 00:43:08 INFO scheduler.DAGScheduler: Stage 1 (count at <console>:15) finished in 4.491 s
14/08/19 00:43:08 INFO spark.SparkContext: Job finished: count at <console>:15, took 4.4975 s
res1: Long = 94943296
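(Note: the first count read the ~1 GB file from HDFS in 36.5 s, roughly 28 MB/s; the second count took 4.5 s, roughly 230 MB/s. No Spark caching is active yet at this point, so the speed-up presumably comes from the OS page cache still holding the HDFS blocks after the first pass.)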
scala> bw_1G.persist
res2: bw_1G.type = MappedRDD[1] at textFile at <console>:12
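A bare persist is persist(StorageLevel.MEMORY_ONLY): deserialized Java objects in memory. Roughly 1 GB of text easily inflates to several GB of String objects, more than the 2.3 GB MemoryStore. A minimal sketch of a more defensive variant (my assumption, not something run in this session; Spark refuses to change the storage level of an RDD that is already marked, hence the fresh RDD):

import org.apache.spark.storage.StorageLevel

// Hypothetical: store each partition as a serialized byte buffer and spill
// partitions that do not fit in the MemoryStore to local disk.
val bw_1G_ser = sc.textFile("hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G")
bw_1G_ser.persist(StorageLevel.MEMORY_AND_DISK_SER)
bw_1G_ser.count  // materializes the cache
bw_1G_ser.count  // should now hit the serialized cache

// A cached RDD can also be dropped again with bw_1G_ser.unpersist()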
scala> bw_1G.count
14/08/19 00:43:15 INFO spark.SparkContext: Starting job: count at <console>:15
14/08/19 00:43:15 INFO scheduler.DAGScheduler: Got job 2 (count at <console>:15) with 8 output partitions (allowLocal=false)
14/08/19 00:43:15 INFO scheduler.DAGScheduler: Final stage: Stage 2(count at <console>:15)
14/08/19 00:43:15 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/08/19 00:43:15 INFO scheduler.DAGScheduler: Missing parents: List()
14/08/19 00:43:15 INFO scheduler.DAGScheduler: Submitting Stage 2 (MappedRDD[1] at textFile at <console>:12), which has no missing parents
14/08/19 00:43:15 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 2 (MappedRDD[1] at textFile at <console>:12)
14/08/19 00:43:15 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 8 tasks
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 16 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 1776 bytes in 0 ms
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Starting task 2.0:1 as TID 17 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Serialized task 2.0:1 as 1776 bytes in 0 ms
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Starting task 2.0:2 as TID 18 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Serialized task 2.0:2 as 1776 bytes in 0 ms
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Starting task 2.0:3 as TID 19 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Serialized task 2.0:3 as 1776 bytes in 1 ms
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Starting task 2.0:4 as TID 20 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Serialized task 2.0:4 as 1776 bytes in 0 ms
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Starting task 2.0:5 as TID 21 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Serialized task 2.0:5 as 1776 bytes in 0 ms
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Starting task 2.0:6 as TID 22 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Serialized task 2.0:6 as 1776 bytes in 0 ms
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Starting task 2.0:7 as TID 23 on executor localhost: localhost (PROCESS_LOCAL)
14/08/19 00:43:15 INFO scheduler.TaskSetManager: Serialized task 2.0:7 as 1776 bytes in 0 ms
14/08/19 00:43:15 INFO executor.Executor: Running task ID 16
14/08/19 00:43:15 INFO executor.Executor: Running task ID 17
14/08/19 00:43:15 INFO executor.Executor: Running task ID 22
14/08/19 00:43:15 INFO executor.Executor: Running task ID 19
14/08/19 00:43:15 INFO executor.Executor: Running task ID 18
14/08/19 00:43:15 INFO executor.Executor: Running task ID 21
14/08/19 00:43:15 INFO executor.Executor: Running task ID 20
14/08/19 00:43:15 INFO executor.Executor: Running task ID 23
14/08/19 00:43:15 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:15 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:15 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:15 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:15 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:15 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:15 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:15 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/19 00:43:15 INFO spark.CacheManager: Partition rdd_1_7 not found, computing it
14/08/19 00:43:15 INFO spark.CacheManager: Partition rdd_1_3 not found, computing it
14/08/19 00:43:15 INFO spark.CacheManager: Partition rdd_1_1 not found, computing it
14/08/19 00:43:15 INFO spark.CacheManager: Partition rdd_1_2 not found, computing it
14/08/19 00:43:15 INFO spark.CacheManager: Partition rdd_1_0 not found, computing it
14/08/19 00:43:15 INFO spark.CacheManager: Partition rdd_1_5 not found, computing it
14/08/19 00:43:15 INFO spark.CacheManager: Partition rdd_1_6 not found, computing it
14/08/19 00:43:15 INFO spark.CacheManager: Partition rdd_1_4 not found, computing it
14/08/19 00:43:15 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:805306368+134217728
14/08/19 00:43:15 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:671088640+134217728
14/08/19 00:43:15 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:0+134217728
14/08/19 00:43:15 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:268435456+134217728
14/08/19 00:43:15 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:134217728+134217728
14/08/19 00:43:15 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:939524096+63925754
14/08/19 00:43:15 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:402653184+134217728
14/08/19 00:43:15 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:536870912+134217728
(it is now 2014-08-19 01:08 CEST and I am unable to stop the process with Ctrl-C ...)
These are the running processes that match "spark" and "dfs":
$ ps aux | grep spar
peter_v 7249 654.2 26.8 7238008 4492436 s017 R+ 12:42AM 182:51.19 /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/bin/java -cp ::/Users/peter_v/data/projects/spark/code_1.0.2/spark-1.0.2-bin-hadoop2/conf:/Users/peter_v/data/projects/spark/code_1.0.2/spark-1.0.2-bin-hadoop2/lib/spark-assembly-1.0.2-hadoop2.2.0.jar:/Users/peter_v/data/projects/spark/code_1.0.2/spark-1.0.2-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/Users/peter_v/data/projects/spark/code_1.0.2/spark-1.0.2-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/Users/peter_v/data/projects/spark/code_1.0.2/spark-1.0.2-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main
peter_v 7242 0.0 0.0 2435412 716 s017 S+ 12:42AM 0:00.01 bash ./bin/spark-shell
peter_v 7744 0.0 0.0 2423368 232 s001 R+ 1:09AM 0:00.00 grep spar
$ ps aux | grep dfs
peter_v 6773 0.0 0.4 3878160 63492 ?? S 12:31AM 0:13.39 /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/bin/java -Dproc_nodemanager -Xmx1000m -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Dhadoop.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dyarn.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dhadoop.log.file=yarn-peter_v-nodemanager-Peters-MacBook-Pro-2.local.log -Dyarn.log.file=yarn-peter_v-nodemanager-Peters-MacBook-Pro-2.local.log -Dyarn.home.dir= -Dyarn.id.str=peter_v -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dyarn.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dhadoop.log.file=yarn-peter_v-nodemanager-Peters-MacBook-Pro-2.local.log -Dyarn.log.file=yarn-peter_v-nodemanager-Peters-MacBook-Pro-2.local.log -Dyarn.home.dir=/usr/local/Cellar/hadoop/2.4.1/libexec -Dhadoop.home.dir=/usr/local/Cellar/hadoop/2.4.1/libexec -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -classpath /usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop:/usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop:/usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/common/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/common/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/hdfs:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/hdfs/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/hdfs/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/mapreduce/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
peter_v 6679 0.0 0.5 4029908 91880 s021 S 12:31AM 0:23.40 /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/bin/java -Dproc_resourcemanager -Xmx1000m -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Dhadoop.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dyarn.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dhadoop.log.file=yarn-peter_v-resourcemanager-Peters-MacBook-Pro-2.local.log -Dyarn.log.file=yarn-peter_v-resourcemanager-Peters-MacBook-Pro-2.local.log -Dyarn.home.dir= -Dyarn.id.str=peter_v -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dyarn.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dhadoop.log.file=yarn-peter_v-resourcemanager-Peters-MacBook-Pro-2.local.log -Dyarn.log.file=yarn-peter_v-resourcemanager-Peters-MacBook-Pro-2.local.log -Dyarn.home.dir=/usr/local/Cellar/hadoop/2.4.1/libexec -Dhadoop.home.dir=/usr/local/Cellar/hadoop/2.4.1/libexec -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -classpath /usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop:/usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop:/usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/common/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/common/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/hdfs:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/hdfs/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/hdfs/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/mapreduce/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/mapreduce/*:/usr/local/Cellar/hadoop/2.4.1/contrib/capacity-scheduler/*.jar:/usr/local/Cellar/hadoop/2.4.1/contrib/capacity-scheduler/*.jar:/usr/local/Cellar/hadoop/2.4.1/contrib/capacity-scheduler/*.jar:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/*:/usr/local/Cellar/hadoop/2.4.1/libexec/share/hadoop/yarn/lib/*:/usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
peter_v 6555 0.0 0.2 3832576 38172 ?? S 12:31AM 0:06.74 /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Dhadoop.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dhadoop.log.file=hadoop-peter_v-secondarynamenode-Peters-MacBook-Pro-2.local.log -Dhadoop.home.dir=/usr/local/Cellar/hadoop/2.4.1/libexec -Dhadoop.id.str=peter_v -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
peter_v 6447 0.0 1.0 3842896 167008 ?? S 12:30AM 0:15.04 /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/bin/java -Dproc_datanode -Xmx1000m -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Dhadoop.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dhadoop.log.file=hadoop-peter_v-datanode-Peters-MacBook-Pro-2.local.log -Dhadoop.home.dir=/usr/local/Cellar/hadoop/2.4.1/libexec -Dhadoop.id.str=peter_v -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
peter_v 6360 0.0 1.1 3846940 182788 ?? S 12:30AM 0:10.48 /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/bin/java -Dproc_namenode -Xmx1000m -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Dhadoop.log.dir=/usr/local/Cellar/hadoop/2.4.1/libexec/logs -Dhadoop.log.file=hadoop-peter_v-namenode-Peters-MacBook-Pro-2.local.log -Dhadoop.home.dir=/usr/local/Cellar/hadoop/2.4.1/libexec -Dhadoop.id.str=peter_v -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
peter_v 7755 0.0 0.0 2423368 188 s001 R+ 1:09AM 0:00.00 grep dfs
➜ spark-1.0.2-bin-hadoop2 cat conf/spark-defaults.conf
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# peter_v
spark.executor.memory 4G
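As a cross-check on the numbers above: in Spark 1.x the block cache gets spark.storage.memoryFraction (default 0.6) of the JVM's max heap, and 0.6 of the ~3.8 GiB the JVM reports for -Xmx4g is ≈ 2.3 GB, which matches the MemoryStore capacity in the log. A hypothetical tweak to give the cache more headroom (real Spark 1.0.x properties, but not settings used in this session):

# hypothetical additions to spark-defaults.conf, not used above
spark.storage.memoryFraction 0.7
spark.rdd.compress true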
OLDER result (from 2014-08-18)
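(This older run predates the 4g settings in spark-env.sh above: the driver still used the default 512m heap, hence the MemoryStore capacity of only 294.9 MB below, roughly 0.6 of that heap. This is presumably the configuration in which the OutOfMemory mentioned in the UPDATE occurred.)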
➜ spark-1.0.2-bin-hadoop2 ./bin/spark-shell | |
Spark assembly has been built with Hive, including Datanucleus jars on classpath | |
14/08/18 17:37:06 INFO spark.SecurityManager: Changing view acls to: peter_v | |
14/08/18 17:37:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(peter_v) | |
14/08/18 17:37:06 INFO spark.HttpServer: Starting HTTP Server | |
14/08/18 17:37:06 INFO server.Server: jetty-8.y.z-SNAPSHOT | |
14/08/18 17:37:06 INFO server.AbstractConnector: Started [email protected]:50660 | |
Welcome to | |
____ __ | |
/ __/__ ___ _____/ /__ | |
_\ \/ _ \/ _ `/ __/ '_/ | |
/___/ .__/\_,_/_/ /_/\_\ version 1.0.2 | |
/_/ | |
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51) | |
Type in expressions to have them evaluated. | |
Type :help for more information. | |
14/08/18 17:37:15 INFO spark.SecurityManager: Changing view acls to: peter_v | |
14/08/18 17:37:15 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(peter_v) | |
14/08/18 17:37:17 INFO slf4j.Slf4jLogger: Slf4jLogger started | |
14/08/18 17:37:17 INFO Remoting: Starting remoting | |
14/08/18 17:37:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:50662] | |
14/08/18 17:37:18 INFO Remoting: Remoting now listens on addresses: [akka.tcp://[email protected]:50662] | |
14/08/18 17:37:18 INFO spark.SparkEnv: Registering MapOutputTracker | |
14/08/18 17:37:18 INFO spark.SparkEnv: Registering BlockManagerMaster | |
14/08/18 17:37:18 INFO storage.DiskBlockManager: Created local directory at /var/folders/1q/3_rsfwqd4b93sj7m6rnbzj8h0000gn/T/spark-local-20140818173718-1058 | |
14/08/18 17:37:18 INFO storage.MemoryStore: MemoryStore started with capacity 294.9 MB. | |
14/08/18 17:37:18 INFO network.ConnectionManager: Bound socket to port 50663 with id = ConnectionManagerId(172.20.10.8,50663) | |
14/08/18 17:37:18 INFO storage.BlockManagerMaster: Trying to register BlockManager | |
14/08/18 17:37:18 INFO storage.BlockManagerInfo: Registering block manager 172.20.10.8:50663 with 294.9 MB RAM | |
14/08/18 17:37:18 INFO storage.BlockManagerMaster: Registered BlockManager | |
14/08/18 17:37:18 INFO spark.HttpServer: Starting HTTP Server | |
14/08/18 17:37:18 INFO server.Server: jetty-8.y.z-SNAPSHOT | |
14/08/18 17:37:18 INFO server.AbstractConnector: Started [email protected]:50664 | |
14/08/18 17:37:18 INFO broadcast.HttpBroadcast: Broadcast server started at http://172.20.10.8:50664 | |
14/08/18 17:37:18 INFO spark.HttpFileServer: HTTP File server directory is /var/folders/1q/3_rsfwqd4b93sj7m6rnbzj8h0000gn/T/spark-f25b49b0-3d4b-402a-a8e1-079c0ff689ec | |
14/08/18 17:37:18 INFO spark.HttpServer: Starting HTTP Server | |
14/08/18 17:37:18 INFO server.Server: jetty-8.y.z-SNAPSHOT | |
14/08/18 17:37:18 INFO server.AbstractConnector: Started [email protected]:50665 | |
14/08/18 17:37:19 INFO server.Server: jetty-8.y.z-SNAPSHOT | |
14/08/18 17:37:19 INFO server.AbstractConnector: Started [email protected]:4040 | |
14/08/18 17:37:19 INFO ui.SparkUI: Started SparkUI at http://172.20.10.8:4040 | |
2014-08-18 17:37:20.318 java[4307:1903] Unable to load realm info from SCDynamicStore | |
14/08/18 17:37:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable | |
14/08/18 17:37:21 INFO executor.Executor: Using REPL class URI: http://172.20.10.8:50660 | |
14/08/18 17:37:22 INFO repl.SparkILoop: Created spark context.. | |
Spark context available as sc. | |
scala> val bw_1G = sc.textFile("hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G") | |
14/08/18 17:37:30 INFO storage.MemoryStore: ensureFreeSpace(138763) called with curMem=0, maxMem=309225062 | |
14/08/18 17:37:30 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 135.5 KB, free 294.8 MB) | |
bw_1G: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12 | |
scala> bw_1G.count | |
14/08/18 17:37:39 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
14/08/18 17:37:40 INFO spark.SparkContext: Starting job: count at <console>:15 | |
14/08/18 17:37:40 INFO scheduler.DAGScheduler: Got job 0 (count at <console>:15) with 8 output partitions (allowLocal=false) | |
14/08/18 17:37:40 INFO scheduler.DAGScheduler: Final stage: Stage 0(count at <console>:15) | |
14/08/18 17:37:40 INFO scheduler.DAGScheduler: Parents of final stage: List() | |
14/08/18 17:37:40 INFO scheduler.DAGScheduler: Missing parents: List() | |
14/08/18 17:37:40 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at textFile at <console>:12), which has no missing parents | |
14/08/18 17:37:40 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 0 (MappedRDD[1] at textFile at <console>:12) | |
14/08/18 17:37:40 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 8 tasks | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 1773 bytes in 2 ms | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 1773 bytes in 1 ms | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID 2 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as 1773 bytes in 1 ms | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 3 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 1773 bytes in 1 ms | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Starting task 0.0:4 as TID 4 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Serialized task 0.0:4 as 1773 bytes in 1 ms | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Starting task 0.0:5 as TID 5 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Serialized task 0.0:5 as 1773 bytes in 0 ms | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Starting task 0.0:6 as TID 6 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Serialized task 0.0:6 as 1773 bytes in 0 ms | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Starting task 0.0:7 as TID 7 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:37:40 INFO scheduler.TaskSetManager: Serialized task 0.0:7 as 1773 bytes in 0 ms | |
14/08/18 17:37:40 INFO executor.Executor: Running task ID 0 | |
14/08/18 17:37:40 INFO executor.Executor: Running task ID 2 | |
14/08/18 17:37:40 INFO executor.Executor: Running task ID 4 | |
14/08/18 17:37:40 INFO executor.Executor: Running task ID 1 | |
14/08/18 17:37:40 INFO executor.Executor: Running task ID 3 | |
14/08/18 17:37:40 INFO executor.Executor: Running task ID 6 | |
14/08/18 17:37:40 INFO executor.Executor: Running task ID 5 | |
14/08/18 17:37:40 INFO executor.Executor: Running task ID 7 | |
14/08/18 17:37:40 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:37:40 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:37:40 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:37:40 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:37:40 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:37:40 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:37:40 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:37:40 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:37:40 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:939524096+63925754 | |
14/08/18 17:37:40 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:402653184+134217728 | |
14/08/18 17:37:40 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:536870912+134217728 | |
14/08/18 17:37:40 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:805306368+134217728 | |
14/08/18 17:37:40 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:0+134217728 | |
14/08/18 17:37:40 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:671088640+134217728 | |
14/08/18 17:37:40 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:134217728+134217728 | |
14/08/18 17:37:40 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:268435456+134217728 | |
14/08/18 17:37:40 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id | |
14/08/18 17:37:40 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id | |
14/08/18 17:37:40 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition | |
14/08/18 17:37:40 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap | |
14/08/18 17:37:40 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap | |
14/08/18 17:37:40 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id | |
14/08/18 17:37:40 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap | |
14/08/18 17:37:40 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id | |
14/08/18 17:37:40 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id | |
14/08/18 17:38:06 INFO executor.Executor: Serialized size of result for 7 is 597 | |
14/08/18 17:38:06 INFO executor.Executor: Sending result for 7 directly to driver | |
14/08/18 17:38:06 INFO executor.Executor: Finished task ID 7 | |
14/08/18 17:38:07 INFO scheduler.TaskSetManager: Finished TID 7 in 26296 ms on localhost (progress: 1/8) | |
14/08/18 17:38:07 INFO scheduler.DAGScheduler: Completed ResultTask(0, 7) | |
14/08/18 17:38:18 INFO executor.Executor: Serialized size of result for 1 is 597 | |
14/08/18 17:38:18 INFO executor.Executor: Sending result for 1 directly to driver | |
14/08/18 17:38:18 INFO executor.Executor: Finished task ID 1 | |
14/08/18 17:38:18 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1) | |
14/08/18 17:38:18 INFO scheduler.TaskSetManager: Finished TID 1 in 38058 ms on localhost (progress: 2/8) | |
14/08/18 17:38:18 INFO executor.Executor: Serialized size of result for 2 is 597 | |
14/08/18 17:38:18 INFO executor.Executor: Sending result for 2 directly to driver | |
14/08/18 17:38:18 INFO executor.Executor: Finished task ID 2 | |
14/08/18 17:38:18 INFO scheduler.DAGScheduler: Completed ResultTask(0, 2) | |
14/08/18 17:38:18 INFO scheduler.TaskSetManager: Finished TID 2 in 38230 ms on localhost (progress: 3/8) | |
14/08/18 17:38:18 INFO executor.Executor: Serialized size of result for 3 is 597 | |
14/08/18 17:38:18 INFO executor.Executor: Sending result for 3 directly to driver | |
14/08/18 17:38:18 INFO executor.Executor: Finished task ID 3 | |
14/08/18 17:38:18 INFO scheduler.DAGScheduler: Completed ResultTask(0, 3) | |
14/08/18 17:38:18 INFO scheduler.TaskSetManager: Finished TID 3 in 38238 ms on localhost (progress: 4/8) | |
14/08/18 17:38:18 INFO executor.Executor: Serialized size of result for 4 is 597 | |
14/08/18 17:38:18 INFO executor.Executor: Sending result for 4 directly to driver | |
14/08/18 17:38:18 INFO executor.Executor: Finished task ID 4 | |
14/08/18 17:38:18 INFO scheduler.DAGScheduler: Completed ResultTask(0, 4) | |
14/08/18 17:38:18 INFO scheduler.TaskSetManager: Finished TID 4 in 38327 ms on localhost (progress: 5/8) | |
14/08/18 17:38:19 INFO executor.Executor: Serialized size of result for 6 is 597 | |
14/08/18 17:38:19 INFO executor.Executor: Sending result for 6 directly to driver | |
14/08/18 17:38:19 INFO executor.Executor: Finished task ID 6 | |
14/08/18 17:38:19 INFO scheduler.DAGScheduler: Completed ResultTask(0, 6) | |
14/08/18 17:38:19 INFO scheduler.TaskSetManager: Finished TID 6 in 38469 ms on localhost (progress: 6/8) | |
14/08/18 17:38:19 INFO executor.Executor: Serialized size of result for 0 is 597 | |
14/08/18 17:38:19 INFO executor.Executor: Sending result for 0 directly to driver | |
14/08/18 17:38:19 INFO executor.Executor: Finished task ID 0 | |
14/08/18 17:38:19 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0) | |
14/08/18 17:38:19 INFO scheduler.TaskSetManager: Finished TID 0 in 38501 ms on localhost (progress: 7/8) | |
14/08/18 17:38:19 INFO executor.Executor: Serialized size of result for 5 is 597 | |
14/08/18 17:38:19 INFO executor.Executor: Sending result for 5 directly to driver | |
14/08/18 17:38:19 INFO executor.Executor: Finished task ID 5 | |
14/08/18 17:38:19 INFO scheduler.DAGScheduler: Completed ResultTask(0, 5) | |
14/08/18 17:38:19 INFO scheduler.TaskSetManager: Finished TID 5 in 38651 ms on localhost (progress: 8/8) | |
14/08/18 17:38:19 INFO scheduler.DAGScheduler: Stage 0 (count at <console>:15) finished in 38.725 s | |
14/08/18 17:38:19 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool | |
14/08/18 17:38:19 INFO spark.SparkContext: Job finished: count at <console>:15, took 39.115727 s | |
res0: Long = 94943296 | |
scala> bw_1G.count | |
14/08/18 17:38:23 INFO spark.SparkContext: Starting job: count at <console>:15 | |
14/08/18 17:38:23 INFO scheduler.DAGScheduler: Got job 1 (count at <console>:15) with 8 output partitions (allowLocal=false) | |
14/08/18 17:38:23 INFO scheduler.DAGScheduler: Final stage: Stage 1(count at <console>:15) | |
14/08/18 17:38:23 INFO scheduler.DAGScheduler: Parents of final stage: List() | |
14/08/18 17:38:23 INFO scheduler.DAGScheduler: Missing parents: List() | |
14/08/18 17:38:23 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[1] at textFile at <console>:12), which has no missing parents | |
14/08/18 17:38:23 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 1 (MappedRDD[1] at textFile at <console>:12) | |
14/08/18 17:38:23 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 8 tasks | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 8 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 1773 bytes in 0 ms | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 9 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 1773 bytes in 0 ms | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Starting task 1.0:2 as TID 10 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Serialized task 1.0:2 as 1773 bytes in 0 ms | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Starting task 1.0:3 as TID 11 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Serialized task 1.0:3 as 1773 bytes in 0 ms | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Starting task 1.0:4 as TID 12 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Serialized task 1.0:4 as 1773 bytes in 0 ms | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Starting task 1.0:5 as TID 13 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Serialized task 1.0:5 as 1773 bytes in 1 ms | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Starting task 1.0:6 as TID 14 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Serialized task 1.0:6 as 1773 bytes in 1 ms | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Starting task 1.0:7 as TID 15 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:23 INFO scheduler.TaskSetManager: Serialized task 1.0:7 as 1773 bytes in 0 ms | |
14/08/18 17:38:23 INFO executor.Executor: Running task ID 8 | |
14/08/18 17:38:23 INFO executor.Executor: Running task ID 9 | |
14/08/18 17:38:23 INFO executor.Executor: Running task ID 12 | |
14/08/18 17:38:23 INFO executor.Executor: Running task ID 14 | |
14/08/18 17:38:23 INFO executor.Executor: Running task ID 15 | |
14/08/18 17:38:23 INFO executor.Executor: Running task ID 11 | |
14/08/18 17:38:23 INFO executor.Executor: Running task ID 10 | |
14/08/18 17:38:23 INFO executor.Executor: Running task ID 13 | |
14/08/18 17:38:23 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:23 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:23 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:23 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:23 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:23 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:23 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:23 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:805306368+134217728 | |
14/08/18 17:38:23 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:536870912+134217728 | |
14/08/18 17:38:23 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:23 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:939524096+63925754 | |
14/08/18 17:38:23 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:268435456+134217728 | |
14/08/18 17:38:23 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:134217728+134217728 | |
14/08/18 17:38:23 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:402653184+134217728 | |
14/08/18 17:38:23 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:0+134217728 | |
14/08/18 17:38:23 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:671088640+134217728 | |
14/08/18 17:38:25 INFO executor.Executor: Serialized size of result for 15 is 597 | |
14/08/18 17:38:25 INFO executor.Executor: Sending result for 15 directly to driver | |
14/08/18 17:38:25 INFO executor.Executor: Finished task ID 15 | |
14/08/18 17:38:25 INFO scheduler.DAGScheduler: Completed ResultTask(1, 7) | |
14/08/18 17:38:25 INFO scheduler.TaskSetManager: Finished TID 15 in 2398 ms on localhost (progress: 1/8) | |
14/08/18 17:38:27 INFO executor.Executor: Serialized size of result for 13 is 597 | |
14/08/18 17:38:27 INFO executor.Executor: Sending result for 13 directly to driver | |
14/08/18 17:38:27 INFO executor.Executor: Finished task ID 13 | |
14/08/18 17:38:27 INFO scheduler.DAGScheduler: Completed ResultTask(1, 5) | |
14/08/18 17:38:27 INFO scheduler.TaskSetManager: Finished TID 13 in 4549 ms on localhost (progress: 2/8) | |
14/08/18 17:38:27 INFO executor.Executor: Serialized size of result for 8 is 597 | |
14/08/18 17:38:27 INFO executor.Executor: Sending result for 8 directly to driver | |
14/08/18 17:38:27 INFO executor.Executor: Finished task ID 8 | |
14/08/18 17:38:27 INFO scheduler.DAGScheduler: Completed ResultTask(1, 0) | |
14/08/18 17:38:27 INFO scheduler.TaskSetManager: Finished TID 8 in 4575 ms on localhost (progress: 3/8) | |
14/08/18 17:38:28 INFO executor.Executor: Serialized size of result for 9 is 597 | |
14/08/18 17:38:28 INFO executor.Executor: Sending result for 9 directly to driver | |
14/08/18 17:38:28 INFO executor.Executor: Finished task ID 9 | |
14/08/18 17:38:28 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1) | |
14/08/18 17:38:28 INFO scheduler.TaskSetManager: Finished TID 9 in 4632 ms on localhost (progress: 4/8) | |
14/08/18 17:38:28 INFO executor.Executor: Serialized size of result for 11 is 597 | |
14/08/18 17:38:28 INFO executor.Executor: Sending result for 11 directly to driver | |
14/08/18 17:38:28 INFO executor.Executor: Finished task ID 11 | |
14/08/18 17:38:28 INFO scheduler.DAGScheduler: Completed ResultTask(1, 3) | |
14/08/18 17:38:28 INFO scheduler.TaskSetManager: Finished TID 11 in 4635 ms on localhost (progress: 5/8) | |
14/08/18 17:38:28 INFO executor.Executor: Serialized size of result for 14 is 597 | |
14/08/18 17:38:28 INFO executor.Executor: Sending result for 14 directly to driver | |
14/08/18 17:38:28 INFO executor.Executor: Finished task ID 14 | |
14/08/18 17:38:28 INFO scheduler.DAGScheduler: Completed ResultTask(1, 6) | |
14/08/18 17:38:28 INFO scheduler.TaskSetManager: Finished TID 14 in 4658 ms on localhost (progress: 6/8) | |
14/08/18 17:38:28 INFO executor.Executor: Serialized size of result for 10 is 597 | |
14/08/18 17:38:28 INFO executor.Executor: Sending result for 10 directly to driver | |
14/08/18 17:38:28 INFO executor.Executor: Finished task ID 10 | |
14/08/18 17:38:28 INFO scheduler.DAGScheduler: Completed ResultTask(1, 2) | |
14/08/18 17:38:28 INFO scheduler.TaskSetManager: Finished TID 10 in 4692 ms on localhost (progress: 7/8) | |
14/08/18 17:38:28 INFO executor.Executor: Serialized size of result for 12 is 597 | |
14/08/18 17:38:28 INFO executor.Executor: Sending result for 12 directly to driver | |
14/08/18 17:38:28 INFO executor.Executor: Finished task ID 12 | |
14/08/18 17:38:28 INFO scheduler.DAGScheduler: Completed ResultTask(1, 4) | |
14/08/18 17:38:28 INFO scheduler.TaskSetManager: Finished TID 12 in 4704 ms on localhost (progress: 8/8) | |
14/08/18 17:38:28 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool | |
14/08/18 17:38:28 INFO scheduler.DAGScheduler: Stage 1 (count at <console>:15) finished in 4.709 s | |
14/08/18 17:38:28 INFO spark.SparkContext: Job finished: count at <console>:15, took 4.715505 s | |
res1: Long = 94943296 | |
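This first count scans all eight HDFS splits of big_words_1G and finishes in ~4.7 s (94,943,296 lines). For reference, a minimal sketch of the setup this session assumes (a spark-shell SparkContext sc and the Spark 1.0 RDD API; bw_1G was created earlier in the session via textFile, per the "MappedRDD[1] at textFile at <console>:12" lines above):

// sketch, assuming sc and the HDFS path shown in the log above
val bw_1G = sc.textFile(
  "hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G")
bw_1G.count  // no storage level set: every action re-reads the full 1 GB from HDFS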
scala> bw_1G.count | |
14/08/18 17:38:31 INFO spark.SparkContext: Starting job: count at <console>:15 | |
14/08/18 17:38:31 INFO scheduler.DAGScheduler: Got job 2 (count at <console>:15) with 8 output partitions (allowLocal=false) | |
14/08/18 17:38:31 INFO scheduler.DAGScheduler: Final stage: Stage 2(count at <console>:15) | |
14/08/18 17:38:31 INFO scheduler.DAGScheduler: Parents of final stage: List() | |
14/08/18 17:38:31 INFO scheduler.DAGScheduler: Missing parents: List() | |
14/08/18 17:38:31 INFO scheduler.DAGScheduler: Submitting Stage 2 (MappedRDD[1] at textFile at <console>:12), which has no missing parents | |
14/08/18 17:38:31 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 2 (MappedRDD[1] at textFile at <console>:12) | |
14/08/18 17:38:31 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 8 tasks | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 16 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 1773 bytes in 0 ms | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Starting task 2.0:1 as TID 17 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Serialized task 2.0:1 as 1773 bytes in 0 ms | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Starting task 2.0:2 as TID 18 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Serialized task 2.0:2 as 1773 bytes in 0 ms | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Starting task 2.0:3 as TID 19 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Serialized task 2.0:3 as 1773 bytes in 0 ms | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Starting task 2.0:4 as TID 20 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Serialized task 2.0:4 as 1773 bytes in 0 ms | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Starting task 2.0:5 as TID 21 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Serialized task 2.0:5 as 1773 bytes in 0 ms | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Starting task 2.0:6 as TID 22 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Serialized task 2.0:6 as 1773 bytes in 0 ms | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Starting task 2.0:7 as TID 23 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:31 INFO scheduler.TaskSetManager: Serialized task 2.0:7 as 1773 bytes in 1 ms | |
14/08/18 17:38:31 INFO executor.Executor: Running task ID 16 | |
14/08/18 17:38:31 INFO executor.Executor: Running task ID 17 | |
14/08/18 17:38:31 INFO executor.Executor: Running task ID 21 | |
14/08/18 17:38:31 INFO executor.Executor: Running task ID 19 | |
14/08/18 17:38:31 INFO executor.Executor: Running task ID 18 | |
14/08/18 17:38:31 INFO executor.Executor: Running task ID 20 | |
14/08/18 17:38:31 INFO executor.Executor: Running task ID 23 | |
14/08/18 17:38:31 INFO executor.Executor: Running task ID 22 | |
14/08/18 17:38:31 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:31 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:31 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:31 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:31 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:31 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:31 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:31 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:134217728+134217728 | |
14/08/18 17:38:31 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:402653184+134217728 | |
14/08/18 17:38:31 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:0+134217728 | |
14/08/18 17:38:31 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:31 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:268435456+134217728 | |
14/08/18 17:38:31 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:671088640+134217728 | |
14/08/18 17:38:31 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:805306368+134217728 | |
14/08/18 17:38:31 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:939524096+63925754 | |
14/08/18 17:38:31 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:536870912+134217728 | |
14/08/18 17:38:34 INFO executor.Executor: Serialized size of result for 23 is 597 | |
14/08/18 17:38:34 INFO executor.Executor: Sending result for 23 directly to driver | |
14/08/18 17:38:34 INFO executor.Executor: Finished task ID 23 | |
14/08/18 17:38:34 INFO scheduler.DAGScheduler: Completed ResultTask(2, 7) | |
14/08/18 17:38:34 INFO scheduler.TaskSetManager: Finished TID 23 in 2272 ms on localhost (progress: 1/8) | |
14/08/18 17:38:36 INFO executor.Executor: Serialized size of result for 22 is 597 | |
14/08/18 17:38:36 INFO executor.Executor: Sending result for 22 directly to driver | |
14/08/18 17:38:36 INFO executor.Executor: Finished task ID 22 | |
14/08/18 17:38:36 INFO scheduler.DAGScheduler: Completed ResultTask(2, 6) | |
14/08/18 17:38:36 INFO scheduler.TaskSetManager: Finished TID 22 in 4264 ms on localhost (progress: 2/8) | |
14/08/18 17:38:36 INFO executor.Executor: Serialized size of result for 16 is 597 | |
14/08/18 17:38:36 INFO executor.Executor: Sending result for 16 directly to driver | |
14/08/18 17:38:36 INFO executor.Executor: Finished task ID 16 | |
14/08/18 17:38:36 INFO scheduler.DAGScheduler: Completed ResultTask(2, 0) | |
14/08/18 17:38:36 INFO scheduler.TaskSetManager: Finished TID 16 in 4489 ms on localhost (progress: 3/8) | |
14/08/18 17:38:36 INFO executor.Executor: Serialized size of result for 20 is 597 | |
14/08/18 17:38:36 INFO executor.Executor: Sending result for 20 directly to driver | |
14/08/18 17:38:36 INFO executor.Executor: Finished task ID 20 | |
14/08/18 17:38:36 INFO scheduler.DAGScheduler: Completed ResultTask(2, 4) | |
14/08/18 17:38:36 INFO scheduler.TaskSetManager: Finished TID 20 in 4569 ms on localhost (progress: 4/8) | |
14/08/18 17:38:36 INFO executor.Executor: Serialized size of result for 21 is 597 | |
14/08/18 17:38:36 INFO executor.Executor: Sending result for 21 directly to driver | |
14/08/18 17:38:36 INFO executor.Executor: Finished task ID 21 | |
14/08/18 17:38:36 INFO scheduler.DAGScheduler: Completed ResultTask(2, 5) | |
14/08/18 17:38:36 INFO scheduler.TaskSetManager: Finished TID 21 in 4600 ms on localhost (progress: 5/8) | |
14/08/18 17:38:36 INFO executor.Executor: Serialized size of result for 19 is 597 | |
14/08/18 17:38:36 INFO executor.Executor: Sending result for 19 directly to driver | |
14/08/18 17:38:36 INFO executor.Executor: Finished task ID 19 | |
14/08/18 17:38:36 INFO scheduler.DAGScheduler: Completed ResultTask(2, 3) | |
14/08/18 17:38:36 INFO scheduler.TaskSetManager: Finished TID 19 in 4691 ms on localhost (progress: 6/8) | |
14/08/18 17:38:36 INFO executor.Executor: Serialized size of result for 18 is 597 | |
14/08/18 17:38:36 INFO executor.Executor: Sending result for 18 directly to driver | |
14/08/18 17:38:36 INFO executor.Executor: Finished task ID 18 | |
14/08/18 17:38:36 INFO scheduler.DAGScheduler: Completed ResultTask(2, 2) | |
14/08/18 17:38:36 INFO scheduler.TaskSetManager: Finished TID 18 in 4738 ms on localhost (progress: 7/8) | |
14/08/18 17:38:36 INFO executor.Executor: Serialized size of result for 17 is 597 | |
14/08/18 17:38:36 INFO executor.Executor: Sending result for 17 directly to driver | |
14/08/18 17:38:36 INFO executor.Executor: Finished task ID 17 | |
14/08/18 17:38:36 INFO scheduler.DAGScheduler: Completed ResultTask(2, 1) | |
14/08/18 17:38:36 INFO scheduler.TaskSetManager: Finished TID 17 in 4807 ms on localhost (progress: 8/8) | |
14/08/18 17:38:36 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool | |
14/08/18 17:38:36 INFO scheduler.DAGScheduler: Stage 2 (count at <console>:15) finished in 4.808 s | |
14/08/18 17:38:36 INFO spark.SparkContext: Job finished: count at <console>:15, took 4.813768 s | |
res2: Long = 94943296 | |
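The second count behaves identically (~4.8 s, same 94,943,296): without a storage level, Spark recomputes the MappedRDD from the same HDFS input splits on every action. A hypothetical timing helper (not part of the original session) would make this visible in the shell:

// hypothetical helper, for illustration only
def time[T](body: => T): T = {
  val t0 = System.nanoTime
  val result = body
  println(f"took ${(System.nanoTime - t0) / 1e9}%.3f s")
  result
}
time(bw_1G.count)  // ~4.7 s: full HDFS scan
time(bw_1G.count)  // ~4.8 s: full scan again, nothing is cached yet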
scala> bw_1G.persist | |
res3: bw_1G.type = MappedRDD[1] at textFile at <console>:12 | |
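persist with no arguments is the same as cache: it marks bw_1G StorageLevel.MEMORY_ONLY (deserialized Java objects) and is lazy, so nothing is stored yet; the next action below is what actually tries to build the cache. A quick way to confirm the level in the shell:

bw_1G.getStorageLevel  // StorageLevel.MEMORY_ONLY after the bare persist above;
                       // note: an RDD's storage level can only be assigned once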
scala> bw_1G.count | |
14/08/18 17:38:53 INFO spark.SparkContext: Starting job: count at <console>:15 | |
14/08/18 17:38:53 INFO scheduler.DAGScheduler: Got job 3 (count at <console>:15) with 8 output partitions (allowLocal=false) | |
14/08/18 17:38:53 INFO scheduler.DAGScheduler: Final stage: Stage 3(count at <console>:15) | |
14/08/18 17:38:53 INFO scheduler.DAGScheduler: Parents of final stage: List() | |
14/08/18 17:38:53 INFO scheduler.DAGScheduler: Missing parents: List() | |
14/08/18 17:38:53 INFO scheduler.DAGScheduler: Submitting Stage 3 (MappedRDD[1] at textFile at <console>:12), which has no missing parents | |
14/08/18 17:38:53 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 3 (MappedRDD[1] at textFile at <console>:12) | |
14/08/18 17:38:53 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 8 tasks | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Starting task 3.0:0 as TID 24 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Serialized task 3.0:0 as 1776 bytes in 0 ms | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Starting task 3.0:1 as TID 25 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Serialized task 3.0:1 as 1776 bytes in 0 ms | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Starting task 3.0:2 as TID 26 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Serialized task 3.0:2 as 1776 bytes in 0 ms | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Starting task 3.0:3 as TID 27 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Serialized task 3.0:3 as 1776 bytes in 0 ms | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Starting task 3.0:4 as TID 28 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Serialized task 3.0:4 as 1776 bytes in 0 ms | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Starting task 3.0:5 as TID 29 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Serialized task 3.0:5 as 1776 bytes in 0 ms | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Starting task 3.0:6 as TID 30 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Serialized task 3.0:6 as 1776 bytes in 1 ms | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Starting task 3.0:7 as TID 31 on executor localhost: localhost (PROCESS_LOCAL) | |
14/08/18 17:38:53 INFO scheduler.TaskSetManager: Serialized task 3.0:7 as 1776 bytes in 0 ms | |
14/08/18 17:38:53 INFO executor.Executor: Running task ID 24 | |
14/08/18 17:38:53 INFO executor.Executor: Running task ID 25 | |
14/08/18 17:38:53 INFO executor.Executor: Running task ID 29 | |
14/08/18 17:38:53 INFO executor.Executor: Running task ID 26 | |
14/08/18 17:38:53 INFO executor.Executor: Running task ID 27 | |
14/08/18 17:38:53 INFO executor.Executor: Running task ID 28 | |
14/08/18 17:38:53 INFO executor.Executor: Running task ID 31 | |
14/08/18 17:38:53 INFO executor.Executor: Running task ID 30 | |
14/08/18 17:38:53 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:53 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:53 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:53 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:53 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:53 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:53 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:53 INFO storage.BlockManager: Found block broadcast_0 locally | |
14/08/18 17:38:53 INFO spark.CacheManager: Partition rdd_1_0 not found, computing it | |
14/08/18 17:38:53 INFO spark.CacheManager: Partition rdd_1_1 not found, computing it | |
14/08/18 17:38:53 INFO spark.CacheManager: Partition rdd_1_2 not found, computing it | |
14/08/18 17:38:53 INFO spark.CacheManager: Partition rdd_1_7 not found, computing it | |
14/08/18 17:38:53 INFO spark.CacheManager: Partition rdd_1_4 not found, computing it | |
14/08/18 17:38:53 INFO spark.CacheManager: Partition rdd_1_5 not found, computing it | |
14/08/18 17:38:53 INFO spark.CacheManager: Partition rdd_1_6 not found, computing it | |
14/08/18 17:38:53 INFO spark.CacheManager: Partition rdd_1_3 not found, computing it | |
14/08/18 17:38:53 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:805306368+134217728 | |
14/08/18 17:38:53 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:671088640+134217728 | |
14/08/18 17:38:53 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:536870912+134217728 | |
14/08/18 17:38:53 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:939524096+63925754 | |
14/08/18 17:38:53 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:268435456+134217728 | |
14/08/18 17:38:53 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:134217728+134217728 | |
14/08/18 17:38:53 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:0+134217728 | |
14/08/18 17:38:53 INFO rdd.HadoopRDD: Input split: hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G:402653184+134217728 | |
14/08/18 17:39:19 ERROR executor.Executor: Exception in task ID 24 | |
java.lang.OutOfMemoryError: GC overhead limit exceeded | |
at java.nio.ByteBuffer.wrap(ByteBuffer.java:369) | |
at org.apache.hadoop.io.Text.decode(Text.java:382) | |
at org.apache.hadoop.io.Text.toString(Text.java:280) | |
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:458) | |
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:458) | |
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:21 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-6,5,main] | |
java.lang.OutOfMemoryError: GC overhead limit exceeded | |
at java.nio.ByteBuffer.wrap(ByteBuffer.java:369) | |
at org.apache.hadoop.io.Text.decode(Text.java:382) | |
at org.apache.hadoop.io.Text.toString(Text.java:280) | |
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:458) | |
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:458) | |
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:23 WARN scheduler.TaskSetManager: Lost TID 24 (task 3.0:0) | |
14/08/18 17:39:34 ERROR executor.Executor: Exception in task ID 25 | |
java.lang.OutOfMemoryError: GC overhead limit exceeded | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:206) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184) | |
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) | |
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) | |
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-1,5,main] | |
java.lang.OutOfMemoryError: GC overhead limit exceeded | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:206) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184) | |
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) | |
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) | |
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 WARN rdd.HadoopRDD: Exception in RecordReader.close() | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:588) | |
at java.io.FilterInputStream.close(FilterInputStream.java:181) | |
at org.apache.hadoop.util.LineReader.close(LineReader.java:150) | |
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:211) | |
at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:196) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) | |
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) | |
at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:113) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 WARN rdd.HadoopRDD: Exception in RecordReader.close() | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:588) | |
at java.io.FilterInputStream.close(FilterInputStream.java:181) | |
at org.apache.hadoop.util.LineReader.close(LineReader.java:150) | |
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:211) | |
at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:196) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) | |
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) | |
at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:113) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 WARN rdd.HadoopRDD: Exception in RecordReader.close() | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:588) | |
at java.io.FilterInputStream.close(FilterInputStream.java:181) | |
at org.apache.hadoop.util.LineReader.close(LineReader.java:150) | |
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:211) | |
at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:196) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) | |
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) | |
at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:113) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 WARN rdd.HadoopRDD: Exception in RecordReader.close() | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:588) | |
at java.io.FilterInputStream.close(FilterInputStream.java:181) | |
at org.apache.hadoop.util.LineReader.close(LineReader.java:150) | |
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:211) | |
at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:196) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) | |
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) | |
at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:113) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 WARN rdd.HadoopRDD: Exception in RecordReader.close() | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:588) | |
at java.io.FilterInputStream.close(FilterInputStream.java:181) | |
at org.apache.hadoop.util.LineReader.close(LineReader.java:150) | |
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:211) | |
at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:196) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) | |
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) | |
at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:113) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 WARN rdd.HadoopRDD: Exception in RecordReader.close() | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:588) | |
at java.io.FilterInputStream.close(FilterInputStream.java:181) | |
at org.apache.hadoop.util.LineReader.close(LineReader.java:150) | |
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:211) | |
at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:196) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63) | |
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) | |
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) | |
at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:113) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 ERROR executor.Executor: Exception in task ID 31 | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:735) | |
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793) | |
at java.io.DataInputStream.read(DataInputStream.java:100) | |
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211) | |
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:201) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184) | |
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) | |
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) | |
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 ERROR executor.Executor: Exception in task ID 30 | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:735) | |
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793) | |
at java.io.DataInputStream.read(DataInputStream.java:100) | |
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211) | |
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:201) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184) | |
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) | |
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) | |
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 ERROR executor.Executor: Exception in task ID 27 | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:735) | |
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793) | |
at java.io.DataInputStream.read(DataInputStream.java:100) | |
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211) | |
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:201) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184) | |
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) | |
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) | |
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 ERROR executor.Executor: Exception in task ID 26 | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:735) | |
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793) | |
at java.io.DataInputStream.read(DataInputStream.java:100) | |
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211) | |
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:201) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184) | |
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) | |
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) | |
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 ERROR executor.Executor: Exception in task ID 28 | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:735) | |
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793) | |
at java.io.DataInputStream.read(DataInputStream.java:100) | |
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211) | |
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:201) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184) | |
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) | |
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) | |
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:34 ERROR executor.Executor: Exception in task ID 29 | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:735) | |
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793) | |
at java.io.DataInputStream.read(DataInputStream.java:100) | |
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211) | |
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:201) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184) | |
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) | |
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) | |
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:35 WARN scheduler.TaskSetManager: Loss was due to java.lang.OutOfMemoryError | |
java.lang.OutOfMemoryError: GC overhead limit exceeded | |
at java.nio.ByteBuffer.wrap(ByteBuffer.java:369) | |
at org.apache.hadoop.io.Text.decode(Text.java:382) | |
at org.apache.hadoop.io.Text.toString(Text.java:280) | |
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:458) | |
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:458) | |
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:35 ERROR scheduler.TaskSetManager: Task 3.0:0 failed 1 times; aborting job | |
14/08/18 17:39:35 INFO scheduler.TaskSetManager: Loss was due to java.lang.OutOfMemoryError: GC overhead limit exceeded [duplicate 1] | |
14/08/18 17:39:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool | |
14/08/18 17:39:35 WARN scheduler.TaskSetManager: Loss was due to java.io.IOException | |
java.io.IOException: Filesystem closed | |
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) | |
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:735) | |
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793) | |
at java.io.DataInputStream.read(DataInputStream.java:100) | |
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211) | |
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) | |
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:201) | |
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184) | |
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) | |
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) | |
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) | |
at scala.collection.Iterator$class.foreach(Iterator.scala:727) | |
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) | |
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) | |
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) | |
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:107) | |
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227) | |
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) | |
at org.apache.spark.scheduler.Task.run(Task.scala:51) | |
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) | |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) | |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) | |
at java.lang.Thread.run(Thread.java:744) | |
14/08/18 17:39:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool | |
14/08/18 17:39:35 INFO scheduler.TaskSchedulerImpl: Cancelling stage 3 | |
14/08/18 17:39:35 INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: Filesystem closed [duplicate 1] | |
14/08/18 17:39:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool | |
14/08/18 17:39:35 INFO scheduler.DAGScheduler: Failed to run count at <console>:15 | |
14/08/18 17:39:35 INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: Filesystem closed [duplicate 2] | |
14/08/18 17:39:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool | |
14/08/18 17:39:35 INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: Filesystem closed [duplicate 3] | |
14/08/18 17:39:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool | |
14/08/18 17:39:35 INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: Filesystem closed [duplicate 4] | |
14/08/18 17:39:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool | |
14/08/18 17:39:35 INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: Filesystem closed [duplicate 5] | |
14/08/18 17:39:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool | |
➜ spark-1.0.2-bin-hadoop2 |
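The failure mode is visible in the stack traces: with MEMORY_ONLY set, CacheManager.getOrCompute (CacheManager.scala:107 in every trace) first materializes each partition into an ArrayBuffer before handing it to the block store, so the eight ~128 MB text splits are expanded into deserialized java.lang.String objects all at once. That likely needs several times the on-disk size, more than the 2.3 GB MemoryStore and the executor heap can hold, hence "GC overhead limit exceeded". Once the first task dies, the JVM-wide HDFS client appears to get closed as a side effect, which is why the remaining tasks cascade with "java.io.IOException: Filesystem closed". On the same heap, a serialized or disk-backed level is one way to sidestep this; a sketch, assuming a fresh RDD since a storage level can only be assigned once:

// sketch: re-create the RDD and persist it in a cheaper format
import org.apache.spark.storage.StorageLevel
val bw_1G_ser = sc.textFile(
  "hdfs://Peters-MacBook-Pro-2.local:9000/user/peter_v/big_words_1G")
bw_1G_ser.persist(StorageLevel.MEMORY_AND_DISK_SER)  // serialized, spills if needed
bw_1G_ser.count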