➜ dev SPARK_REPL_OPTS="-XX:MaxPermSize=256m" spark-1.3.1-bin-hadoop2.6/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g
Ivy Default Cache set to: /Users/sim/.ivy2/cache
The jars for the packages stored in: /Users/sim/.ivy2/jars
:: loading settings :: url = jar:file:/Users/sim/dev/spark-1.3.1-bin-hadoop2.6/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found com.databricks#spark-csv_2.10;1.0.3 in central
    found org.apache.commons#commons-csv;1.1 in central
:: resolution report :: resolve 195ms :: artifacts dl 5ms
    :: modules in use:
    com.databricks#spark-csv_2.10;1.0.3 from central in [default]
    org.apache.commons#commons-csv;1.1 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 2 already retrieved (0kB/5ms)
2015-07-02 15:29:33.242 java[45393:7905252] Unable to load realm info from SCDynamicStore
15/07/02 15:29:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/02 15:29:33 INFO spark.SecurityManager: Changing view acls to: sim
15/07/02 15:29:33 INFO spark.SecurityManager: Changing modify acls to: sim
15/07/02 15:29:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sim); users with modify permissions: Set(sim)
15/07/02 15:29:33 INFO spark.HttpServer: Starting HTTP Server
15/07/02 15:29:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/02 15:29:33 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:62083
15/07/02 15:29:33 INFO util.Utils: Successfully started service 'HTTP class server' on port 62083.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
15/07/02 15:29:36 INFO spark.SparkContext: Running Spark version 1.3.1
15/07/02 15:29:36 INFO spark.SecurityManager: Changing view acls to: sim
15/07/02 15:29:36 INFO spark.SecurityManager: Changing modify acls to: sim
15/07/02 15:29:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sim); users with modify permissions: Set(sim)
15/07/02 15:29:36 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/02 15:29:36 INFO Remoting: Starting remoting
15/07/02 15:29:36 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.12:62084]
15/07/02 15:29:36 INFO util.Utils: Successfully started service 'sparkDriver' on port 62084.
15/07/02 15:29:36 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/02 15:29:36 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/02 15:29:36 INFO storage.DiskBlockManager: Created local directory at /var/folders/ln/j4dkd3bd07d_7tzqc843y2jw0000gn/T/spark-0de5dce8-23bf-4dab-849e-f3e55e083747/blockmgr-55d47ebf-9987-4f9b-ac3b-02537c0e86ba
15/07/02 15:29:36 INFO storage.MemoryStore: MemoryStore started with capacity 2.1 GB
15/07/02 15:29:36 INFO spark.HttpFileServer: HTTP File server directory is /var/folders/ln/j4dkd3bd07d_7tzqc843y2jw0000gn/T/spark-b8b6bbb8-13cc-4c7e-9696-2f3be90b54c6/httpd-9730dc96-ceb4-410a-aba0-967216cec688
15/07/02 15:29:36 INFO spark.HttpServer: Starting HTTP Server
15/07/02 15:29:36 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/02 15:29:36 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:62085
15/07/02 15:29:36 INFO util.Utils: Successfully started service 'HTTP file server' on port 62085.
15/07/02 15:29:36 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/07/02 15:29:36 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/02 15:29:36 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/07/02 15:29:36 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/07/02 15:29:36 INFO ui.SparkUI: Started SparkUI at http://192.168.1.12:4040
15/07/02 15:29:36 INFO spark.SparkContext: Added JAR file:/Users/sim/.ivy2/jars/spark-csv_2.10.jar at http://192.168.1.12:62085/jars/spark-csv_2.10.jar with timestamp 1435865376837
15/07/02 15:29:36 INFO spark.SparkContext: Added JAR file:/Users/sim/.ivy2/jars/commons-csv.jar at http://192.168.1.12:62085/jars/commons-csv.jar with timestamp 1435865376838
15/07/02 15:29:36 INFO executor.Executor: Starting executor ID <driver> on host localhost
15/07/02 15:29:36 INFO executor.Executor: Using REPL class URI: http://192.168.1.12:62083
15/07/02 15:29:36 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.12:62084/user/HeartbeatReceiver
15/07/02 15:29:36 INFO netty.NettyBlockTransferService: Server created on 62086
15/07/02 15:29:36 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/02 15:29:36 INFO storage.BlockManagerMasterActor: Registering block manager localhost:62086 with 2.1 GB RAM, BlockManagerId(<driver>, localhost, 62086)
15/07/02 15:29:36 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/02 15:29:37 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
15/07/02 15:29:37 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala>

scala> val ctx = new HiveContext(sc)
ctx: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@2e46890e

scala> import ctx.implicits._
import ctx.implicits._

scala>

scala> val df = ctx.jsonFile("file:///Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-00000.gz")
15/07/02 15:29:52 INFO storage.MemoryStore: ensureFreeSpace(183601) called with curMem=0, maxMem=2223023063
15/07/02 15:29:52 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 179.3 KB, free 2.1 GB)
15/07/02 15:29:52 INFO storage.MemoryStore: ensureFreeSpace(26218) called with curMem=183601, maxMem=2223023063
15/07/02 15:29:52 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.6 KB, free 2.1 GB)
15/07/02 15:29:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:62086 (size: 25.6 KB, free: 2.1 GB)
15/07/02 15:29:52 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/07/02 15:29:52 INFO spark.SparkContext: Created broadcast 0 from textFile at JSONRelation.scala:114
15/07/02 15:29:52 INFO mapred.FileInputFormat: Total input paths to process : 1
15/07/02 15:29:52 INFO spark.SparkContext: Starting job: isEmpty at JsonRDD.scala:51
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Got job 0 (isEmpty at JsonRDD.scala:51) with 1 output partitions (allowLocal=true)
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Final stage: Stage 0(isEmpty at JsonRDD.scala:51)
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Missing parents: List()
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Submitting Stage 0 (file:///Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-00000.gz MapPartitionsRDD[1] at textFile at JSONRelation.scala:114), which has no missing parents
15/07/02 15:29:52 INFO storage.MemoryStore: ensureFreeSpace(2728) called with curMem=209819, maxMem=2223023063
15/07/02 15:29:52 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.7 KB, free 2.1 GB)
15/07/02 15:29:52 INFO storage.MemoryStore: ensureFreeSpace(2031) called with curMem=212547, maxMem=2223023063
15/07/02 15:29:52 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2031.0 B, free 2.1 GB)
15/07/02 15:29:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:62086 (size: 2031.0 B, free: 2.1 GB)
15/07/02 15:29:52 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/07/02 15:29:52 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (file:///Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-00000.gz MapPartitionsRDD[1] at textFile at JSONRelation.scala:114)
15/07/02 15:29:52 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/07/02 15:29:52 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1453 bytes)
15/07/02 15:29:52 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/07/02 15:29:52 INFO executor.Executor: Fetching http://192.168.1.12:62085/jars/commons-csv.jar with timestamp 1435865376838
15/07/02 15:29:52 INFO util.Utils: Fetching http://192.168.1.12:62085/jars/commons-csv.jar to /var/folders/ln/j4dkd3bd07d_7tzqc843y2jw0000gn/T/spark-fd72e62c-1adf-4bad-8c3d-5b3899545675/userFiles-c8f3949e-7f5e-43c3-b1ef-8f22523bdbcc/fetchFileTemp2464132617259806671.tmp
15/07/02 15:29:52 INFO executor.Executor: Adding file:/var/folders/ln/j4dkd3bd07d_7tzqc843y2jw0000gn/T/spark-fd72e62c-1adf-4bad-8c3d-5b3899545675/userFiles-c8f3949e-7f5e-43c3-b1ef-8f22523bdbcc/commons-csv.jar to class loader
15/07/02 15:29:52 INFO executor.Executor: Fetching http://192.168.1.12:62085/jars/spark-csv_2.10.jar with timestamp 1435865376837
15/07/02 15:29:52 INFO util.Utils: Fetching http://192.168.1.12:62085/jars/spark-csv_2.10.jar to /var/folders/ln/j4dkd3bd07d_7tzqc843y2jw0000gn/T/spark-fd72e62c-1adf-4bad-8c3d-5b3899545675/userFiles-c8f3949e-7f5e-43c3-b1ef-8f22523bdbcc/fetchFileTemp3554212928556694314.tmp
15/07/02 15:29:52 INFO executor.Executor: Adding file:/var/folders/ln/j4dkd3bd07d_7tzqc843y2jw0000gn/T/spark-fd72e62c-1adf-4bad-8c3d-5b3899545675/userFiles-c8f3949e-7f5e-43c3-b1ef-8f22523bdbcc/spark-csv_2.10.jar to class loader
15/07/02 15:29:52 INFO rdd.HadoopRDD: Input split: file:/Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-00000.gz:0+22597095
15/07/02 15:29:52 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/07/02 15:29:52 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/07/02 15:29:52 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/07/02 15:29:52 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/07/02 15:29:52 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/07/02 15:29:52 INFO compress.CodecPool: Got brand-new decompressor [.gz]
15/07/02 15:29:52 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 3741 bytes result sent to driver
15/07/02 15:29:52 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 129 ms on localhost (1/1)
15/07/02 15:29:52 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Stage 0 (isEmpty at JsonRDD.scala:51) finished in 0.139 s
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Job 0 finished: isEmpty at JsonRDD.scala:51, took 0.172693 s
15/07/02 15:29:52 INFO spark.SparkContext: Starting job: reduce at JsonRDD.scala:54
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Got job 1 (reduce at JsonRDD.scala:54) with 1 output partitions (allowLocal=false)
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Final stage: Stage 1(reduce at JsonRDD.scala:54)
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Missing parents: List()
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[3] at map at JsonRDD.scala:54), which has no missing parents
15/07/02 15:29:52 INFO storage.MemoryStore: ensureFreeSpace(3240) called with curMem=214578, maxMem=2223023063
15/07/02 15:29:52 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 2.1 GB)
15/07/02 15:29:52 INFO storage.MemoryStore: ensureFreeSpace(2338) called with curMem=217818, maxMem=2223023063
15/07/02 15:29:52 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.3 KB, free 2.1 GB)
15/07/02 15:29:52 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:62086 (size: 2.3 KB, free: 2.1 GB)
15/07/02 15:29:52 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
15/07/02 15:29:52 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:839
15/07/02 15:29:52 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 1 (MapPartitionsRDD[3] at map at JsonRDD.scala:54)
15/07/02 15:29:52 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/07/02 15:29:52 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1453 bytes)
15/07/02 15:29:52 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
15/07/02 15:29:52 INFO rdd.HadoopRDD: Input split: file:/Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-00000.gz:0+22597095
15/07/02 15:29:52 INFO compress.CodecPool: Got brand-new decompressor [.gz]
15/07/02 15:29:54 INFO storage.BlockManager: Removing broadcast 1
15/07/02 15:29:54 INFO storage.BlockManager: Removing block broadcast_1_piece0
15/07/02 15:29:54 INFO storage.MemoryStore: Block broadcast_1_piece0 of size 2031 dropped from memory (free 2222804938)
15/07/02 15:29:54 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on localhost:62086 in memory (size: 2031.0 B, free: 2.1 GB)
15/07/02 15:29:54 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/07/02 15:29:54 INFO storage.BlockManager: Removing block broadcast_1
15/07/02 15:29:54 INFO storage.MemoryStore: Block broadcast_1 of size 2728 dropped from memory (free 2222807666)
15/07/02 15:29:54 INFO spark.ContextCleaner: Cleaned broadcast 1
15/07/02 15:30:06 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 6638 bytes result sent to driver
15/07/02 15:30:06 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 13740 ms on localhost (1/1)
15/07/02 15:30:06 INFO scheduler.DAGScheduler: Stage 1 (reduce at JsonRDD.scala:54) finished in 13.744 s
15/07/02 15:30:06 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/07/02 15:30:06 INFO scheduler.DAGScheduler: Job 1 finished: reduce at JsonRDD.scala:54, took 13.753319 s
15/07/02 15:30:06 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/07/02 15:30:06 INFO metastore.ObjectStore: ObjectStore, initialize called
15/07/02 15:30:06 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/07/02 15:30:06 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/07/02 15:30:06 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/07/02 15:30:07 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/07/02 15:30:07 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/07/02 15:30:07 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
15/07/02 15:30:08 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/02 15:30:08 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/02 15:30:08 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/02 15:30:08 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/02 15:30:08 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
15/07/02 15:30:08 INFO metastore.ObjectStore: Initialized ObjectStore
15/07/02 15:30:08 INFO metastore.HiveMetaStore: Added admin role in metastore
15/07/02 15:30:08 INFO metastore.HiveMetaStore: Added public role in metastore
15/07/02 15:30:08 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
15/07/02 15:30:08 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/07/02 15:30:08 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
df: org.apache.spark.sql.DataFrame = [aac_brand: string, aag__id: bigint, aag_weight: bigint, aca_brand: string, aca_conversion_integration: boolean, aca_daily_budget: bigint, aca_hide_brand_from_publishers: boolean, aca_is_remnant: boolean, aca_short_name: string, accid: string, acr__id: bigint, acr_choices: array<struct<cta:string,headline:string,img:string,target:string>>, acr_cta: string, acr_description1: string, acr_description2: string, acr_destination: string, acr_displayUrl: string, acr_headline: string, acr_img: string, acr_isiUrl: string, acr_paramCTA: string, acr_paramName: string, acr_paramPlaceholder: string, acr_target: string, acr_type: string, acr_weight: bigint, agid: string, akw__id: bigint, akw_canonical_id: bigint, akw_criterion_type: string, akw_destination_url: st...
scala> df.registerTempTable("training")

scala>

scala> val dfCount = ctx.sql("select count(*) as cnt from training")
15/07/02 15:30:09 INFO parse.ParseDriver: Parsing command: select count(*) as cnt from training
15/07/02 15:30:09 INFO parse.ParseDriver: Parse Completed
dfCount: org.apache.spark.sql.DataFrame = [cnt: bigint]

scala> println(dfCount.first.getLong(0))
15/07/02 15:30:09 INFO storage.MemoryStore: ensureFreeSpace(90479) called with curMem=215397, maxMem=2223023063
15/07/02 15:30:09 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 88.4 KB, free 2.1 GB)
15/07/02 15:30:09 INFO storage.MemoryStore: ensureFreeSpace(36868) called with curMem=305876, maxMem=2223023063
15/07/02 15:30:09 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 36.0 KB, free 2.1 GB)
15/07/02 15:30:09 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:62086 (size: 36.0 KB, free: 2.1 GB)
15/07/02 15:30:09 INFO storage.BlockManagerMaster: Updated info of block broadcast_3_piece0
15/07/02 15:30:09 INFO spark.SparkContext: Created broadcast 3 from textFile at JSONRelation.scala:114
15/07/02 15:30:09 INFO spark.SparkContext: Starting job: runJob at SparkPlan.scala:122
15/07/02 15:30:09 INFO mapred.FileInputFormat: Total input paths to process : 1
15/07/02 15:30:09 INFO scheduler.DAGScheduler: Registering RDD 10 (mapPartitions at Exchange.scala:101)
15/07/02 15:30:09 INFO scheduler.DAGScheduler: Got job 2 (runJob at SparkPlan.scala:122) with 1 output partitions (allowLocal=false)
15/07/02 15:30:09 INFO scheduler.DAGScheduler: Final stage: Stage 3(runJob at SparkPlan.scala:122)
15/07/02 15:30:09 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 2)
15/07/02 15:30:09 INFO scheduler.DAGScheduler: Missing parents: List(Stage 2)
15/07/02 15:30:09 INFO scheduler.DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[10] at mapPartitions at Exchange.scala:101), which has no missing parents
15/07/02 15:30:09 INFO storage.MemoryStore: ensureFreeSpace(17448) called with curMem=342744, maxMem=2223023063
15/07/02 15:30:09 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 17.0 KB, free 2.1 GB)
15/07/02 15:30:09 INFO storage.MemoryStore: ensureFreeSpace(9310) called with curMem=360192, maxMem=2223023063
15/07/02 15:30:09 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 9.1 KB, free 2.1 GB)
15/07/02 15:30:09 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:62086 (size: 9.1 KB, free: 2.1 GB)
15/07/02 15:30:09 INFO storage.BlockManagerMaster: Updated info of block broadcast_4_piece0
15/07/02 15:30:09 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:839
15/07/02 15:30:09 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 2 (MapPartitionsRDD[10] at mapPartitions at Exchange.scala:101)
15/07/02 15:30:09 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/07/02 15:30:09 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, PROCESS_LOCAL, 1442 bytes)
15/07/02 15:30:09 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
15/07/02 15:30:09 INFO rdd.HadoopRDD: Input split: file:/Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-00000.gz:0+22597095
15/07/02 15:30:09 INFO compress.CodecPool: Got brand-new decompressor [.gz]
15/07/02 15:30:15 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 2003 bytes result sent to driver
15/07/02 15:30:15 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 5081 ms on localhost (1/1)
15/07/02 15:30:15 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/07/02 15:30:15 INFO scheduler.DAGScheduler: Stage 2 (mapPartitions at Exchange.scala:101) finished in 5.081 s
15/07/02 15:30:15 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/07/02 15:30:15 INFO scheduler.DAGScheduler: running: Set()
15/07/02 15:30:15 INFO scheduler.DAGScheduler: waiting: Set(Stage 3)
15/07/02 15:30:15 INFO scheduler.DAGScheduler: failed: Set()
15/07/02 15:30:15 INFO scheduler.DAGScheduler: Missing parents for Stage 3: List()
15/07/02 15:30:15 INFO scheduler.DAGScheduler: Submitting Stage 3 (MapPartitionsRDD[14] at map at SparkPlan.scala:97), which is now runnable
15/07/02 15:30:15 INFO storage.MemoryStore: ensureFreeSpace(18920) called with curMem=369502, maxMem=2223023063
15/07/02 15:30:15 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 18.5 KB, free 2.1 GB)
15/07/02 15:30:15 INFO storage.MemoryStore: ensureFreeSpace(10501) called with curMem=388422, maxMem=2223023063
15/07/02 15:30:15 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 10.3 KB, free 2.1 GB)
15/07/02 15:30:15 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:62086 (size: 10.3 KB, free: 2.1 GB)
15/07/02 15:30:15 INFO storage.BlockManagerMaster: Updated info of block broadcast_5_piece0
15/07/02 15:30:15 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:839
15/07/02 15:30:15 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 3 (MapPartitionsRDD[14] at map at SparkPlan.scala:97)
15/07/02 15:30:15 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
15/07/02 15:30:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, localhost, PROCESS_LOCAL, 1171 bytes)
15/07/02 15:30:15 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 3)
15/07/02 15:30:15 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/07/02 15:30:15 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 2 ms
15/07/02 15:30:15 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 3). 1115 bytes result sent to driver
15/07/02 15:30:15 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 52 ms on localhost (1/1)
15/07/02 15:30:15 INFO scheduler.DAGScheduler: Stage 3 (runJob at SparkPlan.scala:122) finished in 0.052 s
15/07/02 15:30:15 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
15/07/02 15:30:15 INFO scheduler.DAGScheduler: Job 2 finished: runJob at SparkPlan.scala:122, took 5.168569 s
88283

scala>
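
For reference, the session above condenses to the following standalone sketch, assuming Spark 1.3.x on the classpath; the object name JsonCount, the appName, and the local[*] master are illustrative choices, and the input path is the one from the transcript:

// Minimal sketch of the same steps, assuming Spark 1.3.x.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object JsonCount {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup; spark-shell provides sc for you.
    val sc = new SparkContext(new SparkConf().setAppName("JsonCount").setMaster("local[*]"))
    val ctx = new HiveContext(sc)

    // jsonFile scans the input to infer a schema (the isEmpty/reduce jobs
    // visible in the log above), then returns a DataFrame.
    val df = ctx.jsonFile("file:///Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-00000.gz")
    df.registerTempTable("training")

    // Same count as in the transcript (88283 for this input).
    val dfCount = ctx.sql("select count(*) as cnt from training")
    println(dfCount.first.getLong(0))

    sc.stop()
  }
}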