Last active: August 29, 2015 14:14
Apache Spark & 'mcmath' NormTermOrder 10k
macarooni:geekout emjayess$ pyspark
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/02/03 09:18:31 INFO SecurityManager: Changing view acls to: emjayess
15/02/03 09:18:31 INFO SecurityManager: Changing modify acls to: emjayess
15/02/03 09:18:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(emjayess); users with modify permissions: Set(emjayess)
15/02/03 09:18:32 INFO Slf4jLogger: Slf4jLogger started
15/02/03 09:18:32 INFO Remoting: Starting remoting
15/02/03 09:18:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:63274]
15/02/03 09:18:32 INFO Utils: Successfully started service 'sparkDriver' on port 63274.
15/02/03 09:18:32 INFO SparkEnv: Registering MapOutputTracker
15/02/03 09:18:32 INFO SparkEnv: Registering BlockManagerMaster
15/02/03 09:18:32 INFO DiskBlockManager: Created local directory at /var/folders/dn/4lp40f8d55d0l_glfztzq6wc0000gn/T/spark-local-20150203091832-1849
15/02/03 09:18:32 INFO MemoryStore: MemoryStore started with capacity 273.0 MB
2015-02-03 09:18:32.896 java[23456:c003] Unable to load realm info from SCDynamicStore
15/02/03 09:18:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/03 09:18:33 INFO HttpFileServer: HTTP File server directory is /var/folders/dn/4lp40f8d55d0l_glfztzq6wc0000gn/T/spark-6482de0d-6678-4cb2-bf33-b0752edf56f3
15/02/03 09:18:33 INFO HttpServer: Starting HTTP Server
15/02/03 09:18:33 INFO Utils: Successfully started service 'HTTP file server' on port 63275.
15/02/03 09:18:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/02/03 09:18:33 INFO SparkUI: Started SparkUI at http://192.168.1.103:4040
15/02/03 09:18:33 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://[email protected]:63274/user/HeartbeatReceiver
15/02/03 09:18:33 INFO NettyBlockTransferService: Server created on 63279
15/02/03 09:18:33 INFO BlockManagerMaster: Trying to register BlockManager
15/02/03 09:18:33 INFO BlockManagerMasterActor: Registering block manager localhost:63279 with 273.0 MB RAM, BlockManagerId(<driver>, localhost, 63279)
15/02/03 09:18:33 INFO BlockManagerMaster: Registered BlockManager
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Python version 2.7.5 (default, Mar 9 2014 22:15:05)
SparkContext available as sc.
>>>
>>> nto10k = sc.textFile("NormTermOrder10000.csv")
15/02/03 09:19:11 INFO MemoryStore: ensureFreeSpace(172851) called with curMem=0, maxMem=286300569
15/02/03 09:19:11 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 168.8 KB, free 272.9 MB)
15/02/03 09:19:11 INFO MemoryStore: ensureFreeSpace(22692) called with curMem=172851, maxMem=286300569
15/02/03 09:19:11 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 272.9 MB)
15/02/03 09:19:11 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:63279 (size: 22.2 KB, free: 273.0 MB)
15/02/03 09:19:11 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/03 09:19:11 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
>>> nto10k.count()
15/02/03 09:19:30 INFO FileInputFormat: Total input paths to process : 1
15/02/03 09:19:30 INFO SparkContext: Starting job: count at <stdin>:1
15/02/03 09:19:30 INFO DAGScheduler: Got job 0 (count at <stdin>:1) with 2 output partitions (allowLocal=false)
15/02/03 09:19:30 INFO DAGScheduler: Final stage: Stage 0(count at <stdin>:1)
15/02/03 09:19:30 INFO DAGScheduler: Parents of final stage: List()
15/02/03 09:19:30 INFO DAGScheduler: Missing parents: List()
15/02/03 09:19:30 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at count at <stdin>:1), which has no missing parents
15/02/03 09:19:30 INFO MemoryStore: ensureFreeSpace(5488) called with curMem=195543, maxMem=286300569
15/02/03 09:19:30 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.4 KB, free 272.8 MB)
15/02/03 09:19:30 INFO MemoryStore: ensureFreeSpace(4090) called with curMem=201031, maxMem=286300569
15/02/03 09:19:30 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.0 KB, free 272.8 MB)
15/02/03 09:19:30 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:63279 (size: 4.0 KB, free: 273.0 MB)
15/02/03 09:19:30 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/02/03 09:19:30 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/02/03 09:19:30 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (PythonRDD[2] at count at <stdin>:1)
15/02/03 09:19:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/02/03 09:19:30 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1346 bytes)
15/02/03 09:19:30 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1346 bytes)
15/02/03 09:19:30 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/02/03 09:19:30 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/02/03 09:19:31 INFO HadoopRDD: Input split: file:/Users/emjayess/Sources/geek_meet_code/mc.math/geekout/NormTermOrder10000.csv:0+34791
15/02/03 09:19:31 INFO HadoopRDD: Input split: file:/Users/emjayess/Sources/geek_meet_code/mc.math/geekout/NormTermOrder10000.csv:34791+34791
15/02/03 09:19:31 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/02/03 09:19:31 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/02/03 09:19:31 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/02/03 09:19:31 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/02/03 09:19:31 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/02/03 09:19:31 INFO PythonRDD: Times: total = 1116, boot = 939, init = 142, finish = 35
15/02/03 09:19:31 INFO PythonRDD: Times: total = 1120, boot = 943, init = 137, finish = 40
15/02/03 09:19:31 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1798 bytes result sent to driver
15/02/03 09:19:31 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1798 bytes result sent to driver
15/02/03 09:19:31 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1200 ms on localhost (1/2)
15/02/03 09:19:31 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1251 ms on localhost (2/2)
15/02/03 09:19:31 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/02/03 09:19:31 INFO DAGScheduler: Stage 0 (count at <stdin>:1) finished in 1.269 s
15/02/03 09:19:31 INFO DAGScheduler: Job 0 finished: count at <stdin>:1, took 1.349521 s
10200
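The `count()` call is the action that actually triggers a Spark job: `textFile` above only recorded the lineage, and the DAGScheduler lines show Stage 0 running over the file's two input splits before returning 10200 lines. The same line count can be mirrored without Spark as a rough local sanity check; a minimal sketch using hypothetical in-memory sample data in place of the real CSV:

```python
# Plain-Python mirror of nto10k.count(): count the lines of a text source.
# sample_lines is hypothetical stand-in data, not the real NormTermOrder CSV.
sample_lines = ["1,alpha", "2,beta", "3,gamma"]

def count_lines(lines):
    """Mirror of RDD.count() over an iterable of lines."""
    return sum(1 for _ in lines)

print(count_lines(sample_lines))  # 3
```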
>>> nto10k.filter(lambda line: "3" in line).count()
15/02/03 09:20:57 INFO SparkContext: Starting job: count at <stdin>:1
15/02/03 09:20:57 INFO DAGScheduler: Got job 1 (count at <stdin>:1) with 2 output partitions (allowLocal=false)
15/02/03 09:20:57 INFO DAGScheduler: Final stage: Stage 1(count at <stdin>:1)
15/02/03 09:20:57 INFO DAGScheduler: Parents of final stage: List()
15/02/03 09:20:57 INFO DAGScheduler: Missing parents: List()
15/02/03 09:20:57 INFO DAGScheduler: Submitting Stage 1 (PythonRDD[3] at count at <stdin>:1), which has no missing parents
15/02/03 09:20:57 INFO MemoryStore: ensureFreeSpace(5880) called with curMem=205121, maxMem=286300569
15/02/03 09:20:57 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 5.7 KB, free 272.8 MB)
15/02/03 09:20:57 INFO MemoryStore: ensureFreeSpace(4358) called with curMem=211001, maxMem=286300569
15/02/03 09:20:57 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 4.3 KB, free 272.8 MB)
15/02/03 09:20:57 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:63279 (size: 4.3 KB, free: 273.0 MB)
15/02/03 09:20:57 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/02/03 09:20:57 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/02/03 09:20:57 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (PythonRDD[3] at count at <stdin>:1)
15/02/03 09:20:57 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/02/03 09:20:57 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, PROCESS_LOCAL, 1346 bytes)
15/02/03 09:20:57 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 1346 bytes)
15/02/03 09:20:57 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
15/02/03 09:20:57 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
15/02/03 09:20:57 INFO HadoopRDD: Input split: file:/Users/emjayess/Sources/geek_meet_code/mc.math/geekout/NormTermOrder10000.csv:0+34791
15/02/03 09:20:57 INFO HadoopRDD: Input split: file:/Users/emjayess/Sources/geek_meet_code/mc.math/geekout/NormTermOrder10000.csv:34791+34791
15/02/03 09:20:57 INFO PythonRDD: Times: total = 106, boot = 5, init = 48, finish = 53
15/02/03 09:20:57 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1798 bytes result sent to driver
15/02/03 09:20:57 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 118 ms on localhost (1/2)
15/02/03 09:20:57 INFO PythonRDD: Times: total = 112, boot = 3, init = 55, finish = 54
15/02/03 09:20:57 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1798 bytes result sent to driver
15/02/03 09:20:57 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 127 ms on localhost (2/2)
15/02/03 09:20:57 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/02/03 09:20:57 INFO DAGScheduler: Stage 1 (count at <stdin>:1) finished in 0.133 s
15/02/03 09:20:57 INFO DAGScheduler: Job 1 finished: count at <stdin>:1, took 0.145861 s
3477
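`filter()` is itself lazy; only the trailing `count()` launches Job 1, and the much faster run (0.13 s vs 1.27 s) reflects the Python workers already being warm. Note the predicate matches the substring "3" anywhere in a line, so values like 13 or 30 count as well, not only the term 3. A plain-Python mirror of the same predicate, again with hypothetical sample data:

```python
# Plain-Python mirror of:
#   nto10k.filter(lambda line: "3" in line).count()
# The predicate matches the substring "3" anywhere in the line,
# so "13,0.75" matches too. sample_lines is hypothetical stand-in data.
sample_lines = ["1,0.25", "3,0.50", "13,0.75", "2,0.10"]

def count_matching(lines, needle):
    """Mirror of RDD.filter(pred).count() with a substring predicate."""
    return sum(1 for line in lines if needle in line)

print(count_matching(sample_lines, "3"))  # 2
```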
✨ Quick Start w/Spark
... including python examples.