Build a Spark program with Maven, using IntelliJ IDEA.

  1. Create a Maven project; the maven-archetype-quickstart template is a good starting point (a command-line equivalent follows this list).
  2. Right-click the project, choose Add Framework Support > Scala, then edit pom.xml to add the required dependencies (see pom.xml below).
  3. Add a scala directory under src/main to hold the Scala sources, such as SubmitExample.scala below.
  4. Right-click Run... inside IDEA to check that the whole project runs correctly.
  5. To submit the job to a real YARN cluster, remove the .setMaster("local") call and submit the job with spark-submit (see submit.sh below).
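
For step 1, the same project skeleton can also be generated from the command line instead of the IDEA wizard; a minimal sketch (the groupId and artifactId simply match the pom.xml below):

mvn archetype:generate \
    -DgroupId=net.jqian.learning.spark \
    -DartifactId=introduction \
    -DarchetypeArtifactId=maven-archetype-quickstart \
    -DinteractiveMode=false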

pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>net.jqian.learning.spark</groupId>
  <artifactId>introduction</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <!-- compile & package scala program -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
          <execution>
            <id>compile-scala</id>
            <phase>compile</phase>
            <goals>
              <goal>add-source</goal>
              <goal>compile</goal>
            </goals>
          </execution>
          <execution>
            <id>test-compile-scala</id>
            <phase>test-compile</phase>
            <goals>
              <goal>add-source</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <!-- package all upstream dependencies -->
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.5</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id> <!-- this is used for inheritance merges -->
            <phase>package</phase> <!-- bind to the packaging phase -->
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
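
With the assembly plugin bound to the package phase, a plain Maven build should produce both the regular jar and the fat jar under target/; the names below simply follow the artifactId and version declared above:

mvn clean package
# => target/introduction-1.0-SNAPSHOT.jar
# => target/introduction-1.0-SNAPSHOT-jar-with-dependencies.jar

Note that submit.sh below references example-0.1-jar-with-dependencies.jar; adjust that name to whatever your build actually produces.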

submit.sh:

#!/bin/bash
spark-submit \
    --class net.jqian.learning.spark.SubmitExample \
    --num-executors 5 \
    --driver-memory 4g \
    --executor-memory 8g \
    --executor-cores 4 \
    --master yarn-cluster \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    example-0.1-jar-with-dependencies.jar
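
Before running submit.sh, it can be worth confirming that the main class actually made it into the fat jar; jar tf lists the archive contents:

jar tf example-0.1-jar-with-dependencies.jar | grep SubmitExample
# expect net/jqian/learning/spark/SubmitExample.class in the listing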

SubmitExample.scala:

package net.jqian.learning.spark

import org.apache.spark.{SparkConf, SparkContext}

object SubmitExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("submitExample")
      .setMaster("local") // for local testing inside IDEA; remove before submitting to YARN
    val sc = new SparkContext(conf)
    // Broadcast a small lookup table to all executors.
    val br = sc.broadcast(Map(1 -> "h", 2 -> "e", 3 -> "l", 4 -> "l", 5 -> "o"))
    // Accumulator counting keys that miss the broadcast map.
    val ac = sc.accumulator(0)
    val arr = sc.parallelize((1 to 8).toList).map { k =>
      br.value.get(k) match {
        case Some(v) => v
        case None =>
          ac += 1
          ""
      }
    }
    arr.cache()
    val res = arr.reduce(_ + _)
    println(res)
    println("miss found: " + ac.value)
    sc.stop()
  }
}
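
Step 5 asks for removing the .setMaster("local") call before a real submit. One way to avoid keeping two copies of the file (a sketch, not part of the original gist; the object name SubmitExampleEither is made up) is SparkConf.setIfMissing, which only defaults the master when spark-submit has not already supplied one:

package net.jqian.learning.spark

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: the master is only defaulted when nothing was set, so the same
// file runs inside IDEA and still honors spark-submit --master yarn-cluster.
object SubmitExampleEither {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("submitExample")
      .setIfMissing("spark.master", "local[*]") // overridden by --master on spark-submit
    val sc = new SparkContext(conf)
    try {
      println(sc.parallelize(1 to 8).map(_ * 2).sum())
    } finally {
      sc.stop()
    }
  }
}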