Created April 7, 2016 23:08
A sample that writes a Spark RDD out as a single text file. The key point is copyMerge.
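import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SaveRddAsTextFile { // class name is illustrative, not from the original snippet
    public static void main(String[] args) throws IOException {
        String tmpDir = "target/output_tmp";
        String outputFile = "target/output.txt";
        // clear leftovers from a previous run (saveAsTextFile fails if tmpDir already exists)
        FileUtil.fullyDelete(new File(tmpDir));
        FileUtil.fullyDelete(new File(outputFile));
        // init spark config
        SparkConf sparkConf = new SparkConf().setAppName("test").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        // create an RDD with 3 partitions
        JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b", "c")).repartition(3);
        // save in Hadoop output format: a directory holding one part-NNNNN file per partition
        rdd.saveAsTextFile(tmpDir);
        // merge the part files into one plain file (false: keep tmpDir, null: no separator string)
        FileSystem hdfs = FileSystem.get(sc.hadoopConfiguration());
        FileUtil.copyMerge(hdfs, new Path(tmpDir), hdfs, new Path(outputFile), false, sc.hadoopConfiguration(), null);
    }
}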
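For readers who cannot call `FileUtil.copyMerge` (to my knowledge it was removed from `FileUtil` in Hadoop 3.0), the merge step can be approximated by hand. The sketch below is not the Hadoop implementation. It assumes the output directory is on the local filesystem and that the part files follow the usual `part-NNNNN` naming. `PartFileMerger` and `mergeParts` are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartFileMerger {
    // Hypothetical helper: concatenates every "part-*" file in srcDir into dstFile.
    // Files such as _SUCCESS and hidden .crc checksums do not match the glob and are skipped.
    public static void mergeParts(Path srcDir, Path dstFile) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Directory listing order is unspecified, so sort by path to keep partition order.
        Collections.sort(parts);
        try (OutputStream out = Files.newOutputStream(dstFile)) {
            for (Path part : parts) {
                Files.copy(part, out); // appends the bytes of this part file to the output
            }
        }
    }
}
```

With the paths from the snippet above, the call would be `PartFileMerger.mergeParts(Paths.get("target/output_tmp"), Paths.get("target/output.txt"))`. This only works for local paths. For output on HDFS, the same loop would have to go through the Hadoop `FileSystem` API instead.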