@koduki
Created April 7, 2016 23:08
A sample that writes a Spark RDD out as a single text file. The key point is copyMerge.
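import java.io.File;
import java.util.Arrays;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

String tmpDir = "target/output_tmp";
String outputFile = "target/output.txt";
// Clear any output left over from a previous run
FileUtil.fullyDelete(new File(tmpDir));
FileUtil.fullyDelete(new File(outputFile));
// Initialize the Spark config (local mode)
SparkConf sparkConf = new SparkConf().setAppName("test").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
// Create an RDD with 3 partitions
JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b", "c")).repartition(3);
// Save in Hadoop output format: a directory containing one part-* file per partition
rdd.saveAsTextFile(tmpDir);
// Merge the part files into one plain text file.
// Note: FileUtil.copyMerge is available in Hadoop 2.x; it was removed in Hadoop 3.0.
FileSystem hdfs = FileSystem.get(sc.hadoopConfiguration());
FileUtil.copyMerge(hdfs, new Path(tmpDir), hdfs, new Path(outputFile), false, sc.hadoopConfiguration(), null);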
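
To illustrate what the merge step does, here is a minimal JDK-only sketch that concatenates the part-* files of an output directory, in file-name order, into one file. It is not Hadoop's implementation of copyMerge and it only works on the local file system. The class name PartFileMerger and the method mergeParts are hypothetical names chosen for this example. It assumes the directory layout that saveAsTextFile produces (part-00000, part-00001, ... plus a _SUCCESS marker).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Spark or Hadoop.
final class PartFileMerger {

    // Concatenates every regular file named "part-*" in partDir,
    // in file-name order, into outputFile (created or truncated).
    static void mergeParts(Path partDir, Path outputFile) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(partDir)) {
            parts = entries
                    .filter(Files::isRegularFile)
                    // Skips _SUCCESS and hidden checksum files such as .part-00000.crc
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(outputFile)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes to the output
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a saveAsTextFile output directory with three partitions.
        Path dir = Files.createTempDirectory("output_tmp");
        Files.write(dir.resolve("part-00000"), "a\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00001"), "b\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00002"), "c\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("_SUCCESS"), new byte[0]);

        Path out = Files.createTempFile("output", ".txt");
        mergeParts(dir, out);

        // Prints a, b and c on separate lines.
        System.out.print(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

Sorting by file name keeps the partitions in order, because the part files are numbered with zero padding. Unlike the copyMerge call above, this sketch goes through java.nio.file rather than the Hadoop FileSystem API, so it cannot read from HDFS.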