Categories

  • Big Data
  • java

Tags

  • java
  • spark

The code snippet below shows how to save RDD output into a single file with a header:

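import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setAppName("test").setMaster("local[2]");
JavaSparkContext jsc = new JavaSparkContext(conf);

// Single-partition RDD that holds only the header line
JavaRDD<String> headerRDD = jsc.parallelize(Arrays.asList("name,address,city"), 1);
JavaRDD<?> dataRDD = ....;

// Make sure s.toString() and the header are in sync
JavaRDD<String> linesRDD = dataRDD.map(s -> s.toString());

// Joined RDD: header first, then the data lines
JavaRDD<String> joinedRDD = headerRDD.union(linesRDD);

// Write one part file under the output directory.
// Note: repartition(1) shuffles, so the header is not guaranteed to stay first.
joinedRDD.repartition(1).saveAsTextFile("<output>");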
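The comment about keeping s.toString() and the header in sync can be made concrete with a small value class. The sketch below is plain Java with no Spark dependency. The class name Person, the COLUMNS list, and the header() helper are hypothetical and not part of the original snippet. It assumes the field values contain no commas, quotes, or newlines, because it does no CSV escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical record type (not from the original post); its fields mirror
// the "name,address,city" header used in the snippet above.
final class Person {
    // Column names, in the order they appear in every output line.
    static final List<String> COLUMNS = Arrays.asList("name", "address", "city");

    private final String name;
    private final String address;
    private final String city;

    Person(String name, String address, String city) {
        this.name = name;
        this.address = address;
        this.city = city;
    }

    // Header line built from COLUMNS.
    static String header() {
        return String.join(",", COLUMNS);
    }

    // Data line: values joined in the same order as COLUMNS.
    // Assumes the values need no CSV escaping.
    @Override
    public String toString() {
        return String.join(",", name, address, city);
    }

    public static void main(String[] args) {
        System.out.println(Person.header());                            // prints: name,address,city
        System.out.println(new Person("Asha", "12 Park Road", "Pune")); // prints: Asha,12 Park Road,Pune
    }
}
```

If the data RDD held objects of such a class, the header line could come from Person.header() rather than a hard-coded string, so a change to the columns would only need to be made in one class.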
