by Rahul


  • bigdata
  • java
  • spark

Spark: processing multiline CSV with EOLs in a text column

Multiline support for CSV is being added in Spark version 2.2 (see the JIRA). Until then, you can try the steps below if you run into issues processing CSV files whose text columns contain embedded end-of-line characters:
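To see why embedded EOLs break line-based readers in the first place, here is a minimal plain-Java illustration (the class and method names are mine, not from any library): a record splitter that treats newlines inside double-quoted fields as field content rather than record separators.

```java
import java.util.ArrayList;
import java.util.List;

public class MultilineCsv {
    // Split raw CSV text into logical records, keeping newlines that
    // occur inside double-quoted fields as part of the field value.
    static List<String> records(String csv) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : csv.toCharArray()) {
            if (c == '"') inQuotes = !inQuotes;
            if (c == '\n' && !inQuotes) {
                out.add(cur.toString());   // record boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        String csv = "id,comment\n1,\"line one\nline two\"\n2,plain\n";
        // Naive newline splitting sees 4 lines; quote-aware parsing sees 3 records.
        System.out.println(csv.split("\n").length);   // 4
        System.out.println(records(csv).size());      // 3
    }
}
```

A plain `sc.textFile` (or any line-oriented `InputFormat`) behaves like the naive split here, which is exactly what the custom `InputFormat` below works around.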

Get the `InputFormat` and reader classes from the git repository into your code base and use them. The imports needed by the snippet below are:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
```



```java
JavaPairRDD<LongWritable, Text> rdd = context.newAPIHadoopFile(
        "<CSV file path>", FileCleaningInputFormat.class,
        LongWritable.class, Text.class, new Configuration());
JavaRDD<String> inputWithMultiline = rdd.map(s -> s._2().toString());
```

Another solution for this problem is the Apache Crunch CSV reader, which can be plugged in the same way as the `FileCleaningInputFormat` implementation above.
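A sketch of that swap, assuming Crunch's `CSVInputFormat` (from `org.apache.crunch.io.text.csv`) is on the classpath and that `context` is the same `JavaSparkContext` used above; like `FileCleaningInputFormat`, it emits `<LongWritable, Text>` pairs, so only the class argument changes:

```java
import org.apache.crunch.io.text.csv.CSVInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;

// Same wiring as before, with Crunch's quote-aware CSVInputFormat
// substituted for FileCleaningInputFormat.
JavaPairRDD<LongWritable, Text> rdd = context.newAPIHadoopFile(
        "<CSV file path>", CSVInputFormat.class,
        LongWritable.class, Text.class, new Configuration());
JavaRDD<String> records = rdd.map(pair -> pair._2().toString());
```

This requires only the `crunch-core` dependency in addition to Spark and Hadoop; no Crunch pipeline needs to be set up.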