In this post I demonstrate, step by step, how to set up a Spark project and explore a few basic Spark Dataset operations in Java.

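Create a Maven project with the following POM:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">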
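
A minimal POM for a project like this might look like the sketch below. The coordinates (`com.example`, `spark-dataset-example`), the Java level, and the Spark and Scala versions are my assumptions for illustration, not values from the original project. The only thing the code in this post really needs is the `spark-sql` artifact, which provides `SparkSession`, `Dataset` and `Encoders`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates, pick your own -->
  <groupId>com.example</groupId>
  <artifactId>spark-dataset-example</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <!-- Java 8 or later is needed for the lambda used in the filter -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <dependencies>
    <!-- Assumed Spark/Scala versions; use the ones that match your cluster -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.5.0</version>
    </dependency>
  </dependencies>
</project>
```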



Project structure:

Create the bean definition:

import java.io.Serializable;

// Java bean describing one record of the dataset
public class People implements Serializable {
    private String name;
    private Long age;

    // No-argument constructor, required by the bean conventions
    public People() {
    }

    public People(String name, Long age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    public Long getAge() { return age; }

    public void setAge(Long age) { this.age = age; }
}
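
The bean encoder used later in this post works from the bean's getters and setters, so it is worth checking which properties the class actually exposes. Here is a small sketch that does this with the JDK's own `java.beans.Introspector`. `BeanPropertyCheck` and `readWriteProperties` are hypothetical names I made up for this check, and the nested `People` class is just a copy of the bean above so the sketch compiles on its own.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of the project in this post
class BeanPropertyCheck {

    // Copy of the People bean, nested here so the sketch is self-contained
    public static class People implements Serializable {
        private String name;
        private Long age;

        public People() {
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Long getAge() { return age; }
        public void setAge(Long age) { this.age = age; }
    }

    // Returns the sorted names of the properties that have both a getter and a setter
    static List<String> readWriteProperties(Class<?> beanClass) {
        List<String> names = new ArrayList<>();
        try {
            // Stopping at Object.class leaves out the built-in "class" property
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(readWriteProperties(People.class)); // prints [age, name]
    }
}
```

If a property is missing from this list (for example because a setter was forgotten), I would expect it to be missing from the dataset's columns as well.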

Create a people.json file in the resources directory (src/main/resources in a standard Maven layout), with one JSON object per line:
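
The exact contents are up to you. As a purely hypothetical sample (these records are made up for illustration and are not from the original project), something like this matches the `name` and `age` fields of the bean. Each record sits on its own line, which is the layout Spark's JSON reader expects by default.

```json
{"name": "Ann", "age": 42}
{"name": "Bob", "age": 25}
{"name": "Cid", "age": 31}
```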

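Filter the contents of the dataset:

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class Application {
    public static void main(String[] args) {
        SparkSession session = SparkSession.builder().appName("dataset example").getOrCreate();

        // Define the encoder, used to convert the data to a binary format in the JVM
        Encoder<People> encode = Encoders.bean(People.class);

        // Load the dataset from the JSON file on the classpath
        Dataset<People> ds = session.read()
                .json(Thread.currentThread().getContextClassLoader().getResource("people.json").getPath())
                .as(encode);

        // Keep only the people older than 30 and print them
        ds.filter((FilterFunction<People>) s -> s.getAge() > 30).show();
    }
}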
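
The filter predicate is ordinary Java, so it can be tried out without starting Spark at all. The sketch below uses `java.util.stream` as a stand-in for `Dataset.filter` (it is not the Spark API), and the sample people are hypothetical. It also shows one thing to watch for: `age` is a boxed `Long`, so a record without an age makes `s.getAge() > 30` throw a `NullPointerException` when the value is unboxed. A null check in the predicate avoids that.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the project in this post
class AgeFilterSketch {

    // Trimmed copy of the People bean so the sketch is self-contained
    public static class People {
        private final String name;
        private final Long age;

        public People(String name, Long age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public Long getAge() { return age; }
    }

    // Null-safe variant of the predicate s -> s.getAge() > 30
    static boolean olderThan30(People p) {
        return p.getAge() != null && p.getAge() > 30;
    }

    // Plain Java streams standing in for Dataset.filter(...)
    static List<String> namesOlderThan30(List<People> people) {
        return people.stream()
                .filter(AgeFilterSketch::olderThan30)
                .map(People::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data; the last record has no age
        List<People> sample = Arrays.asList(
                new People("Ann", 42L),
                new People("Bob", 25L),
                new People("Cid", null));
        System.out.println(namesOlderThan30(sample)); // prints [Ann]
    }
}
```

The same null-safe condition can be dropped into the Spark lambda as `s -> s.getAge() != null && s.getAge() > 30`.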