Text files and operations in Scala
Introduction
Reading text files with SparkContext.textFile and performing operations on the resulting RDDs.
Example usage
Read a text file from a path; each line of the file becomes one element of the resulting RDD[String]:
val sc: org.apache.spark.SparkContext = ???
sc.textFile(path="/path/to/input/file")
Read files using wildcards:
sc.textFile(path="/path/to/*/*")
Read files, specifying a suggested minimum number of partitions:
sc.textFile(path="/path/to/input/file", minPartitions=3)
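As a minimal, self-contained sketch of the calls above, the following program writes a small temporary file, reads it back with textFile, and applies a few common RDD operations. The local[2] master, the app name, the object name TextFileSketch, and the sample lines are assumptions made for illustration; it also assumes spark-core is on the classpath.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object TextFileSketch {
  // Returns (number of lines, number of words) for a small sample file.
  def run(): (Long, Long) = {
    // Local master and app name are illustrative assumptions.
    val conf = new SparkConf().setMaster("local[2]").setAppName("textfile-sketch")
    val sc = new SparkContext(conf)
    try {
      // Write a sample file so the example does not depend on real input.
      val input = Files.createTempFile("textfile-sketch", ".txt")
      Files.write(input, "spark reads text\none line per element\n\nlast line".getBytes("UTF-8"))

      // minPartitions is a suggested minimum, not an exact partition count.
      val lines = sc.textFile(path = input.toString, minPartitions = 3)

      val lineCount = lines.count()                          // all lines, including the empty one
      val nonEmpty = lines.filter(_.nonEmpty)                // drop empty lines
      val wordCount = nonEmpty.flatMap(_.split(" ")).count() // total number of words
      (lineCount, wordCount)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (lineCount, wordCount) = run()
    println(s"lines: $lineCount, words: $wordCount")
  }
}
```

For the sample content, running main should print lines: 4, words: 9; the empty line counts as a line but contributes no words.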
Join two files read with textFile()
Joins in Spark:
- Read text file 1:
val txt1 = sc.textFile(path = "/path/to/input/file1")
Eg:
A B
1 2
3 4
- Read text file 2:
val txt2 = sc.textFile(path = "/path/to/input/file2")
Eg:
A C
1 5
3 6
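- Key each line by its first column, then join and print the result. join is only defined on RDDs of key-value pairs, so each line (assumed here to hold two space-separated columns) is split into a (key, value) tuple first:
val pairs1 = txt1.map(_.split(" ")).map(cols => (cols(0), cols(1)))
val pairs2 = txt2.map(_.split(" ")).map(cols => (cols(0), cols(1)))
pairs1.join(pairs2).collect().foreach(println)
Eg:
A B C
1 2 5
3 4 6
The join above is based on the first column. Each joined record is printed as a nested tuple such as (1,(2,5)), and the order of the records is not guaranteed.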
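The steps above can be put together as a runnable sketch. It assumes two space-separated columns per line and no header row, reuses the sample values from the example, and runs against a local master. The object name TextFileJoinSketch, the helpers writeTemp and toPairs, and the temporary files are hypothetical, and spark-core is assumed to be on the classpath.

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TextFileJoinSketch {
  // Hypothetical helper: write sample content to a temporary file.
  private def writeTemp(content: String): Path = {
    val p = Files.createTempFile("join-sketch", ".txt")
    Files.write(p, content.getBytes("UTF-8"))
    p
  }

  // Key each line by its first column; assumes two space-separated columns.
  private def toPairs(lines: RDD[String]): RDD[(String, String)] =
    lines.map(_.split(" ")).map(cols => (cols(0), cols(1)))

  def run(): Seq[(String, (String, String))] = {
    // Local master and app name are illustrative assumptions.
    val conf = new SparkConf().setMaster("local[2]").setAppName("join-sketch")
    val sc = new SparkContext(conf)
    try {
      val txt1 = sc.textFile(path = writeTemp("1 2\n3 4").toString)
      val txt2 = sc.textFile(path = writeTemp("1 5\n3 6").toString)
      // Inner join on the key; sorted on the driver because RDD order is not guaranteed.
      toPairs(txt1).join(toPairs(txt2)).collect().toSeq.sortBy(_._1)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Because join is an inner join, keys that appear in only one of the two files are dropped from the result; collecting to the driver before printing is reasonable here only because the sample data is tiny.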