apache-spark

Text files and operations in Scala

Introduction#

Reading Text files and performing operations on them.

Example usage

Read text file from path:

val sc: org.apache.spark.SparkContext = ???
sc.textFile(path="/path/to/input/file") 

Read files using wildcards:

sc.textFile(path="/path/to/*/*") 

Read files specifying minimum number of partitions:

sc.textFile(path="/path/to/input/file", minPartitions=3)

Join two files read with textFile()

Joins in Spark:

  • Read textFile 1

    val txt1=sc.textFile(path="/path/to/input/file1") 

    Eg:

      A B
      1 2
      3 4
  • Read textFile 2

    val txt2=sc.textFile(path="/path/to/input/file2") 

    Eg:

      A C
      1 5
      3 6
  • Join and print the result.

    txt1.join(txt2).foreach(println)

    Eg:

      A B C
      1 2 5
      3 4 6

The join above is based on the first column.


This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0 This website is not affiliated with Stack Overflow