Parallel Collections
Remarks#
Parallel collections facilitate parallel programming by hiding low-level parallelization details. This makes taking advantage of multi-core architectures easy. Examples of parallel collections include ParArray
, ParVector
, mutable.ParHashMap
, immutable.ParHashMap
, and ParRange
. A full list can be found in the documentation.
Creating and Using Parallel Collections
To create a parallel collection from a sequential collection, call the par
method. To create a sequential collection from a parallel collection, call the seq
method. This example shows how you turn a regular Vector
into a ParVector
, and then back again:
scala> val vect = (1 to 5).toVector
vect: Vector[Int] = Vector(1, 2, 3, 4, 5)
scala> val parVect = vect.par
parVect: scala.collection.parallel.immutable.ParVector[Int] = ParVector(1, 2, 3, 4, 5)
scala> parVect.seq
res0: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)
The par
method can be chained, allowing you to convert a sequential collection to a parallel collection and immediately perform an action on it:
scala> vect.map(_ * 2)
res1: scala.collection.immutable.Vector[Int] = Vector(2, 4, 6, 8, 10)
scala> vect.par.map(_ * 2)
res2: scala.collection.parallel.immutable.ParVector[Int] = ParVector(2, 4, 6, 8, 10)
In these examples, the work is actually parceled out to multiple processing units, and then re-joined after the work is complete - without requiring developer intervention.
Pitfalls
Do not use parallel collections when the collection elements must be received in a specific order.
Parallel collections perform operations concurrently. That means that all of the work is divided into parts and distributed to different processors. Each processor is unaware of the work being done by others. If the order of the collection matters then work processed in parallel is nondeterministic. (Running the same code twice can yield different results.)
Non-associative Operations
If an operation is non-associative (if the order of execution matters), then the result on a parallelized collection will be nondeterministic.
scala> val list = (1 to 1000).toList
list: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10...
scala> list.reduce(_ - _)
res0: Int = -500498
scala> list.reduce(_ - _)
res1: Int = -500498
scala> list.reduce(_ - _)
res2: Int = -500498
scala> val listPar = list.par
listPar: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10...
scala> listPar.reduce(_ - _)
res3: Int = -408314
scala> listPar.reduce(_ - _)
res4: Int = -422884
scala> listPar.reduce(_ - _)
res5: Int = -301748
Side Effects
Operations that have side effects, such as foreach
, may not execute as desired on parallelized collections due to race conditions. Avoid this by using functions that have no side effects, such as reduce
or map
.
scala> val wittyOneLiner = Array("Artificial", "Intelligence", "is", "no", "match", "for", "natural", "stupidity")
scala> wittyOneLiner.foreach(word => print(word + " "))
Artificial Intelligence is no match for natural stupidity
scala> wittyOneLiner.par.foreach(word => print(word + " "))
match natural is for Artificial no stupidity Intelligence
scala> print(wittyOneLiner.par.reduce{_ + " " + _})
Artificial Intelligence is no match for natural stupidity
scala> val list = (1 to 100).toList
list: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15...