Scala Foundation Course - For Comprehension


Welcome back. We were discussing Scala control structures. The last item among the control abstractions is the for expression. Scala's for loop is the Swiss Army knife of iteration. Here is the general structure of the for expression.


    for ( seq ) yield { expr }  
                        

It looks simple, but this structure turns out to be quite confusing to explain all at once, so let me take a progressive approach. To simplify the structure, let's ignore the yield for now. The yield is optional. You will be using it most of the time, but let's keep it aside for the moment.
Now the structure looks like this.

                                        
    for ( seq ) { expr }                                          
                            

Let's try to understand this structure. If you have learned other languages, the above structure should look familiar. The things inside the parentheses control the number of iterations, and the curly braces represent the body of the loop. That's how it is in most languages.

                                            
    for ( i <- 1 to 10 ) {
        val square = i * i
        println(s"$i squared is $square")
    }
                                

You can have one or more expressions within the body. If you have a single expression, the curly braces are optional. There is no complexity in the body of the loop; it is almost the same as in any other language.
But the seq is somewhat complicated, and it confuses a lot of people.

The sequence generator in Scala for loop

Let’s start with the simplest form of the seq and then expand it step by step. The simplest form of seq is a generator that looks something like below.

                                                
    e <- col                                                 
                            

The col is a Scala collection, and e is a value that binds to each element of the collection. Let's take an example.

                                                    
    val n = 1 to 5
    // n is a Range collection with five elements.
    // Now I can iterate through this collection using a for loop.
    for ( i <- n ) println(i)
    // You can remove the middleman and create the collection on the fly.
    for ( i <- 1 to 5 ) println(i)
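The 1 to 5 in the loops above is where the collection comes from. Here is a small REPL-style sketch (Scala 2.13 or 3) showing that it is just a method call that returns a Range; the names viaOperator and viaMethodCall are illustrative only.

```scala
// `1 to 5` is ordinary method-call syntax: Predef implicitly wraps the Int
// in a RichInt, whose `to` method returns a Range collection.
val viaOperator: Range = 1 to 5
val viaMethodCall: Range = 1.to(5)

// Both forms build the same collection of five elements.
println(viaOperator == viaMethodCall) // true
println(viaOperator.size)             // 5
println(viaOperator.toList)           // List(1, 2, 3, 4, 5)

// A for loop over the Range visits each element in turn.
for (i <- viaMethodCall) println(i)
```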
                                

So far so good. But notice one thing here: the presence of a collection. You need a collection to iterate with a Scala for loop; if you don't have a collection, you can't use a Scala for loop. Beginners often take a long time to realize this. We wrote 1 to 5 in the above example, which gives the impression that you are simply iterating five times, but in fact you are iterating over a Range collection. The Range is created for you implicitly. The body of the for loop in this example is quite simple, just a println. Let's add some more code to it.

                                                        
    for ( country <- List("India","USA","China","Japan") ) {
        country match {
            case "India"  => println("Delhi")
            case "USA"    => println("Washington D.C.")
            case "Japan"  => println("Tokyo")
            case _        => println("I don't know")
        }
    }
                                    

The above code is an example of the most basic form of the for loop: iterate over a collection and do something with each element. Now it is time for a quiz, and here is my question. Given the below list, can you generate the same output as above without using a for loop?

                                                    
    val countries = List("India","USA","China","Japan")                                                
                                

All you need to do is print the capital of each country in the list. You already know how; I explained it in an earlier video.

                                                        
    countries.foreach { country =>
        country match {
            case "India"  => println("Delhi")
            case "USA"    => println("Washington D.C.")
            case "Japan"  => println("Tokyo")
            case _        => println("I don't know")
        }
    }
                                    

Does the above code look familiar? We learned it in an earlier video. The foreach method is a higher-order control abstraction. Its sole purpose is to iterate over each element of the collection and apply the given code. We are doing the same thing as we did in the for loop; both constructs do the same job. Now I have another question. Why do we have two constructs for the same thing?

Redefine the Scala for loop

The Scala for loop is just syntactic sugar for higher-order control abstractions. Internally, both are the same. What does that mean?
It means the Scala compiler converts the for loop into a combination of the following methods.

  1. foreach
  2. map
  3. flatMap
  4. withFilter

In other words, Scala doesn't really have a for loop of its own. It's just syntactic sugar for these methods. So, if you don't like the for loop, you can write Scala code without ever using it. The real purpose of the Scala for expression is to let you write code in a way that reads more naturally. Use the for expression when your code is getting too cryptic with these methods and a for expression would express the same logic more clearly. Once you know that the for expression is just syntactic sugar, it becomes much easier to understand.
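To make that concrete, here is a rough sketch of how each kind of for expression lines up with these four methods. The method-call versions are written by hand to mirror the translation, not copied from compiler output, and the list xs is just sample data (REPL-style, Scala 2.13 or 3).

```scala
val xs = List(1, 2, 3)

// 1. A for loop without yield corresponds to foreach.
for (x <- xs) println(x)
xs.foreach(x => println(x))

// 2. A for expression with yield corresponds to map.
val doubledFor = for (x <- xs) yield x * 2
val doubledMap = xs.map(x => x * 2)

// 3. A guard (if) corresponds to withFilter.
val evensFor = for (x <- xs if x % 2 == 0) yield x
val evensMethods = xs.withFilter(x => x % 2 == 0).map(x => x)

// 4. A second generator turns the outer map into flatMap.
val pairsFor = for (x <- xs; y <- List("a", "b")) yield (x, y)
val pairsMethods = xs.flatMap(x => List("a", "b").map(y => (x, y)))

println(doubledFor == doubledMap) // true
println(evensFor == evensMethods) // true
println(pairsFor == pairsMethods) // true
```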

The Yield in Scala for expression

Now let's bring the yield back into the structure.

                                                        
    for ( seq ) yield { expr }                                               
                                    

In the absence of yield, the for loop behaves like the foreach method. When we add the yield, it behaves like the map method. I hope you still remember the difference between foreach and map: foreach returns Unit, whereas map returns a new collection. So, can you update the above example to behave like the map method? I mean, instead of printing the capitals of the countries, return a new collection.
To achieve that, we need to make two changes.

  1. Place the yield just before the body of the loop.
  2. Remove the println calls and return the strings instead.

    val countries = List("India","USA","China","Japan")
    for ( country <- countries ) yield {
        country match {
            case "India"   => "Delhi"
            case "USA"     => "Washington D.C."
            case "Japan"   => "Tokyo"
            case _         => "I don't know"
        }
    }

Simple, isn't it? The for loop has become a for expression because it now returns a value. You can also assign the result of the above for expression to a value.

                                    
    val capitals = for ( country <-  countries ) yield {
        country match {
            case "India"   => "Delhi"
            case "USA"     => "Washington D.C."
            case "Japan"   => "Tokyo"
            case _         => "I don't know"
        }
    }                                                               
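Since a for expression with yield is the map method in disguise, the same result can be produced by calling map directly. The sketch below (REPL-style, Scala 2.13 or 3) moves the match into a helper named capitalOf, a name introduced here for illustration only. It also shows the difference in result type: without yield the for loop evaluates to Unit, and with yield it evaluates to a List.

```scala
val countries = List("India", "USA", "China", "Japan")

// Hypothetical helper holding the same pattern match used in the for expression.
def capitalOf(country: String): String = country match {
  case "India" => "Delhi"
  case "USA"   => "Washington D.C."
  case "Japan" => "Tokyo"
  case _       => "I don't know"
}

// Without yield: behaves like foreach and evaluates to Unit.
val printed: Unit = for (country <- countries) println(capitalOf(country))

// With yield: behaves like map and evaluates to a new List.
val capitalsViaFor: List[String] = for (country <- countries) yield capitalOf(country)
val capitalsViaMap: List[String] = countries.map(capitalOf)

println(capitalsViaFor == capitalsViaMap) // true
println(capitalsViaMap)                   // List(Delhi, Washington D.C., I don't know, Tokyo)
```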
                                            

Nested for loop

Now let's expand the seq further.

                                                                    
    for ( seq ) yield { expr }                                                           
                                                

I said that the simplest form of seq is a generator like the one below.

                                    
    e <- col                                                         
                                                    

Scala allows you to have multiple generators, and each additional generator creates a nested loop. Let's see a simple example.

                                        
    for (i <- 1 to 3; j <- 1 to 2) println (s"i=$i  j=$j")
    /* Output
    i=1  j=1
    i=1  j=2
    i=2  j=1
    i=2  j=2
    i=3  j=1
    i=3  j=2
    */                                              
                                                        

So, for each value of i, you get two values of j. That's what a nested loop is. You can extend the nesting further.

                                            
    for (i <- 1 to 3; j <- 1 to 2; k <- 1 to 2) println (s"i=$i  j=$j k=$k")                                              
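Multiple generators are where flatMap comes in. This hand-written sketch (REPL-style, Scala 2.13 or 3; the names nestedFor and nestedMethods are illustrative) compares the two-generator loop with the method calls it roughly corresponds to, both with and without yield.

```scala
// Without yield: each generator becomes a nested foreach call.
for (i <- 1 to 3; j <- 1 to 2) println(s"i=$i  j=$j")
(1 to 3).foreach(i => (1 to 2).foreach(j => println(s"i=$i  j=$j")))

// With yield: the outer generator becomes flatMap and the innermost one becomes map.
val nestedFor = for (i <- 1 to 3; j <- 1 to 2) yield s"i=$i  j=$j"
val nestedMethods = (1 to 3).flatMap(i => (1 to 2).map(j => s"i=$i  j=$j"))

println(nestedFor == nestedMethods) // true
println(nestedFor.length)           // 6
```

Adding a third generator, as in the k example, simply adds one more level of flatMap around the innermost map.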
                                                            

If the code between the parentheses gets complex, you can use curly braces instead.

                                                
    for {   i <- 1 to 3;
            j <- 1 to 2;
            k <- 1 to 2
        } println (s"i=$i  j=$j k=$k")
                                                                

Inside curly braces, the semicolons are optional, so you can remove them.

                                                    
    for {   i <- 1 to 3 
            j <- 1 to 2 
            k <- 1 to 2
        } println (s"i=$i  j=$j k=$k")                                      
                                                                    

The Guards in Scala for loop

You can apply a filter, also called a guard, to each generator. Let me show you.

                                                        
    for {   i <- 1 to 3 
            j <- 1 to 2; if(j%2==0)
            k <- 1 to 2
        } println (s"i=$i  j=$j k=$k")                                          
                                                                        

The above example applies a filter to the generator for j. The filter eliminates odd numbers, so only the values of j that satisfy the condition pass through to the rest of the loop. You can also write each expression on its own line.

                                                            
    for {   i <- 1 to 3 
            j <- 1 to 2 
            if(j%2==0)
            k <- 1 to 2
        } println (s"i=$i  j=$j k=$k")                                          
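A guard attaches a withFilter call to the generator it follows. The sketch below (REPL-style, Scala 2.13 or 3; names guardedFor and guardedMethods are illustrative) adds a yield to the example above so the two versions can be compared as values; the method-call chain is written by hand to mirror the translation.

```scala
// The one-line-per-expression guard example, with yield so the results can be compared.
val guardedFor = for {
  i <- 1 to 3
  j <- 1 to 2
  if j % 2 == 0
  k <- 1 to 2
} yield (i, j, k)

// Roughly the equivalent method calls: the guard becomes withFilter on the j generator.
val guardedMethods =
  (1 to 3).flatMap { i =>
    (1 to 2).withFilter(j => j % 2 == 0).flatMap { j =>
      (1 to 2).map(k => (i, j, k))
    }
  }

println(guardedFor == guardedMethods) // true
```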
                                                                            

Assignments in Scala for loop

Let's take a better example to understand the power and flexibility of the Scala for loop. I have a file with the content shown below.
Save the content as employees.csv.

                                                                
    employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,commission_pct,manager_id,department_id
    100,Steven,King,SKING,515.123.4567,17-JUN-1987,AD_PRES,24000,NULL,NULL,90
    101,Neena,Kochhar,NKOCHHAR,515.123.4568,21-SEP-1989,AD_VP,17000,NULL,100,90
    102,Lex,De Haan,LDEHAAN,515.123.4569,13-JAN-1993,AD_VP,17000,NULL,100,90
    103,Alexander,Hunold,AHUNOLD,590.423.4567,03-JAN-1990,IT_PROG,9000,NULL,102,60
    104,Bruce,Ernst,BERNST,590.423.4568,21-MAY-1991,IT_PROG,6000,NULL,103,60
    105,David,Austin,DAUSTIN,590.423.4569,25-JUN-1997,IT_PROG,4800,NULL,103,60
    106,Valli,Pataballa,VPATABAL,590.423.4560,05-FEB-1998,IT_PROG,4800,NULL,103,60
    107,Diana,Lorentz,DLORENTZ,590.423.5567,07-FEB-1999,IT_PROG,4200,NULL,103,60
    108,Nancy,Greenberg,NGREENBE,515.124.4569,17-AUG-1994,FI_MGR,12000,NULL,101,100
    109,Daniel,Faviet,DFAVIET,515.124.4169,16-AUG-1994,FI_ACCOUNT,9000,NULL,108,100
                                                                                

The data in the file contains the list of employees. Let's create a for loop to print all the records.

                                                                
    import scala.io._
    for (line <- Source.fromFile("employees.csv").getLines().toList) println(line)                                          
                                                                                

Simple, isn't it? But I am not interested in all those fields. I just want to see the id, first name, last name, and department id. Can we print only these fields? Let's do it.

                                                                    
    for (line <- Source.fromFile("employees.csv").getLines().toList) { 
        val fields = line.split(",")
        println(fields(0)+"--"+fields(1)+"--"+fields(2)+"--"+fields(10))
    }                                               
                                                                                    

We split each line on commas and keep only four of the fields, and we did all of that in the body of the for loop. The for loop also allows you to embed an assignment (more precisely, a value definition) right after a generator, the same way we embedded a filter for a generator earlier. So, the code can look like this.

                                                                        
    for {   line <- Source.fromFile("employees.csv").getLines().toList
            fields = line.split(",")
        } println(fields(0)+"--"+fields(1)+"--"+fields(2)+"--"+fields(10))                                         
                                                                                        

So, the idea is to expand the for comprehension instead of the body of the for loop, and keep the body as simple as possible.
Let me ask you a question. I am only interested in department number 60. Can you filter out the others?
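One detail worth knowing: the examples in this section call Source.fromFile but never close the file. A minimal sketch of a tidier way to read the lines, assuming Scala 2.13 or later (where scala.util.Using is available), is shown below. To keep it runnable anywhere, it first writes a tiny hypothetical two-line CSV to a temporary file instead of using employees.csv.

```scala
import java.nio.file.{Files, Path}
import scala.io.Source
import scala.util.Using

// Hypothetical setup: a tiny CSV in a temp file, so the sketch does not depend on employees.csv.
val path: Path = Files.createTempFile("employees", ".csv")
Files.write(path, "employee_id,first_name,last_name\n100,Steven,King\n".getBytes("UTF-8"))

// Using.resource closes the Source after the lines have been copied into a List.
val lines: List[String] =
  Using.resource(Source.fromFile(path.toFile))(source => source.getLines().toList)

// The for expression then works on an ordinary in-memory List.
for {
  line   <- lines
  fields = line.split(",")
} println(fields(0) + "--" + fields(1) + "--" + fields(2))

// Clean up the temporary file.
Files.deleteIfExists(path)
```

Reading the lines into a List first also means the file is closed before the loop runs, so the body can be as slow or as complex as needed without holding the file open.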
Here is the code.

                                                                            
    for {   line <- Source.fromFile("employees.csv").getLines().toList
            fields = line.split(",")
            if(fields(10)=="60")
        } println(fields(0)+"--"+fields(1)+"--"+fields(2)+"--"+fields(10))                                      
                                                                                            

So, a filter does not have to come directly after a generator; here it comes after the definition and uses the value that the definition created. You have all the freedom. The above example looks like an SQL query.

                                                                                
    select employee_id, first_name, last_name, department_id
    from employees
    where department_id=60;                                             
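The same generator, definition, and guard pattern can be tried without the CSV file. This sketch uses a hypothetical in-memory list holding the header and three rows copied from employees.csv, so it runs in a plain REPL (Scala 2.13 or 3). Roughly speaking, the compiler folds the definition into a map step that carries fields along with line, and turns the guard into a withFilter call on that result.

```scala
// Hypothetical in-memory stand-in for employees.csv (the header plus three rows from the file above).
val lines = List(
  "employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,commission_pct,manager_id,department_id",
  "100,Steven,King,SKING,515.123.4567,17-JUN-1987,AD_PRES,24000,NULL,NULL,90",
  "103,Alexander,Hunold,AHUNOLD,590.423.4567,03-JAN-1990,IT_PROG,9000,NULL,102,60",
  "104,Bruce,Ernst,BERNST,590.423.4568,21-MAY-1991,IT_PROG,6000,NULL,103,60"
)

// Generator, then a definition, then a guard that uses the defined value.
val dept60 = for {
  line   <- lines
  fields = line.split(",")
  if fields(10) == "60"
} yield s"${fields(0)}--${fields(1)}--${fields(2)}--${fields(10)}"

dept60.foreach(println)
// 103--Alexander--Hunold--60
// 104--Bruce--Ernst--60
```

The header row is dropped automatically here, because its eleventh field is the text department_id rather than 60.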
                                                                                                

Can you extend the above example further? Instead of showing the department ID, can we show the department name? Basically, I want to see all the employees with their department ID replaced by their department name. You have a separate file for departments.
Save the below content as departments.csv

                                                                                    
    department_id,department_name,manager_id,location_id
    90,Administration,200,1700
    20,Marketing,201,1800
    60,Purchasing,114,1700
    40,Human Resources,203,2400
    100,Shipping,121,1500
                                                                                                    

Does the requirement look like another SQL query? Join two files and get the department name. Here is the Scala code.


    val empDept = for {  dept       <- Source.fromFile("departments.csv").getLines().toList
                         deptfields = dept.split(",")
                         emp        <- Source.fromFile("employees.csv").getLines().toList
                         empfields  = emp.split(",")
                         if (deptfields(0) == empfields(10))
                      } yield (empfields(0), empfields(1), empfields(2), deptfields(1))
    empDept foreach println

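We used a nested for expression here. This may not be the best way to solve the problem, because the inner generator re-reads employees.csv once for every line of departments.csv. Note also that both header rows contain department_id in the compared column, so they are joined too and the first tuple acts as a header. But I just wanted to show you the power and flexibility of the Scala for expression.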
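One alternative, offered here as a sketch rather than the approach used above, is to read the departments once into a Map and then make a single pass over the employees. To keep it self-contained, it uses hypothetical in-memory lists holding a few rows copied from the two CSV files (REPL-style, Scala 2.13 or 3); the names deptNames and joined are illustrative.

```scala
// Hypothetical in-memory samples: a few rows copied from departments.csv and employees.csv.
val deptLines = List(
  "department_id,department_name,manager_id,location_id",
  "90,Administration,200,1700",
  "60,Purchasing,114,1700"
)
val empLines = List(
  "employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,commission_pct,manager_id,department_id",
  "100,Steven,King,SKING,515.123.4567,17-JUN-1987,AD_PRES,24000,NULL,NULL,90",
  "103,Alexander,Hunold,AHUNOLD,590.423.4567,03-JAN-1990,IT_PROG,9000,NULL,102,60",
  "108,Nancy,Greenberg,NGREENBE,515.124.4569,17-AUG-1994,FI_MGR,12000,NULL,101,100"
)

// Build the lookup table once: department_id -> department_name (tail skips the header row).
val deptNames: Map[String, String] = (for {
  dept   <- deptLines.tail
  fields = dept.split(",")
} yield fields(0) -> fields(1)).toMap

// A single pass over the employees; the guard drops employees whose department is not in the table.
val joined = for {
  emp       <- empLines.tail
  empfields = emp.split(",")
  if deptNames.contains(empfields(10))
} yield (empfields(0), empfields(1), empfields(2), deptNames(empfields(10)))

joined.foreach(println)
// (100,Steven,King,Administration)
// (103,Alexander,Hunold,Purchasing)
```

Nancy Greenberg is dropped in this sample only because department 100 was left out of the trimmed deptLines list.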

Summary

We started the discussion with the structure below.


    for ( seq ) yield { expr }             
                    

Now, you know that the seq may have three components.

  1. A Generator
  2. A Filter or Guard
  3. A Definition

You need a generator to create a for loop in Scala, and that generator draws its values from a Scala collection. Without a generator, you can't create a Scala for loop.
The filter and the definition are optional. Use them when you need them.
If you want a nested loop, you can repeat the entire set for the next level of iteration.
You can use the yield if you want to return the output as another collection. The yield is also optional. In the absence of yield, the Scala for loop behaves like the foreach method on a collection.
The Scala for loop is just syntactic sugar. You can model any for expression using a combination of the following collection methods. That's what the Scala compiler does internally.

  1. foreach
  2. map
  3. flatMap
  4. withFilter

Thank you for watching Learning Journal.
Keep Learning and Keep Growing.

