14 Loops and repeating commands

14.1 Introduction

Loops are used in programming to repeat a specific block of code. Similar to writing functions, this approach will reduce repetition and keep your code organized.

14.2 for loops

In R, a for loop iterates over the elements of a vector or list. Iteration lets us apply the same task to multiple inputs: the loop performs the same action once for each item.

The syntax of a for loop is relatively simple: for each value in the sequence given within for(), the statement within the {} is performed.

for(value in sequence)
{
statement
}

In the for loop below, we have specified a vector containing the numbers 1 to 5. The statement within the {} will use the print function to return the value of i with each loop or iteration. In this example, the loop iterates 5 times as the vector within for() contains 5 elements.

for (i in 1:5) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

Instead of creating the vectors within for(), we can also use existing vectors. We first assign the vector we are looping over before calling the loop. In the example below, the loop will return the squared value for each term within x.

x <- 1:5
for (i in x) {
  print(i^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25

Often we will want to save the output from our loop. The first option is to make a blank vector or data frame and save the results to it.

In the example below, we first create an empty vector x. The for loop first calculates the square of the numbers 1 to 10 and then saves them to vector x.

x <- vector() 

for (i in 1:10) {
  y <- i^2
  x <- c(x, y)
}

x
##  [1]   1   4   9  16  25  36  49  64  81 100

If you know the dimensions of your data, you can also make a blank vector or data frame of a specified size. Growing an object with c() or rbind() copies it on every iteration, so pre-allocating can speed up larger or more complicated loops. For example, if we had a loop with 10 iterations, we could store the result of each operation in a vector with a length of 10.
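As a minimal sketch of that idea, the loop below fills a pre-allocated numeric vector by position instead of growing it with c(). The name squares is chosen here purely for illustration.

```r
# Pre-allocate a numeric vector of length 10 (filled with zeros)
squares <- numeric(10)

for (i in 1:10) {
  # Store each result in position i rather than appending to the vector
  squares[i] <- i^2
}

squares
##  [1]   1   4   9  16  25  36  49  64  81 100
```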

Here is code that will store both the square and the square root of the numbers 1 to 10 in two columns of a new data frame called x2:

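# Pre-allocate a data frame with two numeric columns of length 10
x2 <- data.frame(col1 = numeric(10), col2 = numeric(10))

for (i in 1:10) {
  x2[i, 1] <- i^2      # the square goes in the first column
  x2[i, 2] <- sqrt(i)  # the square root goes in the second column
}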

x2
##    col1     col2
## 1     1 1.000000
## 2     4 1.414214
## 3     9 1.732051
## 4    16 2.000000
## 5    25 2.236068
## 6    36 2.449490
## 7    49 2.645751
## 8    64 2.828427
## 9    81 3.000000
## 10  100 3.162278

Now let’s say we are given a dataset that includes three experiments across four different sites.

data
##    site experiment length width height
## 1     1          1    2.2   1.3    9.6
## 2     1          2    2.1   2.2    7.6
## 3     1          3    2.7   1.5    2.2
## 4     2          1    3.0   4.5    1.5
## 5     2          2    3.1   3.1    4.0
## 6     2          3    2.5   2.8    3.0
## 7     3          1    1.9   1.8    4.5
## 8     3          2    1.1   0.5    2.3
## 9     3          3    3.5   2.0    7.5
## 10    4          1    2.9   2.7    3.2
## 11    4          2    4.5   4.8    6.5
## 12    4          3    1.2    NA    2.7
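The text does not show how data was created. To run the examples that follow, one way to recreate it is to type in the values printed above; this is a sketch built from that printout, not necessarily how the original dataset was loaded.

```r
# Recreate the data frame printed above so the examples can be run
data <- data.frame(
  site       = rep(1:4, each = 3),   # sites 1 to 4, three rows each
  experiment = rep(1:3, times = 4),  # experiments 1 to 3 within each site
  length     = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2),
  width      = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, NA),
  height     = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7)
)
```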

We are then asked to calculate the correlation between shrub length and height for experiments 1 and 2. We first create a vector that specifies these experiment numbers and then create an empty vector that will be used to store the correlation results.

exp_num <- c(1, 2)

cor_output <- c()

Within our for() loop we will need to include several steps. First we subset our data to the current experiment number from exp_num and store the result as a variable, which we call exp_subset. We then calculate the correlation between length and height from exp_subset. Next we create a data frame, df_cor, that includes both the experiment number and the correlation result. Finally, we use the function rbind to append the row in df_cor to cor_output, which started out empty.

for (i in exp_num) {
  # i holds the experiment number itself, so compare against i directly
  exp_subset <- subset(data, experiment == i)
  cor_val <- cor(exp_subset$length, exp_subset$height)
  df_cor <- data.frame(experiment = i,
                       correlation = cor_val)
  # Append this experiment's row to the results
  cor_output <- rbind(cor_output, df_cor)
}
cor_output
##   experiment correlation
## 1          1  -0.6332787
## 2          2   0.4844752

TRY IT How would you rewrite the above code if you wanted to instead subset for sites 2 and 4?

14.3 Loop alternatives

Loops, specifically for loops, are essential to programming in general. However, in R, you should avoid writing one when a built-in function already does the job. These built-in functions are often more efficient, and more concise, than a loop you write yourself.
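For instance, the squaring loop from earlier in this chapter can be replaced by R's vectorized arithmetic. The snippet below is a small illustration using only base R.

```r
x <- 1:5

# Vectorized arithmetic: the operation is applied to every element at once
x^2
## [1]  1  4  9 16 25

# Built-in summary functions also replace hand-written accumulation loops
sum(x^2)
## [1] 55
```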

One option is to use the apply() family of functions. These functions manipulate data from matrices, arrays, lists and data frames in a repetitive way while avoiding explicit use of loops. Here are some of the more common functions within the apply() family:

  • apply() - Applies a function over the rows or columns of an array or matrix.
  • lapply() - Applies a function over a vector or list and returns a list.
  • sapply() - Similar to lapply() but can simplify the result to a vector or matrix.
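The following sketch shows each of the three functions on small inputs. The object names (m, row_sums, col_sums, squares_list, squares_vec) are made up for this illustration.

```r
# A small matrix to illustrate apply(): 2 rows, 3 columns
m <- matrix(1:6, nrow = 2)

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)
row_sums
## [1]  9 12
col_sums
## [1]  3  7 11

# lapply() returns a list with one element per input value
squares_list <- lapply(1:3, function(i) i^2)

# sapply() runs the same computation but simplifies the result to a vector
squares_vec <- sapply(1:3, function(i) i^2)
squares_vec
## [1] 1 4 9
```

Note that sapply() chooses the shape of its result from what the function returns, so lapply() is the more predictable choice when you always need a list.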