4 Introduction to R
This is a brief introduction to R, focusing on basic programming concepts and best practices.
4.1 Workflow
You can interact directly with the console, but it is considered best practice to write an .R file and run commands in the console from that script. This allows saving the code and running all at once with the source
function. Remember to click Save often, so that you actually save your script!
Once you open an R script (or any file), an editor panel will appear in the top left. From there you can run the current line (or a selected block of code) by clicking on the Run
button in the editor panel, or in the Code
menu select Run Lines
. There is also a Re-run
button next to the Run
button, which allows to run again the previous code block including any new modifications you have just made.
4.2 Working directory
The working directory is a folder on your computer where R will look for data, save your plots, etc. To make your workflow easier, it is good practice to save everything related to one project in the same place, as it will save you time typing up computer paths or finding the files across your computer. For example, you could save your script and all the data for this exercises in a folder called “Intro-to-R”. It is good practice to avoid spaces in file names because it can confuse R.
To find out where your working directory is now, run the function getwd()
. If you want to change it, you can use setwd("dir")
, where "dir"
is the path to the folder that you want to use. Notice that the path must be written inside quotation marks. You’ll see that in the console R runs the command for changing the working directory.
Alternatively, you can set the working directory from the Files tab in the Files/Plot/Packages/Help pane. For this, navigate to the folder whence you want to read and save files or create a new one by clicking New Folder
. Once you have located the folder, click on it to access it. Then, click More
and select Set As Working Directory
.
TRY IT! Set the working directory to a folder in your Desktop called “Intro-to-R”.
4.3 Variables and assignment
The assignment operator is <-
. You create this by typing the “less than” sign (the <
symbol located above the comma) followed by a hyphen. It stores whatever you write on the right to variables on the left. Variable names can be made up of letters, numbers, underscores and periods. But, they cannot start with a number nor contain spaces. For instance:
Here, the variable x
has been assigned the value 10
. Notice that assigning a variable does not print its value, but if you look at the Environment
panel, you will see the new variable there. You can also check that it is stored by calling the variable:
## [1] 10
TRY IT! Define a variable “y” with the value of 3.
TIP Try to give your variables meaningful names. For example, if you know you are working with variables that represent different measurements you took on a plant, you might name your variables things like leaf_length
or stem_diameter
4.3.1 Shortcuts
From the console, you can go back to the previous command by hitting the up
arrow. If you press the up
arrow multiple times you will walk back through the history. You can navigate the history by pressing the down
arrow as well. You can edit any command from there by using the left
arrow or delete
keys.
4.4 Using variables
Variables can be used in calculations
## [1] 20
Variables can also be reassigned
## [1] 314
4.6 Precedence
The order of operations follows the standard precedence (from highest to lowest):
- Parentheses
()
- Exponents or roots:
^
- Multiplication or division:
*
,/
- Addition or subtraction:
+
,-
TRY IT! Using R, can you calculate if the result of (1 + 2) * (3 * 5)
is the same as (1 + 2) * 3 * 5
?
4.7 Mathematical functions
R comes with many built in mathematical functions. Functions are called by typing its name followed by open and closing parentheses. Whatever we write inside the parentheses are the function’s arguments:
## [1] 17.72005
TRY IT! Calculate the natural logarithm of x
. Hint: the function to do so is log()
, and the argument would be the name of the variable that we want to use.
4.8 Data types
There are 6 basic data types in R:
- Character:
h
orhello
. - Numeric (real or decimal):
3
or315.5
. - Integer:
4L
(the letter L specifies that this is an integer). - Logical:
TRUE
orFALSE
. - Complex:
7+4i
.
4.9 Data structures
Elements of these data types may be combined to form data structures. We will look at three of them.
Vectors
A vector is a set of values, in a specific order, and of the same data type. To create a vector we can use the function c()
, where the “c” stands for concatenate or combine. For example:
## [1] "red" "green" "blue"
## [1] 2 3 5 7
To check what kind of data object it is you can use the function class
. To see check how long is the object you can use length()
for one-dimensional objects, or dim()
for objects of two or more dimensions. For example:
## [1] "character"
## [1] 3
TRY IT! What is the data type and length of the vector y
?
HAND IN Tell us what the type and length of vector y
is.
The core idea behind vectorization is that operations can be applied at once to an entire set of values. For example:
## [1] 6 9 15 21
TRY IT! Create a vector with a sequence of numbers using the function seq
. Hint: run ?seq
in the console to display the help page of the function. Remember, you can find a guide to the help files in Chapter 5
HAND IN Copy the line of code you used to create a vector with a sequence of numbers from 0-20, increasing by 5.
A key point of the previous example is that vectors can be used as arguments to functions. But, vectors are atomic, which means that they can only contain one class of data. In other words, different classes cannot be mixed in the same object. For instance, if we combine numeric with character data types, R will coerce all elements to be characters:
## [1] "character"
For this reason, and to avoid errors, it is important to check frequently the data type of the objects that you are working with.
Matrices
In R, matrices are a type of vectors with two dimensions: the number of rows and columns. As with vectors, the elements of a matrix must be of the same data type (numeric or character). One way of creating a matrix with the function matrix()
. Keep in mind that matrices are filled column-wise:
## [,1] [,2]
## [1,] 4 6
## [2,] 5 7
We can confirm the type of data with the function class()
, and look at the size of the matrix with the function dim()
:
## [1] "matrix" "array"
## [1] 2 2
TRY IT! Create a 3 by 3 matrix in which the elements are the numbers from one to nine.
HAND IN Hand in a copy of the code you used to create the above matrix. Copy-paste the resulting matrix from the console into the Word document.
Lists
A list is a special type of vector, where each element can be of a different data type. To create a list we can use the function list()
:
## [[1]]
## [1] 5
##
## [[2]]
## [1] "hello"
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 1+4i
## [1] "list"
As you can see from the output, each element of the list starts on a new line. Lists can be useful inside functions because you can combine different kinds of results into a single object that a function can return.
TRY IT! What is the length of the object my_list
?
HAND IN State the length of the object my_list
Data frames
A data frame is a type of list in which every element of the list has the same length, that means that a data frame is a “rectangular” list. We can create a data frame with the function data.frame()
:
## id x y
## 1 a 1 11
## 2 b 2 12
## 3 c 3 13
## 4 d 4 14
## 5 e 5 15
## 6 f 6 16
## 7 g 7 17
## 8 h 8 18
## 9 i 9 19
## 10 j 10 20
To get some information on our data frame, we can use again the function dim()
.
TRY IT! Create a data frame as specified above.
HAND IN Write down what information you can get from the functions nrow()
, ncol()
, head()
, and tail()
that describe the data frame df
you just created or cut and past from console into word document.
4.10 R Packages
Functions can be added to R with packages, which are stored on CRAN (the Comprehensive R Archive Network).
To install a package you use the function install.packages("packagename")
, where packagename
is the name of the package you want to install (NOTE you need to include the ” ” marks when you type the function. To see what packages are installed by type installed.packages()
. Packages are loaded into the R session with the library(packagename)
function.
As an alternative, you can install a package through the Packages tab in the the Files/Plots/Packages/Help pane. For this, click on Install
, this feature will open a window with a blank field where you can type the name of the package. When you are ready click on Install
to proceed with the installation. You will see in the console that R displays some information about where the package is being downloaded from and a progress bar. You can also load a package from this tab. For this, find the the package that you want to load either by scrolling trough the list or by writing its name on the search bar to the right of the tab. Once you find it, click on the checkbox to the left of the package name. You will see that R runs the command to load the package in the console.
TRY IT! Install the package ggplot2
, by either using the install.packages
command or the Install
feature. Then, load the package either with the library()
or the checkbox.
4.5 Comments
In R, the
#
sign is a special character that indicates a comment. Anything that is written directly to the right of a#
sign will be ignored by R. Comment your scripts often! You can use a whole line o####
symbols to create blocks or headings to make your script easier to read.HAND IN What does the assignment operator do? What is the command for the assignment operator? How do you write a comment in your code?