2 PART I: Managing biological data
Getting started
Review best practices for guidance on how to organize your files.
Download the following .csv files from Brightspace to an appropriate folder (you can also access these data from here):
shrub-volume-data.csv
max-temperature.csv
surveys.csv
Open and save a new R Script. Complete the information at the start of the R Script template.
Set your working directory.
Install and load the dplyr R package.
If you’re unsure how to install a package, see R Packages.
Throughout these instructions are links to the relevant sections in Quantitative Skills for Biology.
If you do not complete these steps, you will not be able to complete the HAND IN questions.
TO HAND-IN
You are to hand in a .R file, formatted as shown in Best Practices - Template, where the lines of code that produce the output asked for under HAND IN appear in the order they are asked for.
2.1 Shrub Volume Data Basic
After writing the command library(dplyr) in your R script to attach dplyr, import the data. If you use the programmatic way (i.e., the command line), your command may look something like this:
ShrubData <- read.csv("lab_data/shrub-volume-data.csv")
The exact command will depend on where you saved shrub-volume-data.csv and how you have set your working directory. You may call your dataset ShrubData or give it a different name.
You will need this command in your R script or you will not be able to complete the HAND-IN questions that appear next.
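If the import fails, it often helps to check the path and then inspect the result. The following is a minimal sketch of that workflow, not the lab solution: the folder, the file name demo-plants.csv, and its columns are made up for illustration, and the file is written to a temporary directory so the example is self-contained.

```r
# Sketch only: the folder, file name, and values are hypothetical.
demo_dir <- file.path(tempdir(), "lab_data")
dir.create(demo_dir, showWarnings = FALSE)

# Write a tiny CSV so the example runs on its own
demo_file <- file.path(demo_dir, "demo-plants.csv")
writeLines(c("plot,height,depth",
             "1,2.5,1.0",
             "2,NA,3.2",
             "3,4.1,2.7"),
           demo_file)

getwd()                 # the folder that relative paths are resolved against
file.exists(demo_file)  # TRUE if the path points at a real file

# Read the file into a data frame and check that the import worked
demo_plants <- read.csv(demo_file)
str(demo_plants)        # column names and types
head(demo_plants)       # first rows
```

With your own data, replace demo_file with the path to the file you downloaded, written relative to your working directory.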
HAND IN. [Q1] Add a line of code to your R script where you use the select() function to select the site and experiment columns of shrub-volume-data.csv (note that above we named this data frame ShrubData; you may have given it a different name). To learn how to use the select() function, click on the select() link above.
HAND IN. [Q2] Add a line of code to your R script where you use the mutate() function to create a column called area. (Reminder: the area of a rectangle is equal to length * width.)
HAND IN. [Q3] Add a commented-out line to your R script that answers the following question: does the arrange() function (a) reorder all the rows in ShrubData based on the ascending order of the values in the width column, or (b) reorder only the width column? (For example, write # Q3. (a) in your R script if your answer is (a).)
HAND IN. [Q4] Add a line of code to your R script where you use the arrange() function to sort the data based on the length column.
HAND IN. [Q5] Add a line of code to your R script where you use the filter() function to filter the data to only include plants with length longer than 5 cm.
HAND IN. [Q6] Add a line of code to your R script where you use the filter() function to show all entries for plants with length greater than 4 cm and width greater than 2 cm.
HAND IN. [Q7] Add a line of code to your R script where you use the filter() function to filter the data to only include plants from experiment 1 or 2.
HAND IN. [Q8] Add a line of code to your R script where you use the filter() function to remove the rows with NA from your data.
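As a reminder of how these dplyr verbs are called, here is a minimal sketch on a made-up data frame. The data frame plants and its columns plot, height, and depth are hypothetical and are not the lab data; adapt the pattern to your own column names.

```r
library(dplyr)

# Hypothetical data frame, not the lab data
plants <- data.frame(
  plot   = c(1, 2, 3, 3),
  height = c(2.5, NA, 4.1, 6.0),
  depth  = c(1.0, 3.2, 2.7, 0.5)
)

# select(): keep only the named columns
plants_small <- select(plants, plot, height)

# mutate(): add a column computed from existing columns
plants_ratio <- mutate(plants, ratio = height / depth)

# filter(): keep rows that meet a condition; a comma (or &) means AND
tall_and_deep <- filter(plants, height > 3, depth > 1)

# filter(): %in% (or |) expresses OR between values
plots_1_or_2 <- filter(plants, plot %in% c(1, 2))

# filter(): drop rows where a column is NA
plants_complete <- filter(plants, !is.na(height))
```

Each verb takes the data frame as its first argument and returns a new data frame; the original plants is left unchanged unless you assign the result back to it.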
2.2 Mean maximum monthly temperature in Bay D’Espoir
In this section, we want you to use max-temperature.csv, which you have already downloaded from Brightspace.
Metadata Overview
The data are from Bay D’Espoir GEN STN. and are originally sourced from Environment and Climate Change Canada. Bay D’Espoir general station is located in Newfoundland at 55.48° W longitude and 47.59° N latitude. The station Climate ID number is 8400413. The dataset provided, max-temperature.csv, is a cleaned version of the data download.
Columns in the data frame
Column 1: Year - The year of the recording
Column 2: Month - The month of the recording, i.e. 1=January; 2=February; etc
Column 3: Day - The day of the recording
Column 4: Max.Temp.C - The maximum temperature recorded on that day in degrees Celsius.
HAND IN. [Q9] In your R Script, indicate with a comment that you are starting a new section ## Maximum monthly temperature. Add lines of code to your R Script where you:
load the data file max-temperature.csv.
remove the NAs from the dataset using filter().
calculate the mean maximum temperature in each month across years using group_by() and summarize() as described in Group summary.
plot the monthly maximum temperature using plot().
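The group_by() and summarize() pattern can be sketched on a small made-up dataset. The data frame rain and its columns Year, Month, and Rain.mm are hypothetical and stand in for any daily record; this is an illustration of the pattern, not the answer for the temperature data.

```r
library(dplyr)

# Hypothetical daily rainfall records (not the temperature data)
rain <- data.frame(
  Year    = c(2001, 2001, 2002, 2002, 2001, 2002),
  Month   = c(1, 1, 1, 1, 2, 2),
  Rain.mm = c(10, NA, 14, 12, 3, 5)
)

# Drop missing values first so they do not turn the means into NA
rain_complete <- filter(rain, !is.na(Rain.mm))

# One group per month, pooling all years
rain_by_month <- group_by(rain_complete, Month)

# Collapse each group to a single row holding its mean
monthly_rain <- summarize(rain_by_month, mean_rain = mean(Rain.mm))

# Plot the summary: one point per month
plot(monthly_rain$Month, monthly_rain$mean_rain,
     type = "b", xlab = "Month", ylab = "Mean rainfall (mm)")
```

Because the grouping is by Month only, each mean pools the values from every year in the data.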
2.3 Bird Banding
The number of birds banded at a series of sampling sites has been counted by your field crew and entered into the following vector.
number_of_birds <- c(28, 32, 1, 0, 10, 22, 30, 19, 145, 27,
                     36, 25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55, 62, 98, 32,
                     900, 33, 14, 39, 56, 81, 29, 38, 1, 0, 143, 37, 98, 77, 92,
                     83, 34, 98, 40, 45, 51, 17, 22, 37, 48, 38, 91, 73, 54, 46,
                     102, 273, 600, 10, 11)
Counts are entered in order and sites are numbered starting at one.
In your R Script, indicate with a comment that you are starting a new section ## Bird banding. Review the section Useful functions.
Add lines of code to your R Script where you provide the commands to answer the questions below.
After the line of code for each question, write a comment giving the numerical answer, e.g., # the number of sites is 10.
HAND IN. [Q10] How many sites are there?
HAND IN. [Q11] What is the total number of birds counted across all of the sites?
HAND IN. [Q12] What is the smallest number of birds counted?
HAND IN. [Q13] What is the largest number of birds counted?
HAND IN. [Q14] What is the mean number of birds seen at a site?
HAND IN. [Q15] How many birds were counted at the last site? Have the computer choose the last site automatically in some way, rather than manually entering its position as a number. Do you know a function that will give you the position of the last value? (Since positions start at 1, the position of the last value in a vector is the same as its length.)
HAND IN. [Q16] How many birds were counted at site 42?
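The base R functions that summarize and index a numeric vector can be sketched on a short made-up vector. The vector counts below is hypothetical and is not the banding data; run each line in the console to see what it returns.

```r
# Hypothetical counts, not the banding data
counts <- c(4, 0, 17, 9, 23)

length(counts)          # how many values the vector holds
sum(counts)             # total of all values
min(counts)             # smallest value
max(counts)             # largest value
mean(counts)            # arithmetic mean
counts[2]               # value stored at position 2
counts[length(counts)]  # last value, found without typing its position
```

Square brackets select by position, so any expression that evaluates to a position, such as length(counts), can go inside them.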
2.4 Portal Data Manipulation
In this section, we want you to use surveys.csv, which you have already downloaded from Brightspace.
In your R Script, indicate with a comment that you are starting a new section ## Portal data.
HAND IN. [Q17] Add lines of code to your R script to load the file surveys.csv.
HAND IN. [Q18] Add lines of code to your R script where you use filter() to create a new data frame, survey_new, in which all rows with NA in the weight column have been removed.
HAND IN. [Q19] Add lines of code to your R script where you use the select() function to create a new data frame called survey_new_2 that only contains the columns year, month, day, species_id, and weight, in that order.
HAND IN. [Q20] Add lines of code to your R script where you use mutate() to calculate weight_kg, the weight in kilograms, and add this new column to survey_new_2, naming the result survey_new_3. The weight in survey_new_2 is given in grams, so you will need to convert weight into kilograms. (Recall that 1 g = 0.001 kg.)
HAND IN. [Q21] Add lines of code to your R script where you use filter() to make a new data frame, survey_new_4, which is the same as survey_new_3 except that only the rows with species_id == "SH" are included.
HAND IN. [Q22] Add a line of code to your R script where you print the first 6 rows of survey_new_4 using the head() function.
2.5 Using pipes
In your R Script, indicate with a comment that you are starting a new section ## Using pipes.
HAND IN. [Q23] Add lines of code to your R script where you produce a data frame, survey_new_5, from surveys.csv, which is the same as survey_new_4, but where you use the pipe operator %>% and do not create the intermediaries survey_new_2, survey_new_3, and survey_new_4.
HAND IN. [Q24] Add a line of code to your R script where you print the first 6 rows of survey_new_5 using the head() function.
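To see how a chain of intermediate data frames maps onto a single pipeline, here is a minimal sketch on made-up data. The data frame animals and its columns id, taxon, and mass_g are hypothetical; only the pattern carries over to the survey data.

```r
library(dplyr)

# Hypothetical records, not the survey data
animals <- data.frame(
  id     = 1:5,
  taxon  = c("AB", "CD", "AB", "CD", "AB"),
  mass_g = c(120, NA, 95, 40, 210)
)

# Step-by-step version: each result is stored and passed to the next call
step1    <- filter(animals, !is.na(mass_g))
step2    <- select(step1, taxon, mass_g)
step3    <- mutate(step2, mass_kg = mass_g / 1000)
stepwise <- filter(step3, taxon == "AB")

# Piped version: %>% passes the left-hand result in as the first argument
piped <- animals %>%
  filter(!is.na(mass_g)) %>%
  select(taxon, mass_g) %>%
  mutate(mass_kg = mass_g / 1000) %>%
  filter(taxon == "AB")

all.equal(stepwise, piped)  # compare the two results
head(piped)                 # first rows (up to 6)
```

The piped version reads top to bottom in the order the operations are applied, and it leaves no intermediate objects in your workspace.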