2 PART I: Managing biological data

Getting started

  • Review best practices for guidance on how to organize your files.

  • Download the following .csv files from Brightspace to an appropriate folder (you can also access these data from here):

    • shrub-volume-data.csv

    • max-temperature.csv

    • surveys.csv

  • Open and save a new R Script. Complete the information at the start of the R Script template.

  • Set your working directory.

  • Install and load the dplyr R package.

If you’re unsure how to install a package, see R Packages.

Throughout these instructions are links to the relevant sections in Quantitative Skills for Biology.

If you do not complete these steps, you will not be able to complete the HAND IN questions.

TO HAND IN. Hand in a .R file, formatted as shown in Best Practices - Template, in which the lines of code that produce the output requested under each HAND IN question appear in the order the questions are asked.


2.1 Shrub Volume Data Basic

After adding the command library(dplyr) to your R script to attach dplyr, import the data. If you import the data programmatically (i.e., from the command line), your command may look something like this:

ShrubData <- read.csv("lab_data/shrub-volume-data.csv")

The exact command will depend on where you saved shrub-volume-data.csv and how you have set your working directory. You may call your dataset ShrubData or give it a different name.

You will need this command in your R script or you will not be able to complete the HAND-IN questions that appear next.
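To check that an import worked, it can help to look at the first rows and the column types. The lines below are a minimal sketch of that check; the temporary file and the toy_import data are hypothetical and stand in for your own .csv file.

```r
# Hypothetical toy data written to a temporary file so the example runs anywhere
tmp <- tempfile(fileext = ".csv")
writeLines(c("site,length", "1,2.5", "2,NA"), tmp)

# Import the file, then inspect it
toy_import <- read.csv(tmp)
head(toy_import)  # first rows of the data frame
str(toy_import)   # column names and types
```

If read.csv() reports that it cannot open the file, the path in the command and the working directory probably do not match.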

HAND IN. [Q1] Add a line of code to your R script where you use the select() function to select the site and experiment columns of shrub-volume-data.csv (note that above we named this dataset ShrubData; you may have given it a different name). To learn how to use the select() function, click on the select() link above.
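As a minimal sketch of how select() behaves, the example below uses a made-up data frame. The name toy_plants and its columns are hypothetical and are not part of the shrub data.

```r
library(dplyr)

# Hypothetical toy data; these column names are made up for illustration
toy_plants <- data.frame(
  plot_id = c(1, 2, 3),
  species = c("A", "B", "A"),
  height  = c(10.2, 8.5, 12.1)
)

# Keep only the named columns, in the order they are listed
toy_subset <- select(toy_plants, plot_id, height)
toy_subset
```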

HAND IN. [Q2] Add a line of code to your R script where you use the mutate() function to create a column called area. (Reminder: the area of a rectangle is equal to length * width.)
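The sketch below shows the general pattern for mutate(): a new column is computed from existing columns, one row at a time. The toy_leaves data frame and the triangle-area formula are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data, not the shrub data
toy_leaves <- data.frame(
  base   = c(2, 4, 6),
  height = c(3, 5, 7)
)

# Add a column computed from existing columns
# (here, the area of a triangle: 0.5 * base * height)
toy_leaves <- mutate(toy_leaves, tri_area = 0.5 * base * height)
toy_leaves
```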

HAND IN. [Q3] Add a commented out line of code to your R script that answers the following question: does the arrange() function: (a) reorder all the rows in ShrubData based on the ascending order of the value in the width column, or (b) reorder only the width column? (for example, write # Q3. (a) in your R script if your answer is (a).)

HAND IN. [Q4] Add a line of code to your R script where you use the arrange() function to sort the data based on the length column.
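One way to check your reasoning about arrange() is to run it on a small data frame and compare the result with the original. The sketch below assumes a hypothetical toy_fish data frame; desc() is the dplyr helper for descending order.

```r
library(dplyr)

# Hypothetical toy data for illustration
toy_fish <- data.frame(
  tag    = c("f1", "f2", "f3"),
  mass_g = c(30, 12, 21)
)

# Sort by one column, ascending (the default)
toy_sorted <- arrange(toy_fish, mass_g)
toy_sorted

# Sort by the same column, descending
arrange(toy_fish, desc(mass_g))
```

Comparing the tag column in toy_fish and toy_sorted shows what happens to the rest of each row.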

HAND IN. [Q5] Add a line of code to your R script, where you use the filter() function to filter the data to only include plants with length longer than 5 cm.

HAND IN. [Q6] Add a line of code to your R script, where you use the filter() function to show all entries for plants with length greater than 4 cm and width greater than 2 cm.

HAND IN. [Q7] Add a line of code to your R script, where you use the filter() function to filter the data to only include plants from experiment 1 or 2.

HAND IN. [Q8] Add a line of code to your R script where you use the filter() function to remove the rows with NA from your data.
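The filter() questions above rely on a few common patterns: a single condition, two conditions that must both hold, a choice between values, and removal of missing values. The sketch below illustrates each pattern on a hypothetical toy_birds data frame; the column names are made up.

```r
library(dplyr)

# Hypothetical toy data for illustration
toy_birds <- data.frame(
  colony  = c(1, 2, 3, 1, 2),
  wing_mm = c(110, 95, NA, 120, 101),
  mass_g  = c(40, 32, 35, 45, NA)
)

# One condition
filter(toy_birds, wing_mm > 100)

# Two conditions that must both hold (a comma works like &)
toy_both <- filter(toy_birds, wing_mm > 100, mass_g > 42)

# Either of two values in one column (| means "or")
toy_either <- filter(toy_birds, colony == 1 | colony == 3)

# Keep only rows where a column is not missing
toy_complete <- filter(toy_birds, !is.na(wing_mm))
```

Note that filter() drops rows where a condition evaluates to NA, so rows with a missing value in a tested column do not appear in the result.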

2.2 Mean maximum monthly temperature in Bay D’Espoir

In this section, we want you to use max-temperature.csv which you have already downloaded from Brightspace.

Metadata Overview

The data are from the Bay D’Espoir GEN STN. weather station and were originally sourced from Environment and Climate Change Canada. The Bay D’Espoir general station is located in Newfoundland at 55.48° W longitude and 47.59° N latitude. The station Climate ID number is 8400413. The dataset provided, max-temperature.csv, is a cleaned version of the downloaded data.

Columns in the data frame

Column 1: Year - The year of the recording

Column 2: Month - The month of the recording, i.e., 1 = January, 2 = February, etc.

Column 3: Day - The day of the recording

Column 4: Max.Temp.C - The maximum temperature recorded on that day in degrees Celsius.

HAND IN. [Q9] In your R Script, indicate with a comment that you are starting a new section ## Maximum monthly temperature. Add lines of code to your R Script where you:

  • load the data file: max-temperature.csv

  • remove the NAs from the dataset using filter().

  • calculate the mean maximum temperature in each month across years using group_by() and summarize() as described in Group summary.

  • plot the monthly maximum temperature using plot().
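The steps listed above follow a common pattern: remove missing values, group the rows, summarize each group, then plot the summary. The sketch below walks through that pattern on a hypothetical toy_rain data frame; the names week and rain_mm are made up and are not columns of the temperature data.

```r
library(dplyr)

# Hypothetical toy data for illustration
toy_rain <- data.frame(
  week    = c(1, 1, 2, 2, 3, 3),
  rain_mm = c(12, 20, 1, NA, 3, 5)
)

# 1. Remove rows with a missing measurement
toy_clean <- filter(toy_rain, !is.na(rain_mm))

# 2. Group the rows by week
toy_grouped <- group_by(toy_clean, week)

# 3. Calculate one mean per group
toy_means <- summarize(toy_grouped, mean_rain = mean(rain_mm))
toy_means

# 4. Plot the group means against the grouping variable
plot(toy_means$week, toy_means$mean_rain,
     type = "b", xlab = "Week", ylab = "Mean rainfall (mm)")
```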

2.3 Bird Banding

The number of birds banded at a series of sampling sites has been counted by your field crew and entered into the following vector.

number_of_birds <- c(28, 32, 1, 0, 10, 22, 30, 19, 145, 27, 
                     36, 25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55, 62, 98, 32, 
                     900, 33, 14, 39, 56, 81, 29, 38, 1, 0, 143, 37, 98, 77, 92, 
                     83, 34, 98, 40, 45, 51, 17, 22, 37, 48, 38, 91, 73, 54, 46,
                     102, 273, 600, 10, 11)

Counts are entered in order and sites are numbered starting at one.

In your R Script, indicate with a comment that you are starting a new section ## Bird banding. Review the section Useful functions.

  • Add lines of code to your R Script where you provide the commands to answer the questions below.

  • After the line of code for each question, write a comment giving the numerical answer, e.g., # the number of sites is 10.
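As a quick reminder of the kind of functions the questions below call for, here is a minimal sketch using a short hypothetical vector, toy_counts, rather than the bird counts.

```r
# Hypothetical toy vector for illustration
toy_counts <- c(4, 0, 7, 2, 9)

length(toy_counts)  # how many values the vector holds
sum(toy_counts)     # total of all values
min(toy_counts)     # smallest value
max(toy_counts)     # largest value
mean(toy_counts)    # arithmetic mean
toy_counts[3]       # the value stored at position 3
```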

HAND IN. [Q10] How many sites are there?

HAND IN. [Q11] What is the total number of birds counted across all of the sites?

HAND IN. [Q12] What is the smallest number of birds counted?

HAND IN. [Q13] What is the largest number of birds counted?

HAND IN. [Q14] What is the mean number of birds seen at a site?

HAND IN. [Q15] How many birds were counted at the last site? Have the computer choose the last site automatically in some way, rather than manually entering its position as a number. Do you know a function that will give you the position of the last value? (Since positions start at 1, the position of the last value in a vector is the same as the vector's length.)
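The idea of computing a position instead of typing it can be sketched on a short hypothetical vector, toy_counts. The base R function tail() is shown as an alternative.

```r
# Hypothetical toy vector for illustration
toy_counts <- c(4, 0, 7, 2, 9)

# The position of the last value equals the length of the vector
last_position <- length(toy_counts)
toy_last <- toy_counts[last_position]
toy_last

# tail() with n = 1 returns the last value directly
tail(toy_counts, 1)
```

Either approach keeps working if values are later added to or removed from the vector, which a hard-coded position would not.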

HAND IN. [Q16] How many birds were counted at site 42?

2.4 Portal Data Manipulation

In this section, we want you to use surveys.csv, which you have already downloaded from Brightspace.

In your R Script, indicate with a comment that you are starting a new section ## Portal data.

HAND IN. [Q17] Add lines of code to your R script to load the file surveys.csv.

HAND IN. [Q18] Add lines of code to your R script where you use filter() to create a new data frame survey_new where you have removed all rows with NA in the weight column.

HAND IN. [Q19] Add lines of code to your R script where you use the select() function to create a new data frame called survey_new_2 that only contains the columns year, month, day, species_id and weight in that order.

HAND IN. [Q20] Add lines of code to your R script where you use mutate() to calculate weight_kg, which is weight in kilograms, and add this new column to survey_new_2, naming the result survey_new_3. The weight in survey_new_2 is given in grams, so you will need to convert weight into kilograms. (Recall that 1 g = 0.001 kg.)

HAND IN. [Q21] Add lines of code to your R script, where you use filter() to make a new data frame survey_new_4 which is the same as survey_new_3 except that only the rows with species_id == "SH" are included.

HAND IN. [Q22] Add a line of code to your R script where you print the first 6 rows of survey_new_4 using the head() function.

2.5 Using pipes

In your R Script, indicate with a comment that you are starting a new section ## Using pipes.

HAND IN. [Q23] Add lines of code to your R script where you produce a data frame survey_new_5 from surveys.csv, which is the same as survey_new_4, but where you use the pipe operator %>% and do not create the intermediaries survey_new_2, survey_new_3, and survey_new_4.
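The difference between step-by-step code and piped code can be sketched on a small example. The toy_trees data frame and its columns below are hypothetical; the point is only that %>% passes the result on its left as the first argument of the function on its right.

```r
library(dplyr)

# Hypothetical toy data for illustration
toy_trees <- data.frame(
  stand    = c("N", "S", "N", "S"),
  dbh_cm   = c(30, NA, 45, 22),
  height_m = c(12, 9, 18, 10)
)

# Step by step, storing an intermediate object
toy_step1 <- filter(toy_trees, !is.na(dbh_cm))
toy_step2 <- mutate(toy_step1, dbh_m = dbh_cm / 100)

# The same steps chained with %>%, with no intermediate objects
toy_piped <- toy_trees %>%
  filter(!is.na(dbh_cm)) %>%
  mutate(dbh_m = dbh_cm / 100)

# Check that both approaches give the same result
all.equal(toy_step2, toy_piped)
```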

HAND IN. [Q24] Add a line of code to your R script, where you print the first 6 rows of survey_new_5 using the head() function.