Quantitative skills for biology
1 Overview
Many biology-relevant skills are now performed using computers: statistical analyses, mathematical modelling, managing and reformatting data, visualizing data, identifying genes from DNA sequences, constructing 3D models of proteins, and making phylogenies. Quantitative skills and best practices in data science are now being incorporated throughout the biology major.
Teaching quantitative skills is necessary: in one survey, more than 90% of ecologists reported that, in retrospect, the statistics and mathematics training they received during their undergraduate biology major was insufficient (Barraquand et al. 2014). The same study found that practicing ecologists recommended that 30% of coursework in a biology major be dedicated to quantitative training.
Today’s biologists use a variety of software, particularly Microsoft Office, R, Python, and ArcGIS. An advantage of scripting languages such as R over point-and-click software such as Microsoft Excel is that analyses are reproducible, because the sequence of commands is saved in a file. When graphs are produced by clicking buttons in a program, the steps are not recorded: errors may be difficult to identify, and the steps may be forgotten.
This quantitative training manual teaches data manipulation and visualization in R. We chose R for the following reasons:
A valued skill: As of March 2019, R was the fifth-ranked software listed in job advertisements for data scientists. Python ranked first, but R is a popular statistical tool and is widespread in biology because the results of experiments need to be analyzed statistically.
Reproducibility: R is a scripting language, so unlike point-and-click software, the steps used to produce a data analysis or figure are saved in a script and can be rerun exactly.
Accessibility: R is free, unlike many statistical software packages that are quite expensive. This means that researchers or organizations, such as environmental NGOs, can still do sophisticated analysis with limited budgets.
No limits: R has a large user community, and contributed packages allow you to do almost anything. R packages range from serious to fun.
Popularity: A survey of studies from 30 ecology journals found that in 2017, 58% of articles used R as the primary tool to generate their results (Lai et al. 2019).
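To make the reproducibility point concrete, here is a minimal sketch of what a saved R script might look like. The data values, column names, and output file name are all invented for illustration and are not taken from any course material. Only base R functions are used.

```r
# Hypothetical example data: body mass (g) for two made-up groups.
# These numbers are invented purely for illustration.
mass <- data.frame(
  group  = c("A", "A", "A", "B", "B", "B"),
  mass_g = c(10, 12, 14, 20, 22, 24)
)

# Step 1: summarize the data (mean body mass per group).
group_means <- aggregate(mass_g ~ group, data = mass, FUN = mean)
print(group_means)

# Step 2: draw a figure and write it to a file.
# Rerunning the script regenerates exactly the same figure.
pdf("mass_by_group.pdf")
boxplot(mass_g ~ group, data = mass,
        xlab = "Group", ylab = "Body mass (g)")
dev.off()
```

Because every step, from the summary to the figure, is written down in the file, anyone with the script and the data can rerun the analysis and check each command, which is not possible when the same figure is produced by a series of unrecorded clicks.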
The quantitative training program at MUNL was launched in BIOL 1001 in Spring 2020 during the COVID-19 pandemic.
1.1 Course links
BIOL 1001 - Principles of Biology
BIOL 1002 - Principles of Biology
BIOL 2600 - Ecology (in progress)
BIOL 2900 - Principles of Evolution & Systematics
BIOL 3295 - Population and Evolutionary Ecology
BIOL 4505 - Systematics and Biogeography
BIOL 4606 - Bioinformatics: Biological Data Analysis
BIOL 4607 - Models in Biology
BIOL 4605/7220 - Quantitative Methods in Biology: RIntro; Toolbox for labs
BIOL 4630 - Mammalogy
BIOL 4651 - Conservation: (GIS Module); (Quantitative Methods Module) (Interactive maps)
1.2 Instructor guide
If you are a biology instructor interested in furthering quantitative training in your course, you can download a guide here (please download the file, as viewing it in the browser does not display all the information).
1.3 R help center
If you need help with R and RStudio, you can talk to a Quant TA in the biology computer lab (CSF2218) or the help center (CSF2342). You can access the help center schedule here.