C Hypothesis, predictions and equations

The proposed answer to a question is termed a hypothesis. Hypotheses are formulated on the basis of some preliminary observations and/or prior knowledge.

How does the presence of a concentration gradient affect the diffusion of water? There are three possible answers to this question. Indeed, many questions may generate multiple alternative hypotheses.


Hypotheses

H1. Water diffuses from a hypertonic solution into a hypotonic solution.

H2. Water diffuses from a hypotonic solution into a hypertonic solution.

H3. Water diffuses equally in both directions along the osmotic gradient.

All three meet the basic requirements of a hypothesis: they are predictive, clearly expressed, and answer the question. However, H2 is the most likely hypothesis given background reading on osmosis.

Predictions are generated from the hypothesis and framed in terms of a specific experiment or analysis. Let’s define a specific experiment:

EXPERIMENT 1

An osmometer containing a 30% glucose and dye solution is placed in a beaker containing distilled water.

If hypothesis H2 holds true, then water will diffuse from the hypotonic solution (the water in the beaker) into the hypertonic solution (the 30% glucose and dye solution in the osmometer) and the water will rise in the pipette. Note how this prediction was generated using the if … then … format.

That is, if the hypothesis holds true, then a specific event will occur (the prediction).

The prediction corresponding to H2 for Experiment 1 is:

if water diffuses from a hypotonic solution into a hypertonic solution (hypothesis),
then water will diffuse from the beaker into the osmometer causing the water in the pipette to rise.

Although hypotheses and predictions are similar, two basic differences exist: hypotheses are general, predictions are specific; hypotheses are written in the present tense, predictions in the future tense.

Equations express predictions in mathematical notation so that statistical tests can be applied to determine whether the hypothesis is supported. The equation corresponding to the prediction above is:

\(v_F > v_0\)

where \(v_F\) is the final volume of the glucose-dye solution in the osmometer (units: mL), and
\(v_0\) is the initial volume of glucose-dye solution in the osmometer (units: mL).
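
As a minimal sketch, the inequality above can be written as a comparison in Python. The function name and the two readings are hypothetical, chosen only for illustration; they are not measurements from the experiment.

```python
def prediction_supported(v_0, v_F):
    """Return True when the final volume exceeds the initial volume (v_F > v_0)."""
    return v_F > v_0


# Hypothetical readings in mL (illustrative only, not real measurements).
v_0 = 10.0  # initial volume of glucose-dye solution in the osmometer
v_F = 12.5  # final volume of glucose-dye solution in the osmometer

print(prediction_supported(v_0, v_F))
```

A single comparison like this only restates the prediction. Deciding whether the hypothesis is supported requires a statistical test applied to repeated measurements.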

When writing your prediction as an equation, you get to choose the notation. You might instead have decided that \(y\) is the final volume of the glucose-dye solution; this choice is also correct provided you explain which variable \(y\) represents. You may choose any lower- or upper-case Latin (e.g., a, b, c, …) or Greek (e.g., α, β, γ, …) letters for your variables, and you may also use subscripts.
Choose symbols that are one letter long. For example, \(vF\) (two letters long) is a less preferred choice because it could be confused with \(v\) multiplied by \(F\).
It is usual to choose symbols that are intuitive (here, \(v\) helps us think of volume and \(F\) helps us think of ‘final’). Commonly, μ or bar notation (e.g., \(\bar{v}\)) represents the mean of multiple observations. It is a good idea to avoid “O”, which can be confused with zero, and e, i, π and Σ, which have specific definitions in mathematics. These conventions are preferred, but an answer that does not follow them is not necessarily wrong. The necessary element of your answer is a description of the notation you define.
You must provide the units for the quantities you define.
The dependent variable (see below for a definition) is to appear on the left-hand side of the equals or inequality sign.

You will not perform statistical tests in BIOL 1001 or 1002. However, learning how to write your predictions as an equation will help you to understand how your hypothesis and predictions connect to statistical tests that help determine whether a hypothesis is supported or refuted.

C.1 Discrete and continuous variables

Generally, a hypothesis describes the effect of one factor on another, for example, the effect of a concentration gradient on the direction of osmosis. These factors are called variables and are of two types: the variable that is changed (varied) by the experimenter is called the independent variable, and the variable that changes as a consequence of the changes to the independent variable is called the dependent variable. In our example:

The glucose concentration in the osmometer is the independent variable because it is created by the experimenter, and
The volume of water in the pipette is the dependent variable because it is affected directly by the concentrations inside and outside the osmometer, and is the measured response.

EXPERIMENT 2

This experiment will test how concentrations of 30, 60, and 90% glucose in an osmometer will influence the flow of water molecules between the two solutions.

For Experiment 2, we might have the following as our hypothesis, prediction and equation:

Hypothesis: When a solution is more hypertonic, more water will diffuse into it from a hypotonic solution.

Prediction: The volume of water in the osmometer will be largest for the 90% glucose solution, and smallest for the 30% glucose solution.

Equation:

\(v_F = bg + v_0\),

where \(v_F\) is the final volume of liquid in the osmometer (mL), \(b\) is the change in final volume per unit glucose concentration (mL per %), \(g\) is the glucose concentration (%), and \(v_0\) is the final volume when the glucose concentration is zero (mL).
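
The equation above can be sketched as a short Python function that returns the predicted final volume for each glucose concentration in Experiment 2. The values given to \(b\) and \(v_0\) here are hypothetical, chosen only to illustrate the calculation, and the function name is an assumption.

```python
def final_volume(g, b, v_0):
    """Final volume (mL) predicted by v_F = b*g + v_0."""
    return b * g + v_0


# Hypothetical parameter values (illustrative only, not estimated from data).
b = 0.05    # change in final volume per unit glucose concentration (mL per %)
v_0 = 10.0  # final volume when the glucose concentration is zero (mL)

for g in (30, 60, 90):  # glucose concentrations used in Experiment 2 (%)
    print(g, final_volume(g, b, v_0))
```

With a positive value of \(b\), the predicted final volume is smallest at 30% glucose and largest at 90% glucose, which matches the prediction stated above.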

Note that in Experiment 2, the independent variable, glucose concentration, takes three different values, (30, 60 and 90%) and these values are taken from options along a continuum. As such, in the above equation we have treated glucose concentration as a CONTINUOUS variable. This changes our hypothesis and predictions. Let’s examine the new equation more closely.

We predicted an increasing relationship between glucose concentration and the final volume of water in the osmometer. In writing our equation, we assumed, more specifically, a linear, or “straight-line”, relationship. An increasing relationship does not have to be linear. However, without a good reason to assume otherwise, we should assume that a prediction stating an increasing or decreasing relationship between two variables corresponds to a linear relationship, where an increasing relationship has a positive slope and a decreasing relationship has a negative slope.
Recall that the equation for a straight line is: \(y = m x + b\), where \(y\) corresponds to the vertical axis of a graph, \(x\) corresponds to the horizontal axis, \(m\) is the slope of the line and \(b\) is the y-axis intercept (corresponding to \(x = 0\)). Note that the equation above is simply the equation for a straight line written in the notation of our experiment:
- \(v_F\), the dependent variable, is \(y\),
- \(b\), the effect of \(g\) on \(v_F\), is \(m\), the slope,
- \(g\), the independent variable, is \(x\),
- \(v_0\), the final volume when \(g = 0\), is \(b\), the y-axis intercept.
Note that the units for each term are equal:
- \(v_F\) has units mL
- \(bg\) has units \(\frac{mL}{\%} \cdot \% =\) mL
- \(v_0\) has units mL

Finally, let’s consider a third experiment where the independent variable is discrete.

EXPERIMENT 3

The Glycemic Index Foundation of South Africa designates beverages as having “Low”, “Intermediate” or “High” Glycemic Index (GI) values. In this experiment, three beverages from each group (Low, Intermediate, and High) are used in place of the glucose solution in the osmometer, and for each of these nine experiments the solution in the beaker is distilled water.

In Experiment 3, the independent variable is discrete because each beverage belongs to a group. The hypothesis, prediction, and equation for Experiment 3 might be:

Hypothesis: When a solution is more hypertonic, more water will diffuse into it from a hypotonic solution.

Prediction: The volume of water in the osmometer will be largest for the three beverages from the “High” GI group, and smallest for the three beverages from the “Low” GI group.

Equation:

\(μ_H > μ_I > μ_L\),

where \(μ_H\) is the mean volume of liquid in the osmometer for the three “High” GI beverages (mL), \(μ_I\) is the mean volume of liquid in the osmometer for the three “Intermediate” GI beverages (mL), and \(μ_L\) is the mean volume of liquid in the osmometer for the three “Low” GI beverages (mL).

The equation for Experiment 3 is similar to the equation for Experiment 1. For Experiment 3, there were three beverages in each group, so we defined a measure of the central tendency, in this case the mean, so that we could compare between the groups.
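
As a minimal sketch, the group means and the predicted ordering for Experiment 3 can be computed in Python. The volumes below are hypothetical values invented for illustration, not measurements, and the variable names are assumptions.

```python
from statistics import mean

# Hypothetical final volumes (mL) for the three beverages in each GI group.
# These numbers are illustrative only, not real measurements.
volumes = {
    "High": [13.1, 12.8, 13.4],
    "Intermediate": [11.9, 12.2, 11.6],
    "Low": [10.4, 10.9, 10.6],
}

# Mean volume for each group, corresponding to mu_H, mu_I and mu_L.
mu = {group: mean(values) for group, values in volumes.items()}

# The prediction written as an equation: mu_H > mu_I > mu_L.
prediction_holds = mu["High"] > mu["Intermediate"] > mu["Low"]

print(mu)
print(prediction_holds)
```

Comparing means in this way only shows whether the observed ordering agrees with the prediction. It does not account for variation among beverages within a group, which is what a statistical test would assess.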

C.2 Hypothesis testing

C.2.1 Null hypotheses

Null means nothing or zero, and null hypotheses usually describe outcomes where the independent variables have no effect. Null hypotheses are usually paired with alternative hypotheses in which the independent variable does have an effect. As such, H3 is a potential null hypothesis, since it describes the concentration gradient having no effect on the direction of diffusion, and it could be paired with the alternative hypothesis, H2. When written as an equation, null hypotheses are frequently expressed as a given parameter set equal to zero.

Null Hypothesis: Water diffuses equally in both directions along the osmotic gradient.

Alternative Hypothesis: Water diffuses from a hypotonic solution into a hypertonic solution.

Prediction under the null hypothesis: The volume of water in the osmometer will not change.

Equation under the null hypothesis:

\(v_F = v_0\),

where \(v_F\) is the final volume of the glucose-dye solution in the osmometer (units: mL), and
\(v_0\) is the initial volume of glucose-dye solution in the osmometer (units: mL).
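
One way to see the “parameter set equal to zero” form is to rearrange this equation. As a sketch, using a symbol \(d\) that is introduced here only for illustration, define the change in volume as

\(d = v_F - v_0\),

where \(d\) is the change in volume of the glucose-dye solution in the osmometer (units: mL). The null hypothesis then corresponds to \(d = 0\), and the alternative hypothesis, H2, corresponds to \(d > 0\). Similarly, for Experiment 2, a null hypothesis of no effect of glucose concentration on final volume would correspond to \(b = 0\).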

Historically, many statistical analyses have been performed in the null hypothesis testing framework. More recently, information theoretic approaches that assess the relative support for each of multiple working hypotheses have gained popularity (Burnham and Anderson 2003); that approach may be introduced in upper year courses.

C.3 Further reading

Cite sections of Whitlock and Schluter here.

C.4 Status

We have substantial edits from a colleague to revise this chapter. We would also like to consult with Stats.
