Background: Ocean acidification is a result of increased carbon dioxide (CO₂) in the atmosphere. As more CO₂ is absorbed by the ocean, it reacts with seawater to produce carbonic acid, which subsequently lowers the ocean’s pH. This can affect marine life, especially shellfish and corals, which rely on carbonate ions to build their shells and skeletons.
Scientific Hypothesis: Ocean acidification significantly affects the growth of shellfish.
Data Collection: Researchers collect a sample of shellfish and divide them into two groups. One group (Control) is kept in seawater with a pH close to current levels. The other group (Treated) is kept in seawater with a lower pH, simulating future predicted levels of ocean acidification.
After a specified period, the growth (e.g., increase in shell length) of the shellfish in each group is measured.
Sample Data:
Control Group (current pH level): Growth in mm: [2.1, 2.3, 2.2, 2.4, 2.5, 2.3, 2.4, 2.2, 2.3, 2.1]
Treated Group (lower pH level): Growth in mm: [1.8, 1.9, 1.7, 1.9, 1.8, 1.7, 1.9, 1.8, 1.7, 1.8]
The researchers want to know if the difference in growth between the two groups is statistically significant.
Performing an independent two-sample t-test by hand involves several steps. The R code below walks through them, using the pooled-variance (equal variances) form of the test:
# Remember: tabular data have one observation per row and one variable per column.
# These data are NOT in tabular form: each group sits in its own column.

# Data: the sample measurements listed above
dat <- data.frame(
  control = c(2.1, 2.3, 2.2, 2.4, 2.5, 2.3, 2.4, 2.2, 2.3, 2.1),
  treated = c(1.8, 1.9, 1.7, 1.9, 1.8, 1.7, 1.9, 1.8, 1.7, 1.8)
)

# 1. Calculate the sample means
mean_control <- mean(dat$control)
mean_treated <- mean(dat$treated)

# 2. Calculate the sample variances
var_control <- var(dat$control)
var_treated <- var(dat$treated)

# 3. Calculate the pooled variance
n1 <- length(dat$control)
n2 <- length(dat$treated)
pooled_var <- ((n1 - 1) * var_control + (n2 - 1) * var_treated) / (n1 + n2 - 2)

# 4. Calculate the t-statistic
t_statistic <- (mean_control - mean_treated) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# 5. Determine the degrees of freedom
df <- n1 + n2 - 2

# 6. Compare the calculated t-statistic to the critical t-value
alpha <- 0.05
critical_value <- qt(1 - alpha / 2, df)  # for a two-tailed test
significant <- abs(t_statistic) > critical_value

# Calculate the two-tailed p-value (both tails beyond |t|)
p_value <- 2 * pt(-abs(t_statistic), df)

# Results
results <- list(
  mean_control = mean_control,
  mean_treated = mean_treated,
  t_statistic = t_statistic,
  df = df,
  pvalue = p_value,
  significant = significant
)
results
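For reference, the same calculation can be written out as formulas. This is a sketch using the sample data above, with subscript 1 for the control group and subscript 2 for the treated group; intermediate values are rounded, and the critical value is the usual two-tailed value for 18 degrees of freedom at α = 0.05.

```latex
\begin{align*}
H_0 &: \mu_1 = \mu_2, \qquad H_1 : \mu_1 \neq \mu_2 \\[4pt]
\bar{x}_1 &= 2.28, \quad \bar{x}_2 = 1.80, \quad
s_1^2 \approx 0.01733, \quad s_2^2 \approx 0.00667, \quad n_1 = n_2 = 10 \\[4pt]
s_p^2 &= \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
       \approx \frac{9(0.01733) + 9(0.00667)}{18} \approx 0.012 \\[4pt]
t &= \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}
   \approx \frac{2.28 - 1.80}{\sqrt{0.012 \times 0.2}} \approx 9.798 \\[4pt]
\mathrm{df} &= n_1 + n_2 - 2 = 18, \qquad
|t| \approx 9.798 > t_{0.975,\,18} \approx 2.101
\end{align*}
```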
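If you want R to print very small numbers in fixed notation instead of scientific notation, set the scipen option before printing the results. scipen is a penalty that R applies when choosing between fixed and scientific notation: positive values bias it toward fixed notation. Setting it to a large number (for example, options(scipen = 999)) effectively prevents R from printing numbers in scientific notation.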
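A minimal sketch of the effect, assuming a scipen value of 999 (the exact value is an illustrative choice) and a hypothetical stand-in number p_small rather than the computed p-value:

```r
# Hypothetical stand-in for a very small p-value
p_small <- 0.00000001222

# Start from R's default penalty (scipen = 0) and keep the previous setting
old_opts <- options(scipen = 0)
sci_text <- format(p_small)    # R chooses scientific notation here

# A large penalty makes R prefer fixed notation
options(scipen = 999)
fixed_text <- format(p_small)

# Restore whatever setting was active before
options(old_opts)

sci_text
fixed_text
```

The option is session-wide, so restoring the previous value as shown keeps the change from affecting later output.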
The p-value is about 0.00000001, far below 0.05, so we reject the null hypothesis that the two groups have the same mean growth.
Let's plot!
# Required libraries (tidyverse already attaches ggplot2, tidyr, and dplyr)
library(tidyverse)
library(ggplot2)

# Prepare data for plotting.
# pivot_longer() makes the data long: one new column holds the grouping
# variable (Group) and another holds the measured variable (Growth).
datlong <- dat %>%
  pivot_longer(cols = c(control, treated), names_to = "Group", values_to = "Growth")

# Create the plot
ggplot(datlong, aes(x = Group, y = Growth)) +
  geom_boxplot(aes(fill = Group)) +
  geom_jitter(width = 0.2, aes(color = Group)) +
  labs(title = "Effect of Ocean Acidification on Shellfish Growth", y = "Growth (mm)") +
  theme_minimal() +
  theme(legend.position = "none")  # the value must be lowercase "none"
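# Store the same plot in an object so that layers can be added to it later
p <- ggplot(datlong, aes(x = Group, y = Growth)) +
  geom_boxplot(aes(fill = Group)) +
  geom_jitter(width = 0.2, aes(color = Group)) +
  labs(title = "Effect of Ocean Acidification on Shellfish Growth", y = "Growth (mm)") +
  theme_minimal() +
  theme(legend.position = "none")  # the value must be lowercase "none"
p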
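# If the p-value is less than 0.05, add a significance indicator.
# annotate() draws each element once; geom_*() with aes() would redraw it for every row.
if (p_value < 0.05) {
  y_max <- max(datlong$Growth)
  p <- p +
    annotate("segment", x = 1, xend = 2, y = y_max + 0.1, yend = y_max + 0.1) +
    annotate("text", x = 1.5, y = y_max + 0.15, label = "*", size = 6)
}
p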
Now, that was a lot of work to get that significance, wasn’t it? R's built-in t.test() function reports the same result:
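The output that follows has the layout t.test() prints for a pooled-variance test. A call along these lines would reproduce it. This is a sketch, not necessarily the original call: the formula interface and var.equal = TRUE are inferred from the "Two Sample t-test" and "Growth by Group" lines, and growth_long is a hypothetical base-R stand-in for the long-format data.

```r
# Sample data from the experiment described above
control <- c(2.1, 2.3, 2.2, 2.4, 2.5, 2.3, 2.4, 2.2, 2.3, 2.1)
treated <- c(1.8, 1.9, 1.7, 1.9, 1.8, 1.7, 1.9, 1.8, 1.7, 1.8)

# Long format: one row per shellfish, one column per variable
growth_long <- data.frame(
  Group  = rep(c("control", "treated"), each = 10),
  Growth = c(control, treated)
)

# var.equal = TRUE requests the pooled-variance (Student) test;
# without it, t.test() runs Welch's test, which does not pool the variances
res <- t.test(Growth ~ Group, data = growth_long, var.equal = TRUE)
res
```

The individual pieces are also available as list elements, for example res$statistic, res$parameter (degrees of freedom), res$p.value, and res$conf.int.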
Two Sample t-test
data: Growth by Group
t = 9.798, df = 18, p-value = 0.00000001222
alternative hypothesis: true difference in means between group control and group treated is not equal to 0
95 percent confidence interval:
0.3770763 0.5829237
sample estimates:
mean in group control mean in group treated
2.28 1.80