Conducting T-Tests in R
Let's expand on the previous section and learn more about t.test().
…
Importing and inspecting data
Use read.csv() or other relevant functions to bring your data into R. Always inspect the dataset first.
Make sure your data are in the correct format for the t-test you want to conduct.
If your data are in tabular format (e.g., one row per observation): Which columns correspond to your grouping variable? Which columns contain the values you want to test?
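As a minimal sketch of the import-and-inspect step, the snippet below writes a small made-up CSV to a temporary file so that it can run on its own. The file contents and the column names (group, score) are illustrative assumptions, not a real dataset.

```r
# Hypothetical data: the column names 'group' and 'score' are made up for illustration
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "group,score",
  "control,5.1",
  "control,4.8",
  "control,5.6",
  "treatment,6.2",
  "treatment,6.9",
  "treatment,5.9"
), csv_path)

# Import the data
dat <- read.csv(csv_path)

# Inspect before testing
str(dat)            # column types: is 'score' numeric?
head(dat)           # first few rows
table(dat$group)    # which groups exist, and how many observations in each?
summary(dat$score)  # range of the values (also reports NAs if any are present)
```

Here 'group' would be the grouping variable and 'score' the values to test.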
If your data are not laid out that way (for example, each group sits in its own column), you may need to reshape them using functions like pivot_longer() or pivot_wider() from the tidyr package, which is part of the tidyverse.
You may also be able to simply index those columns directly in your t.test call, like we did in the previous section.
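The two options above (reshaping versus indexing columns directly) might look like the following sketch. The data frame and its column names (id, before, after) are invented for illustration, and the base R fallback is only there so the example still runs if tidyr is not installed.

```r
# Hypothetical wide data: one row per subject, one column per measurement
wide <- data.frame(
  id     = 1:5,
  before = c(12.1, 11.4, 13.0, 12.7, 11.9),
  after  = c(11.2, 11.0, 12.1, 12.9, 11.1)
)

# Option 1: reshape to long format (one row per observation)
if (requireNamespace("tidyr", quietly = TRUE)) {
  long <- tidyr::pivot_longer(
    wide,
    cols      = c("before", "after"),
    names_to  = "time",
    values_to = "score"
  )
} else {
  # Base R fallback that builds the same three columns by hand
  long <- data.frame(
    id    = rep(wide$id, times = 2),
    time  = rep(c("before", "after"), each = nrow(wide)),
    score = c(wide$before, wide$after)
  )
}
head(long)

# Option 2: skip reshaping and index the wide columns directly
res <- t.test(wide$before, wide$after)
res
```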
Conducting different types of t-tests
A one-sample t-test
To test if the mean of a group differs from a known value:
# Assuming 'value' is the known value you're comparing against
# A one-sample test has no grouping variable, so pass the column itself
t.test(
  data$column_name,
  mu = value,
  alternative = "two.sided" # or "less" / "greater"
)
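For a concrete, runnable version of a one-sample test, here is a sketch using the built-in mtcars dataset. The reference value of 20 is an arbitrary choice made for illustration.

```r
# Built-in dataset; mu = 20 is an arbitrary reference value chosen for illustration
one_sample <- t.test(mtcars$mpg, mu = 20, alternative = "two.sided")
one_sample

# The result is an "htest" object; its pieces can be pulled out by name
one_sample$estimate   # sample mean
one_sample$conf.int   # confidence interval for the mean (95% by default)
one_sample$p.value    # p-value for the chosen alternative
```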
A two-sample t-test
For comparing the means of two independent groups:
# Assuming 'group_column' is a binary factor indicating the two groups you want to compare
t.test(
column_name ~ group_column,
data = data,
  var.equal = FALSE, # FALSE = Welch's t-test; TRUE = pooled variance
  alternative = "two.sided" # or "less" / "greater"
)
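As an illustration of a two-sample test on real columns, the sketch below uses the built-in ToothGrowth dataset, where 'supp' is a two-level factor and 'len' is the measured value. It runs the test both ways so the effect of var.equal can be compared.

```r
# Two independent groups defined by the two-level factor 'supp'
welch  <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)
pooled <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

welch$method    # names the test that was actually run
pooled$method

# Degrees of freedom: n1 + n2 - 2 for the pooled test,
# an approximated (usually fractional) value for Welch's test
welch$parameter
pooled$parameter
```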
A paired t-test
For dependent or paired samples:
# Assuming 'before_column' and 'after_column' hold the two measurements for each
# subject in the same row (wide format). Passing the two columns directly avoids
# the formula interface, which newer R releases (4.4.0 onward) reject with paired = TRUE.
t.test(
  data$before_column,
  data$after_column,
  paired = TRUE,
  alternative = "two.sided" # or "less" / "greater"
)
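A runnable sketch of a paired test, using the built-in sleep dataset (ten subjects, each measured under two drugs). The variable names drug1 and drug2 are introduced here for illustration. The last lines show that a paired test gives the same statistic as a one-sample test on the within-subject differences.

```r
# 'sleep' is stored in long format (extra, group, ID); split it into one vector
# per drug. Rows are ordered by ID within each group, so the vectors line up.
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

paired_res <- t.test(drug1, drug2, paired = TRUE)
paired_res

# Equivalent view: a one-sample test on the per-subject differences
diff_res <- t.test(drug1 - drug2, mu = 0)
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
```

Because the pairing is by position, the two vectors must have the same length and the same subject order.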
The tilde notation
In R, the tilde (~) is often used to denote a formula. The t.test() function can accept a formula as an argument when the test is applied to data in a data frame.
When you see a formula like:
response ~ group
It typically means that you’re specifying a relationship where “response” is modeled or tested as a function of “group”. In simpler terms, you’re looking at how “response” varies by “group”.
In the context of t.test(), if you use a formula like the one above, you’re indicating that you want to perform a t-test to compare the means of the “response” variable across the two levels of the “group” variable.
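To make the formula idea concrete, the sketch below (using the built-in ToothGrowth dataset as a stand-in) shows that a formula is an ordinary R object, and that the formula call gives the same result as splitting the response by group and passing two vectors.

```r
# A formula is an object in its own right
f <- len ~ supp
class(f)      # "formula"
all.vars(f)   # "len" "supp"

# Formula interface: response ~ group, looked up inside 'data'
by_formula <- t.test(f, data = ToothGrowth)

# Same comparison written out by hand, one vector per factor level
lv <- levels(ToothGrowth$supp)
by_vectors <- t.test(
  ToothGrowth$len[ToothGrowth$supp == lv[1]],
  ToothGrowth$len[ToothGrowth$supp == lv[2]]
)

# Both calls produce the same test statistic and p-value
all.equal(unname(by_formula$statistic), unname(by_vectors$statistic))
all.equal(by_formula$p.value, by_vectors$p.value)
```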
Other arguments to t.test
column_name: This is the column of your data that contains the values you wish to test.
group_column: A grouping column with exactly two levels that identifies which of the two groups each observation belongs to.
data: The name of the data frame that contains the columns mentioned.
mu: This is the known or hypothesized value of the mean you’re comparing your sample mean against.
alternative: Specifies the kind of test to be conducted.
“two.sided” (default): Tests if the sample mean is different from the hypothesized mean.
“less”: Tests if the sample mean is less than the hypothesized mean.
“greater”: Tests if the sample mean is greater than the hypothesized mean.
var.equal: A logical variable indicating whether to treat the two variances as being equal. If TRUE, then a pooled variance is used to estimate the variance; otherwise separate variances are used.
TRUE: Assumes equal variance (classic two-sample t-test).
FALSE: Does not assume equal variance (Welch’s t-test).
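As a rough illustration of how the alternative argument changes the result, the sketch below runs the same two-sample comparison three ways on the built-in ToothGrowth dataset. With a formula, the direction of "less" and "greater" refers to the first factor level relative to the second.

```r
# Same data, three alternatives; only the p-value (and interval) changes
p_two     <- t.test(len ~ supp, data = ToothGrowth, alternative = "two.sided")$p.value
p_less    <- t.test(len ~ supp, data = ToothGrowth, alternative = "less")$p.value
p_greater <- t.test(len ~ supp, data = ToothGrowth, alternative = "greater")$p.value

c(two.sided = p_two, less = p_less, greater = p_greater)

# The two one-sided p-values are complementary ...
p_less + p_greater
# ... and the two-sided p-value is twice the smaller one-sided p-value
all.equal(p_two, 2 * min(p_less, p_greater))
```

Choosing the alternative should be done before looking at the data, since picking whichever side gives the smaller p-value inflates the false positive rate.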