Activity
adapted from: Sonora Meiling, Daniel Holstein
1. Stolen Cars
- Introduction and Hypotheses
What are the assumptions of the chi-squared test?
Below is code to create a dataframe named “carz”, which contains the frequencies of cars stolen by season. Using this dataframe, test whether the frequency of car thefts differs among seasons.
What are the statistical null and alternative hypotheses?
- Null:
- Alternative:
- Creating the Dataframe (don’t change this code)
- Examining the Data
Look at the data first. Create a mosaic plot using the function mosaicplot(). A mosaic plot is a graphical depiction of a contingency table. Do there seem to be differences in thefts among seasons?
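A minimal sketch of the plotting step, under stated assumptions: the counts are the four values used in the expected-frequency calculation in this activity, while the object names (thefts, theft_table), the season labels, and the order of the seasons are hypothetical stand-ins, because the structure of “carz” is not shown here. Swap in the real dataframe when you run it.

```r
# Counts come from this activity's expected-frequency calculation.
# ASSUMPTION: the season labels and their order are placeholders, not the real data layout.
thefts <- c(Winter = 112, Spring = 334, Summer = 372, Fall = 327)

# mosaicplot() expects a contingency table, so convert the named vector first.
theft_table <- as.table(thefts)

# With one variable, each tile's width is proportional to that season's count.
mosaicplot(theft_table, main = "Cars stolen by season", color = TRUE)
```

If thefts were evenly spread across seasons, the four tiles would be roughly the same width.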
- Checking Assumptions
Do the data meet the assumptions of a chi-squared test? Calculate the expected frequency for each season by hand.
(112 + 334 + 372 + 327) / 4
But there’s an easier way… enter it here:
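Two possible shortcuts, sketched under assumptions: the vector thefts and its season labels are hypothetical stand-ins for “carz”, and the activity may have a different shortcut in mind. The first line reproduces the hand calculation; the second asks chisq.test() for the expected counts it would use.

```r
# ASSUMPTION: season labels and order are placeholders for the real "carz" data.
thefts <- c(Winter = 112, Spring = 334, Summer = 372, Fall = 327)

# Under equal frequencies, every season expects the overall mean count.
expected_by_hand <- sum(thefts) / length(thefts)

# chisq.test() stores the expected counts it computes in the $expected element.
expected_from_test <- chisq.test(thefts)$expected

expected_by_hand
expected_from_test

# A common rule of thumb: every expected count should be at least 5.
all(expected_from_test >= 5)
```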
- Chi-Squared Test and Results
Use chisq.test() to perform a chi-squared test. What is the X-squared statistic? What is the p-value? Do you reject or fail to reject the null hypothesis? What does that mean for the scientific hypothesis?
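A hedged sketch of how the pieces of the test result can be pulled out individually. The vector thefts, its season labels, and the 0.05 significance level are assumptions for illustration; use your own dataframe and the threshold set for your class.

```r
# ASSUMPTION: season labels and order are placeholders for the real "carz" data.
thefts <- c(Winter = 112, Spring = 334, Summer = 372, Fall = 327)

# With a single vector and no p argument, chisq.test() runs a goodness-of-fit
# test against equal probabilities for every group.
result <- chisq.test(thefts)

result$statistic  # the X-squared value
result$parameter  # degrees of freedom (number of groups minus 1)
result$p.value    # the p-value

# ASSUMPTION: alpha = 0.05 is a conventional choice, not stated in the activity.
alpha <- 0.05
reject_null <- result$p.value < alpha
reject_null
```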
2. Genetics
- Introduction and Hypotheses
What if we don’t expect the frequencies among groups to be equal? For example, in a simple Mendelian cross, we expect 75% of offspring to present the dominant trait and 25% to present the recessive trait. In this example, we’ll test a population of 90 peas, where 80 expressed the dominant red petals and 10 expressed the recessive pink petals. Run a chi-squared test to determine whether this population follows Mendelian genetics.
What are the statistical null and alternative hypotheses?
- Null:
- Alternative:
- Creating the Contingency Table
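The code for this step is not shown here. A minimal sketch, assuming the table is a one-dimensional table named genetics (the name used in the test code later in this activity); the labels red and pink are hypothetical, while the counts come from the introduction above.

```r
# Counts from the introduction: 80 dominant (red) and 10 recessive (pink) peas.
# ASSUMPTION: the labels "red" and "pink" are illustrative names only.
genetics <- as.table(c(red = 80, pink = 10))

genetics

# Observed proportions, for comparison with the expected 75% and 25%.
prop.table(genetics)
```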
- Examining the Data
Now let’s look at the data with a mosaic plot. Do they look approximately 75% and 25%?
- Checking Assumptions
Do the data meet the assumptions of a chi-squared test?
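One way to check the expected counts, sketched under the assumption that genetics is a one-dimensional table with the hypothetical labels red and pink. With unequal expected proportions, each expected count is the total sample size multiplied by that group's proportion.

```r
# ASSUMPTION: the labels "red" and "pink" are illustrative names only.
genetics <- as.table(c(red = 80, pink = 10))

# Expected proportions from the introduction: 75% dominant, 25% recessive.
expected_props <- c(red = 0.75, pink = 0.25)

# Expected count = total number of peas * expected proportion.
expected_counts <- sum(genetics) * expected_props
expected_counts

# A common rule of thumb: every expected count should be at least 5.
all(expected_counts >= 5)
```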
- Chi-Squared Test and Results
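By default, chisq.test() assumes equal probabilities among groups, so for this example we have to set the expected probabilities ourselves with the p argument. There is also a way to run the test without calculating the decimal probabilities, using a ratio instead: when rescale.p = TRUE, the values in p are rescaled so that they sum to 1. Fill in X in the code below with the number of dominant offspring expected for every 1 recessive offspring.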
X = # fill this in
chisq.test(genetics, p = c(X, 1), rescale.p = TRUE)
What is the X-squared?
p-value?
Do we reject or fail to reject the null hypothesis? What does this mean practically?
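To see what rescale.p does, the sketch below runs the test twice: once with decimal probabilities and once with the same expectation written as percentages that R rescales. The labels red and pink and the object names are assumptions for illustration; the proportions are the 75% and 25% from the introduction.

```r
# ASSUMPTION: the labels "red" and "pink" are illustrative names only.
genetics <- as.table(c(red = 80, pink = 10))

# Version 1: decimal probabilities, which must sum to 1.
test_decimal <- chisq.test(genetics, p = c(0.75, 0.25))

# Version 2: the same expectation as percentages; rescale.p = TRUE divides
# each value by their sum, so 75 and 25 become 0.75 and 0.25.
test_rescaled <- chisq.test(genetics, p = c(75, 25), rescale.p = TRUE)

# Both versions describe the same null hypothesis, so the results should match.
all.equal(test_decimal$statistic, test_rescaled$statistic)
all.equal(test_decimal$p.value, test_rescaled$p.value)
```

Any set of non-negative values in the same proportion works with rescale.p = TRUE, which is why a simple ratio can be used in place of decimals.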