Beyond Chi-square: Alternative Tests for Categorical Data

Exact Tests: Precision for Small Samples

Source: McDonald (2014), Handbook of Biological Statistics.

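With small samples, exact tests are more reliable than the chi-square test because they compute the p-value directly from the exact sampling distribution of the data (for example, the binomial distribution) instead of relying on a large-sample approximation. Let’s see how this works with a real example.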

Case Study: Cat Paw Preference

Source: https://rcompanion.org/rcompanion/b_01.html

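In this experiment, we observed our cat Gus batting at a dangling ribbon 10 times, recording which paw he used. Gus used his right paw 8 times and his left paw 2 times. If he has no paw preference, each paw has probability 0.5 on every trial; this is the null hypothesis we test. Let’s analyze these data: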

# Probability of exactly 2 left-paw uses in 10 trials when p = 0.5
dbinom(2, 10, 0.5)            
[1] 0.04394531
# One-sided test: does Gus use his left paw less often than expected?
binom.test(2, 10, 0.5,
           alternative="less",       
           conf.level=0.95)

    Exact binomial test

data:  2 and 10
number of successes = 2, number of trials = 10, p-value = 0.05469
alternative hypothesis: true probability of success is less than 0.5
95 percent confidence interval:
 0.0000000 0.5069013
sample estimates:
probability of success 
                   0.2 
# Two-sided test: Any deviation from 50/50?
binom.test(2, 10, 0.5,
           alternative="two.sided",  
           conf.level=0.95)

    Exact binomial test

data:  2 and 10
number of successes = 2, number of trials = 10, p-value = 0.1094
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.02521073 0.55609546
sample estimates:
probability of success 
                   0.2 

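Because the left paw is counted as the “success,” the one-sided test asks whether Gus uses it less often than chance would predict, that is, whether he prefers his right paw. The one-sided p-value of 0.0547 sits just above the conventional 0.05 threshold, and the two-sided p-value (0.109) is larger still, so the data hint at a right-paw preference without establishing it; more trials would be needed for a firmer conclusion.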
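To see what “calculating probabilities directly” means, the sketch below rebuilds both p-values from binomial probabilities. The one-sided p-value is simply P(X ≤ 2). For the two-sided p-value we sum the probabilities of every outcome that is no more likely than the observed one; as far as we know, this is the rule R’s `binom.test` applies (with a small numerical tolerance), but treat that as an assumption. The variable names are our own.

```r
# Exact binomial probabilities for Gus's data: 2 left-paw uses in 10 trials,
# under the null hypothesis that each paw is equally likely (p = 0.5)
n <- 10
x <- 2
probs <- dbinom(0:n, size = n, prob = 0.5)   # P(X = 0), ..., P(X = 10)

# One-sided p-value: probability of x or fewer left-paw uses
p_less <- sum(probs[0:x + 1])                # same as pbinom(x, n, 0.5)
p_less                                       # 0.0546875

# Two-sided p-value: total probability of all outcomes that are
# no more likely than the observed one (small tolerance for rounding)
p_two <- sum(probs[probs <= dbinom(x, n, 0.5) * (1 + 1e-7)])
p_two                                        # 0.109375
```

Because p = 0.5 makes the distribution symmetric, the two-sided p-value here is exactly twice the one-sided one (outcomes 0, 1, 2 and 8, 9, 10).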

Fisher’s Exact Test: Small Sample Independence

Fisher’s exact test assesses whether two categorical variables are independent and remains valid when counts are small. Here’s an example with treatment outcomes from a small clinical trial:

# Create a small dataset; matrix() fills column by column,
# so Treatment = (2, 8) and Control = (7, 3)
treatment <- matrix(c(2, 8, 7, 3), nrow=2,
                   dimnames=list(c("Improved", "Not_Improved"),
                               c("Treatment", "Control")))

# View the data
treatment
             Treatment Control
Improved             2       7
Not_Improved         8       3
# Perform Fisher's exact test
fisher.test(treatment)

    Fisher's Exact Test for Count Data

data:  treatment
p-value = 0.06978
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.007870555 1.133635839
sample estimates:
odds ratio 
 0.1226533 

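The p-value (0.070) is above the conventional 0.05 threshold, so we cannot reject the hypothesis that treatment and outcome are independent. Note also the direction of the effect: only 2 of 10 treated patients improved versus 7 of 10 controls, and the estimated odds ratio (0.12) is below 1, so these data give no support for the treatment being effective. Fisher’s test is the right choice here because two of the four expected counts (4.5) are below 5, which makes the chi-square approximation unreliable.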
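To see where Fisher’s p-value comes from, and why a chi-square approximation is shaky for this table, the sketch below computes the expected counts and then the exact probabilities. With all row and column totals fixed, the Improved/Treatment count follows a hypergeometric distribution: 9 “Improved” patients drawn from 10 Treatment and 10 Control patients. Summing the probabilities of all tables no more likely than the observed one should reproduce the two-sided p-value; we believe this matches what `fisher.test` does for a 2×2 table, but that correspondence is our assumption, and the variable names are our own.

```r
# Observed table (rows: Improved, Not_Improved; columns: Treatment, Control)
treatment <- matrix(c(2, 8, 7, 3), nrow = 2,
                    dimnames = list(c("Improved", "Not_Improved"),
                                    c("Treatment", "Control")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(treatment), colSums(treatment)) / sum(treatment)
expected   # the two Improved cells have expected count 4.5, below 5

# Hypergeometric model with fixed margins:
# m = 10 Treatment patients, n = 10 Control patients, k = 9 Improved patients
support <- 0:9                               # possible Improved/Treatment counts
probs <- dhyper(support, m = 10, n = 10, k = 9)
p_obs <- dhyper(2, m = 10, n = 10, k = 9)    # probability of the observed table

# Two-sided p-value: sum over tables no more likely than the observed one
p_two <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_two                                        # about 0.06978
```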

For further reading, see:

McDonald (2014), Handbook of Biological Statistics.

G-test: An Alternative Approach

The G-test (log-likelihood ratio test) is an alternative to the chi-square test of independence for categorical data. Here’s an example comparing pass/fail outcomes across three study methods:

# Load required package
if (!require("DescTools")) install.packages("DescTools")
library(DescTools)

# Create sample data
study_methods <- matrix(c(25, 15, 10,
                         12, 18, 20), nrow=2, byrow=TRUE,
                       dimnames=list(c("Pass", "Fail"),
                                   c("Method_A", "Method_B", "Method_C")))

# View the data
study_methods
     Method_A Method_B Method_C
Pass       25       15       10
Fail       12       18       20
# Perform G-test
GTest(study_methods)

    Log likelihood ratio (G-test) test of independence without correction

data:  study_methods
G = 8.3376, X-squared df = 2, p-value = 0.01547

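With p = 0.015, the G-test indicates that pass rates differ among the three study methods. This result closely parallels what a Pearson chi-square test gives on the same table. Some statisticians prefer the G-test because it is a likelihood-ratio test whose G values are additive, which makes it convenient for more elaborate designs; for most practical purposes, though, the chi-square test and the G-test give similar results.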
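As a check on the comparison with the chi-square test, the sketch below computes G by hand from its definition, G = 2 Σ O ln(O/E), where O and E are the observed and expected counts, and then runs base R’s `chisq.test` on the same table. It uses only base R, so DescTools is not required; the variable names are our own.

```r
study_methods <- matrix(c(25, 15, 10,
                          12, 18, 20), nrow = 2, byrow = TRUE,
                        dimnames = list(c("Pass", "Fail"),
                                        c("Method_A", "Method_B", "Method_C")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(study_methods), colSums(study_methods)) /
  sum(study_methods)

# G statistic: twice the sum of observed * log(observed / expected)
G <- 2 * sum(study_methods * log(study_methods / expected))
df <- (nrow(study_methods) - 1) * (ncol(study_methods) - 1)   # (2-1)*(3-1) = 2
p_G <- pchisq(G, df = df, lower.tail = FALSE)
c(G = G, p = p_G)            # G about 8.338, p about 0.0155

# Pearson chi-square on the same table, for comparison
chisq.test(study_methods)    # X-squared about 8.174, p about 0.0168
```

Both statistics are referred to a chi-square distribution with 2 degrees of freedom, and both lead to the same conclusion at the 0.05 level.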


In the interest of time, we will not cover these tests in more depth here; see the following reference for further reading.

McDonald, John H. 2014. Handbook of Biological Statistics. 3rd ed. Baltimore, MD: Sparky House Publishing.