Beyond Chi-square: Alternative Tests for Categorical Data

Exact Tests: Precision for Small Samples

Source: McDonald (2014), Handbook of Biological Statistics.

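With small samples, the large-sample approximation behind the chi-square test can be unreliable. Exact tests avoid that approximation by computing the p-value directly from the exact probability distribution of the data under the null hypothesis. Let’s see how this works with a real example.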

Case Study: Cat Paw Preference

Source: https://rcompanion.org/rcompanion/b_01.html

In this experiment, we observed our cat Gus batting at a dangling ribbon 10 times, recording which paw he used. Gus used his right paw 8 times and his left paw 2 times. Let’s analyze this data:

# Calculate probability of exactly 2 left paws in 10 trials
dbinom(2, 10, 0.5)

[1] 0.04394531

# One-sided test: Does Gus use left paw less than expected?
binom.test(2, 10, 0.5,
           alternative="less",       
           conf.level=0.95)


    Exact binomial test

data:  2 and 10
number of successes = 2, number of trials = 10, p-value = 0.05469
alternative hypothesis: true probability of success is less than 0.5
95 percent confidence interval:
 0.0000000 0.5069013
sample estimates:
probability of success 
                   0.2

# Two-sided test: Any deviation from 50/50?
binom.test(2, 10, 0.5,
           alternative="two.sided",  
           conf.level=0.95)


    Exact binomial test

data:  2 and 10
number of successes = 2, number of trials = 10, p-value = 0.1094
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.02521073 0.55609546
sample estimates:
probability of success 
                   0.2

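The one-sided p-value of 0.0547 sits just above the conventional 0.05 threshold, so it offers only weak evidence that Gus prefers his right paw. The two-sided p-value (0.1094) is twice as large, and the 95% confidence interval for the left-paw probability (0.025 to 0.556) still includes 0.5. We’d want more trials before drawing a conclusion.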
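To make "calculating probabilities directly" concrete, the sketch below rebuilds both p-values from individual binomial probabilities using base R only. The variable names are illustrative and not part of the original example, and the small tolerance in the two-sided step is an assumption added to guard against floating-point ties.

```r
# Gus's data: 2 left-paw uses out of 10 trials, null probability 0.5
n_trials <- 10
n_left   <- 2
p_null   <- 0.5

# Probability of exactly 2 left-paw uses, written out from the binomial formula
p_exact <- choose(n_trials, n_left) *
  p_null^n_left * (1 - p_null)^(n_trials - n_left)

# One-sided p-value: probability of 2 or fewer left-paw uses
p_less <- sum(dbinom(0:n_left, n_trials, p_null))

# Two-sided p-value: total probability of every outcome that is
# no more likely than the observed one
probs <- dbinom(0:n_trials, n_trials, p_null)
p_obs <- dbinom(n_left, n_trials, p_null)
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

print(c(exact = p_exact, less = p_less, two_sided = p_two_sided))
```

Because the null distribution is symmetric at p = 0.5, the two-sided p-value here is exactly twice the one-sided value. That shortcut would not hold for a null probability other than 0.5.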

Fisher’s Exact Test: Small Sample Independence

Fisher’s exact test excels when examining relationships between variables in small samples. Here’s an example looking at treatment outcomes in a small clinical trial:

# Create a small dataset
treatment <- matrix(c(2, 8, 7, 3), nrow=2,
                   dimnames=list(c("Improved", "Not_Improved"),
                               c("Treatment", "Control")))

# View the data
treatment

             Treatment Control
Improved             2       7
Not_Improved         8       3

# Perform Fisher's exact test
fisher.test(treatment)


    Fisher's Exact Test for Count Data

data:  treatment
p-value = 0.06978
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.007870555 1.133635839
sample estimates:
odds ratio 
 0.1226533

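The p-value of 0.0698 falls just short of the conventional 0.05 threshold, and the 95% confidence interval for the odds ratio (0.008 to 1.134) includes 1, so these data do not establish a clear association between group and outcome. Note the direction of the estimate: an odds ratio of about 0.12 means the odds of improvement were lower in the treatment group (2 of 10 improved) than in the control group (7 of 10). Fisher’s test is particularly valuable here because two of the four expected counts are 4.5, below the usual guideline of 5, which would make a chi-square test less reliable.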
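With the row and column totals held fixed, the count in any single cell follows a hypergeometric distribution under the null hypothesis, and that is where the exact p-value comes from. The sketch below, using base R only, recomputes the expected counts and the two-sided p-value for the table above. Variable names are illustrative, and the tolerance factor is an assumption added to handle floating-point ties.

```r
treatment <- matrix(c(2, 8, 7, 3), nrow = 2,
                    dimnames = list(c("Improved", "Not_Improved"),
                                    c("Treatment", "Control")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(treatment), colSums(treatment)) / sum(treatment)
print(expected)

# With margins fixed, the number improved in the Treatment group is hypergeometric:
# m = total improved, n = total not improved, k = size of the Treatment group
m <- sum(treatment["Improved", ])
n <- sum(treatment["Not_Improved", ])
k <- sum(treatment[, "Treatment"])

# Probability of every table that is possible given these margins
possible <- max(0, k - n):min(k, m)
probs <- dhyper(possible, m, n, k)
p_obs <- dhyper(treatment["Improved", "Treatment"], m, n, k)

# Two-sided p-value: sum over tables no more probable than the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])
print(p_two_sided)
```

The result should agree with the p-value reported by `fisher.test(treatment)`, and the printed expected counts make it easy to check the "expected count below 5" guideline for a chi-square test.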

For further reading, see McDonald (2014).

G-test: An Alternative Approach

The G-test offers another way to analyze categorical data. Here’s an example comparing educational outcomes across different study methods:

# Load required package
if (!require("DescTools")) install.packages("DescTools")
library(DescTools)

# Create sample data
study_methods <- matrix(c(25, 15, 10,
                         12, 18, 20), nrow=2, byrow=TRUE,
                       dimnames=list(c("Pass", "Fail"),
                                   c("Method_A", "Method_B", "Method_C")))

# View the data
study_methods

     Method_A Method_B Method_C
Pass       25       15       10
Fail       12       18       20

# Perform G-test
GTest(study_methods)


    Log likelihood ratio (G-test) test of independence without correction

data:  study_methods
G = 8.3376, X-squared df = 2, p-value = 0.01547

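The G-test is a likelihood-ratio test: it compares observed and expected counts through their log ratios rather than their squared differences. Here, G = 8.34 on 2 degrees of freedom (p = 0.015) indicates that pass rates differ across the three study methods. With reasonably large samples, G-test results closely parallel those of a chi-square test, though some statisticians prefer the G-test for theoretical properties such as the additivity of G values. For most practical purposes, the two tests lead to the same conclusion.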
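To see how closely the two tests agree on this table, the sketch below computes G by hand and runs Pearson's chi-square test on the same data, using base R only. Variable names are illustrative, and the manual formula assumes every observed count is greater than zero.

```r
study_methods <- matrix(c(25, 15, 10,
                          12, 18, 20), nrow = 2, byrow = TRUE,
                        dimnames = list(c("Pass", "Fail"),
                                        c("Method_A", "Method_B", "Method_C")))

# Expected counts under independence
expected <- outer(rowSums(study_methods), colSums(study_methods)) /
  sum(study_methods)

# G = 2 * sum(observed * ln(observed / expected))
G  <- 2 * sum(study_methods * log(study_methods / expected))
df <- (nrow(study_methods) - 1) * (ncol(study_methods) - 1)
p_G <- pchisq(G, df = df, lower.tail = FALSE)

# Pearson chi-square test on the same table, for comparison
pearson <- chisq.test(study_methods, correct = FALSE)

print(round(c(G = G, p_G = p_G,
              X2 = unname(pearson$statistic), p_X2 = pearson$p.value), 4))
```

The hand-computed G should match the `GTest()` output, and the Pearson statistic comes out at about 8.17 (p of roughly 0.017), so both tests point to the same conclusion for these data.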

Source: McDonald (2014), Handbook of Biological Statistics.

In the interest of time, we will not go into further detail on the G-test here; see the following resources:

G-test of goodness of fit - explained and R examples
G-test of independence - explained and R examples

McDonald, John H. 2014. Handbook of Biological Statistics. 3rd ed. Baltimore, MD: Sparky House Publishing.