Visualizing Data

Data visualization is an essential component of statistical analysis, enabling researchers to quickly grasp complex information, identify patterns, and communicate findings effectively. This section explores key types of plots and their applications in various research contexts.

Key Elements in Plots

Every plot you make in this lab (and forever after that) must have the following elements:

Axis Labels with Units: Every axis should clearly indicate the measured variable and include units when applicable.
Axis Values: Ensure values are legible and appropriately spaced.
Data Source/Caption/Title: Include a description of the plot, either as a caption (for publication) or as a title (for presentations).

Depending on the type of plot you are creating, you may also need the following elements:

Legend: Essential for plots with multiple series or categories. Position it so that it does not obscure data.
Color Scheme: Use consistent, distinguishable, colorblind-friendly colors across plots.
Grid Lines: Can make values easier to compare and read.
Error Bars: Show the variability or uncertainty of the plotted values. You must ALWAYS state what the error bars represent (SD, SEM, or confidence interval) in the caption or title!
- Standard Deviation (SD): Describes the spread of the individual data points around the mean.
- Standard Error of the Mean (SEM): Estimates how precisely the sample mean estimates the true population mean. It equals the SD divided by the square root of the sample size, so it shrinks as the sample grows.
- Confidence Interval (CI): A range constructed so that, at a stated confidence level (for example 95%), it is expected to contain the true value.

par(mar = c(4, 4, 1, 1))
conditions <- c("Low - 15°C", "Med - 20°C", "High - 25°C")
mean_growth_rate <- c(2, 3, 2.5)  # average growth rates
se_growth_rate <- c(0.1, 0.2, 0.15)  # standard errors
bar_centers <-
  barplot(
    mean_growth_rate,
    names.arg = conditions,
    ylim = c(0, 4),
    ylab = "Growth Rate (cm/day)",
    xlab = "Temperature",
    border = "lightblue",
    col = "lightblue",
    las = 1
  )
arrows(
  bar_centers,
  mean_growth_rate + se_growth_rate,
  bar_centers,
  mean_growth_rate - se_growth_rate,
  angle = 90,
  code = 3,
  length = 0.1,
  lwd = 2,
  col = "red"
)

Figure 1: Mean growth rate (± SEM) of the organism at three temperatures.
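
The standard errors in the example above are typed in by hand. As a minimal sketch, assuming you have raw replicate measurements (the `growth` vector below is made-up illustrative data, not results from any experiment), the three error-bar quantities could be computed like this:

```r
# Hypothetical replicate growth rates (cm/day) for one temperature condition
growth <- c(2.1, 1.9, 2.3, 2.0, 1.8, 2.2)

n <- length(growth)
growth_mean <- mean(growth)
growth_sd <- sd(growth)              # spread of the individual measurements
growth_sem <- growth_sd / sqrt(n)    # precision of the estimated mean

# 95% confidence interval for the mean, using the t distribution (n - 1 df)
t_crit <- qt(0.975, df = n - 1)
growth_ci <- growth_mean + c(-1, 1) * t_crit * growth_sem

round(c(mean = growth_mean, sd = growth_sd, sem = growth_sem), 3)
round(growth_ci, 3)
```

Because the SEM is always smaller than the SD for more than one observation, the same data can look much more or less variable depending on which quantity is drawn. That is why the caption has to say which one the bars show.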

Types of Figures and Their Uses

Scatterplot

Scatterplots are used to display relationships between two quantitative variables. They are particularly useful in ecological studies, such as examining the relationship between water temperature and phytoplankton concentration.

par(mar = c(4, 5, 1, 1))
water_temp <- c(15, 20, 18, 22, 17, 19, 16, 23)
phytoplankton <- c(50, 80, 60, 85, 55, 75, 58, 90)
plot(water_temp,
     phytoplankton,
     xlab = "Water Temperature (°C)",
     ylab = "Phytoplankton concentration\n(individuals/L)")

Figure 2: Water Temperature vs. Phytoplankton Concentration.
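
A scatterplot shows a relationship, but it does not quantify it. As a rough sketch, assuming a linear relationship is a reasonable description of these example data, the correlation coefficient and a fitted trend line could be added with base R's `cor()`, `lm()`, and `abline()`:

```r
water_temp <- c(15, 20, 18, 22, 17, 19, 16, 23)
phytoplankton <- c(50, 80, 60, 85, 55, 75, 58, 90)

# Pearson correlation: strength and direction of the linear relationship
r <- cor(water_temp, phytoplankton)

# Simple linear regression, used here only to draw a trend line
fit <- lm(phytoplankton ~ water_temp)

plot(water_temp,
     phytoplankton,
     xlab = "Water Temperature (°C)",
     ylab = "Phytoplankton concentration\n(individuals/L)")
abline(fit, lty = 2)  # dashed line showing the fitted trend
```

If a trend line is drawn, the caption should say how it was fitted.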

Histogram

Histograms display the distribution of a single quantitative variable. They are commonly used in population studies, such as analyzing the distribution of fish lengths in a sampled area.

par(mar = c(4, 5, 1, 1))
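set.seed(42)  # fix the random seed so the simulated data are reproducible
fish_lengths <- rnorm(50, mean = 5, sd = 1)  # simulated fish lengths (cm)
hist(fish_lengths,
     xlab = "Fish Length (cm)",
     main = "",
     ylab = "Frequency")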

Figure 3: Distribution of Fish Lengths.

Barplot

Barplots compare quantities across different categories. They are often used in ecological surveys, such as comparing species abundance in different habitats.

par(mar = c(4, 5, 1, 1))
species <- c("Sharks", "Turtles", "Dolphins")
counts <- c(10, 7, 5)
barplot(counts,
        names.arg = species,
        ylab = "Counts",
        xlab = "Species")

Figure 4: Counts of Individuals by Marine Species.

Boxplot

Boxplots display the distribution and spread of a quantitative variable across different categories or groups. They are particularly useful in comparative studies, such as examining salinity levels in different marine zones.

par(mar = c(4, 5, 1, 1))
coastal_salinity <- c(35, 36, 34, 35.5, 35.2)
open_ocean_salinity <- c(37, 38, 37.5, 37.8, 37.2)
deep_sea_salinity <- c(34.5, 34, 33.8, 34.2, 33.9)
boxplot(
  coastal_salinity,
  open_ocean_salinity,
  deep_sea_salinity,
  names = c("Coastal", "Open Ocean", "Deep Sea"),
  ylab = "Salinity (ppt)",
  xlab = "Zone"
)

Figure 5: Salinity Levels in Different Marine Zones.
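
To read a boxplot it helps to know which numbers it draws. The sketch below uses base R's `boxplot.stats()` on the open-ocean values from the example above to list the five values behind one box, plus any points that would be drawn as outliers:

```r
open_ocean_salinity <- c(37, 38, 37.5, 37.8, 37.2)

# boxplot.stats() returns the numbers that boxplot() draws, in this order:
# lower whisker, lower hinge, median, upper hinge, upper whisker
bp <- boxplot.stats(open_ocean_salinity)
bp$stats

# Points lying beyond the whiskers, which boxplot() draws individually
bp$out
```

With only five observations per zone, each box is built almost directly from the raw values, so small samples like these are often better shown as individual points.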

Line Plot

Line plots display trends over time or across another continuous variable. They are commonly employed in climate studies, such as tracking sea surface temperature changes over time.

par(mar = c(4, 5, 1, 1))

months <- 1:12
sea_surface_temp <-
  c(20, 19, 18, 18, 19, 21, 23, 24, 24, 23, 22, 21)
plot(
  months,
  sea_surface_temp,
  type = "l",
  xlab = "Month",
  ylab = "Temperature (°C)",
  xaxt = "n"
)
axis(
  1,
  at = 1:12,
  labels = c(
    "Jan",
    "Feb",
    "Mar",
    "Apr",
    "May",
    "Jun",
    "Jul",
    "Aug",
    "Sep",
    "Oct",
    "Nov",
    "Dec"
  )
)

Figure 6: Monthly Sea Surface Temperature.

Advanced Visualization Techniques

Multi-panel Plots

Multi-panel plots, also known as faceted plots or small multiples, allow for the comparison of multiple related datasets or variables simultaneously. They are particularly useful in complex studies where relationships between multiple variables need to be examined.

par(mfrow=c(2,2), mar=c(4,4,2,1))
data(iris)
plot(iris$Sepal.Length, iris$Sepal.Width, col=iris$Species, main="Sepal Length vs Width")
plot(iris$Petal.Length, iris$Petal.Width, col=iris$Species, main="Petal Length vs Width")
plot(iris$Sepal.Length, iris$Petal.Length, col=iris$Species, main="Sepal Length vs Petal Length")
plot(iris$Sepal.Width, iris$Petal.Width, col=iris$Species, main="Sepal Width vs Petal Width")

Figure 7: Multi-panel plot showing relationships between sepal length, sepal width, petal length, and petal width in iris dataset.
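
A panel layout set with `par(mfrow = ...)` stays active for every later plot on the same device. One common pattern, sketched here with two histograms chosen only for illustration, is to save the previous settings and restore them afterwards:

```r
# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

hist(iris$Sepal.Length, main = "Sepal Length", xlab = "Length (cm)")
hist(iris$Petal.Length, main = "Petal Length", xlab = "Length (cm)")

# Restore the earlier layout so later plots are not squeezed into panels
par(old_par)
```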

Heatmaps

Heatmaps encode the values of a matrix as colors, which makes them useful for visualizing complex, multi-dimensional data. They are commonly used in genomics, climate science, and social network analysis to reveal patterns and clusters in large datasets.

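library(corrplot)  # install once with install.packages("corrplot")
cor_matrix <- cor(iris[, 1:4])
corrplot(cor_matrix, method = "color", type = "upper", order = "hclust",
         addCoef.col = "black", tl.col = "black", tl.srt = 45,
         title = "Correlation Heatmap of Iris Features",
         mar = c(0, 0, 2, 0))  # top margin so the title is not clipped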

Figure 8: Heatmap of correlation matrix for iris dataset.

Dimensionality Reduction Techniques

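When dealing with high-dimensional data, techniques like Principal Component Analysis (PCA) or t-SNE can project the data into two or three dimensions while preserving as much of its structure as possible. PCA retains the directions of greatest variance, while t-SNE emphasizes local neighborhoods.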

pca_result <- prcomp(iris[,1:4], scale. = TRUE)
plot(pca_result$x[,1:2], col=iris$Species, pch=19,
     xlab="First Principal Component", ylab="Second Principal Component",
     main="PCA of Iris Dataset")
legend("topright", legend=levels(iris$Species), col=1:3, pch=19)
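
A 2D PCA plot is easier to interpret when the axes report how much of the total variance each component captures. As a sketch, that share can be derived from the `sdev` element that `prcomp()` returns and pasted into the axis labels:

```r
pca_result <- prcomp(iris[, 1:4], scale. = TRUE)

# Each component's variance is its squared standard deviation
var_explained <- pca_result$sdev^2 / sum(pca_result$sdev^2)
pct <- round(100 * var_explained, 1)

plot(pca_result$x[, 1:2], col = iris$Species, pch = 19,
     xlab = paste0("PC1 (", pct[1], "% of variance)"),
     ylab = paste0("PC2 (", pct[2], "% of variance)"))
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```

If the first two components capture only a small share of the variance, the 2D picture may hide much of the structure and should be read with caution.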

Interactive Plots

While static plots are suitable for publications, interactive plots can be powerful tools for data exploration and presentation in digital formats (such as this lab manual!). Libraries such as plotly in R or D3.js in JavaScript allow for the creation of interactive visualizations.
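
As a minimal sketch of what this might look like in R, the example below builds an interactive scatterplot of the built-in iris data with `plotly::plot_ly()`. It assumes the plotly package is installed, and it skips the plot if the package is missing:

```r
# Assumes the plotly package is available: install.packages("plotly")
p <- NULL
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(
    data = iris,
    x = ~Sepal.Length,
    y = ~Sepal.Width,
    color = ~Species,
    type = "scatter",
    mode = "markers"
  )
  # Axis labels with units still apply to interactive plots
  p <- plotly::layout(
    p,
    xaxis = list(title = "Sepal Length (cm)"),
    yaxis = list(title = "Sepal Width (cm)")
  )
}

# Display the plot only when running in an interactive session
if (interactive() && !is.null(p)) print(p)
```

Hovering over a point in the rendered plot shows its values, and clicking a legend entry hides or shows that species.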

Choosing the Right Visualization

Selecting the appropriate visualization type is crucial for effective data communication. Consider the following factors:

Data Type: Categorical, continuous, time-series, etc.
Number of Variables: Univariate, bivariate, or multivariate data.
Relationship of Interest: Comparison, composition, distribution, or trend.
Audience: Technical expertise of the intended audience.
Medium: Print publication, digital presentation, or interactive dashboard.
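
The first two factors can be turned into a rough rule of thumb. The function below, `suggest_plot()`, is a hypothetical helper written for this illustration (it is not part of R or any package). It looks only at whether one or two variables are supplied and whether each is numeric, and it ignores time series, audience, and medium:

```r
# Hypothetical helper: suggests a basic plot type from the variable types
suggest_plot <- function(x, y = NULL) {
  if (is.null(y)) {
    # One variable: distribution of a number, or counts of a category
    return(if (is.numeric(x)) "histogram" else "barplot")
  }
  n_numeric <- is.numeric(x) + is.numeric(y)
  if (n_numeric == 2) {
    "scatterplot"   # two quantitative variables
  } else if (n_numeric == 1) {
    "boxplot"       # one quantitative variable split by groups
  } else {
    "barplot"       # two categorical variables: compare counts
  }
}

suggest_plot(iris$Sepal.Length)                     # one numeric variable
suggest_plot(iris$Species)                          # one categorical variable
suggest_plot(iris$Sepal.Length, iris$Species)       # numeric by category
suggest_plot(iris$Sepal.Length, iris$Petal.Length)  # two numeric variables
```

A rule like this is only a starting point. The relationship you want to show and the audience you are showing it to should still drive the final choice.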

Data Visualization Ethics

Ethical considerations in data visualization are crucial:

Truthful Representation: Ensure that the visualization accurately represents the underlying data without distortion.
Context Provision: Provide sufficient context to prevent misinterpretation.
Accessibility: Use colorblind-friendly palettes and consider other accessibility features.
Uncertainty Communication: Clearly represent uncertainties and limitations in the data.
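
For the accessibility point, one option is the Okabe-Ito palette, which was designed to remain distinguishable under common forms of color vision deficiency. The sketch below assumes R 4.0.0 or later, where `palette.colors()` is available in base R, and also varies the point shape so that color is not the only cue:

```r
# palette.colors() is part of base R (grDevices) from version 4.0.0
cb_colors <- palette.colors(n = 3, palette = "Okabe-Ito")
shapes <- c(16, 17, 15)  # circle, triangle, square
group <- as.integer(iris$Species)

plot(iris$Sepal.Length, iris$Sepal.Width,
     col = cb_colors[group],
     pch = shapes[group],  # shape repeats the information carried by color
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)")
legend("topright", legend = levels(iris$Species),
       col = cb_colors, pch = shapes)
```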
