Introduction to Meta-Analysis

Matthew Grainger

What is Meta-Analysis?

  • Meta-analysis is a statistical technique to combine results from multiple studies to get a clearer, more reliable picture

  • Say we want to know whether grazing reduces plant biodiversity. Studies disagree—some show decreases, some no effect, some even increases. Meta-analysis helps us quantify the overall trend

Why Meta-Analysis?

  • Handles variation across studies

  • Can improve power and precision

  • Helps move beyond “vote-counting” (i.e., tallying +/– effects)

What is Vote-counting?

Core concepts

Effect size

  • Effect size: “A standardised way to compare results (e.g., log response ratio).”

Effect sizes are key to synthesising study results in meta-analysis, as they provide a standardised way of comparing findings across studies. Common effect sizes include the standardised mean difference (Hedges’ g is common in ecology), the correlation coefficient, and the log-transformed response ratio (lnRR).

Effect size - SMD

We have data from six fictional studies that examined species abundance before and after restoration efforts. The dataset includes:

  • Sample sizes before and after restoration (n1 and n2)

  • Mean species abundance before and after (m1 and m2)

  • Standard deviations of abundance before and after (sd1 and sd2)

Study N (Before) N (After) Mean (B) Mean (A) SD (B) SD (A)
Smith_2015 30 28 5.2 3.9 1.1 0.9
Johnson_2017 50 48 6.1 5.4 1.3 1.1
Lee_2018 45 44 5.9 5.3 1.0 0.8
Gomez_2016 32 30 4.8 4.2 1.2 1.0
Patel_2019 40 39 5.5 4.9 1.4 1.2
Chen_2020 35 33 5.1 4.5 1.2 1.1

The standardised mean difference (SMD) between two groups (e.g. restored and control sites) is calculated as follows:

\[\text{SMD} = \frac{\text{mean}_{\text{restored}} - \text{mean}_{\text{control}}}{SD_{\text{pooled}}}\] Where the pooled standard deviation is calculated as:

\[SD_{\text{pooled}} = \sqrt{\frac{(n_{\text{restored}} - 1) \cdot SD_{\text{restored}}^2 + (n_{\text{control}} - 1) \cdot SD_{\text{control}}^2}{n_{\text{restored}} + n_{\text{control}} - 2}}\]
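As a minimal sketch, the pooled SD and SMD can be computed by hand in R for the first study in the table above (values taken from the Smith_2015 row):

```r
# SMD for Smith_2015: species abundance before vs. after restoration
n1 <- 30;   n2 <- 28     # sample sizes before / after
m1 <- 5.2;  m2 <- 3.9    # mean abundance before / after
sd1 <- 1.1; sd2 <- 0.9   # standard deviations before / after

# Pooled standard deviation
sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))

# Standardised mean difference (after minus before)
smd <- (m2 - m1) / sd_pooled
round(smd, 2)
# [1] -1.29
```

In practice, `metafor::escalc(measure = "SMD", ...)` does this for every row at once and additionally applies the small-sample correction that turns the raw SMD into Hedges’ g.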

Effect size LnRR

  • The log response ratio (LnRR) is another common effect size used in meta-analysis, particularly in ecology. It is calculated as:

\[\text{LnRR} = \ln\left(\frac{\text{mean}_{\text{treatment}}}{\text{mean}_{\text{control}}}\right)\]

  • The log response ratio is useful because it can handle ratios that are less than 1 (i.e., negative effects) and is symmetric around zero.

Imagine a study comparing biodiversity in grazed vs. ungrazed grasslands:

Group Mean Biodiversity Standard Deviation Sample Size
Grazed 15.2 2.5 30
Ungrazed 18.6 3.1 28

We calculate the effect size as: \[\text{LnRR} = \ln\left(\frac{15.2}{18.6}\right) = -0.202\]
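The same calculation in R, using the means from the table above:

```r
# Log response ratio for grazed vs. ungrazed biodiversity
mean_grazed   <- 15.2
mean_ungrazed <- 18.6

lnrr <- log(mean_grazed / mean_ungrazed)
round(lnrr, 3)
# [1] -0.202
```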

Interpretation

  • LnRR > 0: Treatment increases the outcome

  • LnRR < 0: Treatment decreases the outcome

  • LnRR ≈ 0: No effect

  • In our example, the negative LnRR suggests grazing reduces biodiversity in this study

Variance

  • Variance/weighting: “More precise studies get more influence in the pooled estimate.”
  • In meta-analysis, not all studies are treated equally. Some studies provide more precise estimates than others, and variance (uncertainty) helps determine their weight in the overall analysis.

  • Each study reports an effect size (e.g., LnRR) and a measure of its uncertainty. The variance of an effect size is calculated as:

\[v_i = \frac{SD_1^2}{n_1 \cdot \text{Mean}_1^2} + \frac{SD_2^2}{n_2 \cdot \text{Mean}_2^2}\]

How Variance Affects Study Weighting

  • Studies with smaller variance (precise estimates, large sample size, low SD) get higher weight.

  • Studies with higher variance (uncertain estimates, small sample size, high SD) get lower weight.

  • In a random-effects meta-analysis, we also account for between-study heterogeneity (τ²), which represents true variation beyond sampling error.

Study LnRR Variance (vi) Weight in Meta-Analysis
Study A (Large n, Low SD) -0.20 0.01 High ✅
Study B (Small n, High SD) -0.20 0.10 Low ❌

Even though both studies report the same effect size (-0.20), Study A gets more weight because it is more precise.

Fixed vs. random effects

  • In a fixed-effect model, we assume that all studies estimate the same true effect.
  • In contrast, a random-effects model assumes that each study estimates its own effect size, accounting for between-study variability.

Fixed effect

Random effects
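In metafor the two models differ only in the `method` argument; a minimal sketch, with hypothetical effect sizes (`yi`) and sampling variances (`vi`):

```r
library(metafor)

# Hypothetical effect sizes and their sampling variances
dat <- data.frame(yi = c(-0.20, -0.05, 0.10, -0.15, 0.02),
                  vi = c(0.01, 0.02, 0.015, 0.03, 0.025))

fe <- rma(yi, vi, data = dat, method = "FE")    # fixed- (equal-) effect model
re <- rma(yi, vi, data = dat, method = "REML")  # random-effects model (the default)
```

The fixed-effect estimate is simply the inverse-variance weighted mean; the random-effects estimate additionally incorporates τ² into the weights.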

Multi-level models

  • Accounts for hierarchical data structures
    • e.g. multiple sites for a single study or multiple measurements from one site
    • e.g. multiple studies on the same species or taxonomic groups
  • Normally the method we use in ecology
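A minimal `rma.mv()` sketch with several effect sizes nested within studies (the data and column names are illustrative):

```r
library(metafor)

# Hypothetical data: multiple effect sizes per study
dat <- data.frame(
  study = c("A", "A", "B", "B", "C"),
  es_id = 1:5,  # unique identifier for each effect size
  yi = c(-0.2, -0.1, 0.05, 0.0, -0.3),
  vi = c(0.02, 0.03, 0.01, 0.02, 0.04)
)

# Random intercepts for study, and for effect size nested within study
res <- rma.mv(yi, vi, random = ~ 1 | study / es_id, data = dat)
```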

Heterogeneity - the fun part!

  • Heterogeneity is one of the most critical aspects of meta-analysis—it tells us whether the effect sizes across studies are similar or highly variable.

  • Recognising and handling heterogeneity is essential for drawing reliable conclusions.

  • The expectation of substantial heterogeneity is the main difference between meta-analysis in ecology and in medicine.

What is Heterogeneity?

  • Heterogeneity refers to the variation in effect sizes across studies. If all studies estimated the same true effect, the only differences between them would be due to sampling error. In practice, effect sizes also vary because of:
    • Biological & ecological differences
    • Methodological differences
    • Publication effects
  • When heterogeneity is high, a simple average of effect sizes may not be meaningful

Understanding the outputs from a meta-analysis


Random-Effects Model (k = 10; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0128 (SE = 0.0071)
tau (square root of estimated tau^2 value):      0.1131
I^2 (total heterogeneity / total variability):   85.89%
H^2 (total variability / sampling variability):  7.09

Test for Heterogeneity:
Q(df = 9) = 58.1438, p-val < .0001

Model Results:

estimate      se     zval    pval    ci.lb   ci.ub    
 -0.0593  0.0387  -1.5300  0.1260  -0.1352  0.0167    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  • Random-Effects Model (k = 10; tau² estimator: REML)

    • You are analysing 10 effect sizes (k = 10) using a random-effects model, meaning you assume each study is estimating a different but related true effect size
    • The REML (Restricted Maximum Likelihood) method is used to estimate between-study variance.

Between-Study Variance

tau² = 0.0128 (SE = 0.0071)

This is the estimated amount of heterogeneity (real variation) in true effect sizes between studies.

Interpretation:

The larger this value, the more variation exists among study results beyond random sampling error. It is in the squared units of your effect size.

Standard Deviation of True Effects

tau = 0.1131

This is simply the square root of tau². It gives the standard deviation of true effect sizes across studies.

Interpretation: On average, true effect sizes vary by ~0.11 around the average effect size.

Proportion of Variability Due to Heterogeneity

I² = 85.89%

This tells you the percentage of total variability in effect sizes that is due to real differences between studies, rather than random chance.

Interpretation: A very high I² (86%) means most of the variation is likely due to actual differences in study context (e.g., species, location, design) — not just sampling error.

Ratio of Total to Sampling Variability

H² = 7.09

This tells you how much bigger the total variability is compared to what you’d expect from sampling error alone.

Interpretation: The observed variability in effect sizes is about 7 times larger than what we’d expect if all studies were estimating the same true effect size.

Test for Heterogeneity

Q(df = 9) = 58.14, p < .0001

This is Cochran’s Q test, which tests if observed variability is more than expected by chance.

Interpretation: Since the p-value is very small (< 0.0001), we reject the null hypothesis of homogeneity. There is significant heterogeneity: the studies are not all estimating the same effect.

⚠️ Caveat: The Q test can be overly sensitive with many studies, and underpowered with few.

Overall Effect Size Estimate

Estimate SE Z-value p-value CI (95%)
-0.0593 0.0387 -1.53 0.1260 -0.1352 to 0.0167

Estimate: The average effect size is -0.0593.

CI includes 0, and p = 0.126, so the effect is not statistically significant at the 0.05 level.

This suggests there’s no consistent directional effect, but the high heterogeneity indicates the effects may differ across contexts.

In this meta-analysis, we found high heterogeneity (I² = 86%) among studies. This suggests that true effect sizes vary meaningfully across ecological contexts, perhaps due to different species, ecosystems, or study designs. The estimated average effect was small and not statistically significant, but given the variation across studies, this average may not be meaningful on its own.

🌳 From Forest Plots to Orchard Plots 🍊: Visualising Meta-Analysis Results

🌲 Forest Plots: The Classic View

  • Show individual study effect sizes with confidence intervals

  • Easy to interpret, especially for significance and precision

  • Great for small to moderate numbers of studies

BUT… can get cluttered and hard to read with many studies or subgroup comparisons
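As a sketch, metafor produces a forest plot directly from a fitted model (the data here are hypothetical):

```r
library(metafor)

# Hypothetical effect sizes and sampling variances
dat <- data.frame(yi = c(-0.20, -0.05, 0.10, -0.15),
                  vi = c(0.01, 0.02, 0.015, 0.03))
res <- rma(yi, vi, data = dat)

# One row per study plus the pooled estimate (polygon at the bottom)
forest(res, slab = paste("Study", 1:4))
```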

🍊 Orchard Plots: A Modern, Flexible Alternative

  • Summarise:
    • Study effect sizes (optionally grouped)
    • The overall model estimate (± CI)
    • Weighting of studies (as point size)
  • Can show:
    • Meta-regression results
    • Multi-level models
    • Moderator effects

Regression Coefficients as Effect Sizes

  • Sometimes studies report regression coefficients instead of group comparisons.
  • These can be used directly as effect sizes if:
    • The predictor is consistent across studies (e.g., standardised)
    • The outcome is continuous
  • Often extracted as raw slopes, standardised \(\beta\), or log-odds

Extracting Regression Coefficients

# Example: effect of temperature on bird abundance
library(metafor)

study <- data.frame(
  study_id = 1:4,
  yi  = c(0.12, 0.20, 0.05, 0.18),  # regression slopes
  sei = c(0.03, 0.04, 0.02, 0.05)   # standard errors of the slopes
)

rma(yi, sei^2, data = study)  # second argument is the sampling variance (SE^2)

Random-Effects Model (k = 4; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0038 (SE = 0.0041)
tau (square root of estimated tau^2 value):      0.0620
I^2 (total heterogeneity / total variability):   78.82%
H^2 (total variability / sampling variability):  4.72

Test for Heterogeneity:
Q(df = 3) = 15.4897, p-val = 0.0014

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub      
  0.1297  0.0356  3.6405  0.0003  0.0599  0.1996  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  • But you must ensure that the direction of the effect is interpretable and consistent across studies

Standardising Predictors or Outcomes

  • If studies use different measurement scales:

    • Convert to standardised \(\beta\) if possible

    • Or standardise both predictor and outcome before meta-analysis

    • Note: This can reduce heterogeneity and improve comparability

Standardising Regression Coefficients (\(\beta\))

  • Standardisation makes coefficients comparable across studies
  • Formula (for a simple linear regression):

\[ \beta_{\text{standardised}} = \beta \cdot \frac{\text{SD}_X}{\text{SD}_Y} \]

Where:

  • \(\beta\): unstandardised slope (from the paper)

  • \(\text{SD}_X\): SD of the predictor (e.g. temperature)

  • \(\text{SD}_Y\): SD of the outcome (e.g. abundance)

Example: Temperature Effect on Insect Abundance

Study reports: \(\beta\) = 2.5 (per \(^\circ\)C), temp SD = 5, abundance SD = 20

beta_raw <- 2.5 
sd_temp <- 5 
sd_abund <- 20
(beta_std <- beta_raw * (sd_temp / sd_abund) )
[1] 0.625
#"A one SD increase in temperature is associated with a 0.625 SD increase in abundance."

studies <- data.frame(
  beta     = c(2.5, 1.2, 3.0, 0.8),
  sd_temp  = c(5, 4, 6, 3),
  sd_abund = c(20, 10, 15, 8)
)
studies$beta_std <- with(studies, beta * (sd_temp / sd_abund))

You can now use beta_std as your yi values in rma() or rma.mv().

Note

Use SDs from within each study, not pooled

If variances (SE/CI) are available for the raw \(\beta\), you can also transform the SEs: \[SE_{\text{standardised}} = SE_{\text{raw}} \cdot \frac{\text{SD}_X}{\text{SD}_Y}\]
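Continuing the temperature example, the same scaling applies to the standard error (the `se_raw` value here is hypothetical):

```r
# Standardise a raw SE with the same SD ratio used for beta
se_raw   <- 0.8   # hypothetical SE of the raw slope
sd_temp  <- 5
sd_abund <- 20

(se_std <- se_raw * (sd_temp / sd_abund))
# [1] 0.2
```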

  • Check study assumptions: linearity, normality, scale units

Funnel Plots & Multi-Level Meta-Analysis

  • Traditional funnel plots assume independent effect sizes

  • But in multilevel models, effects are nested (e.g., multiple per study)

  • This violates the assumption, producing misleading asymmetry or overdispersion

  • Visual inspection is unreliable

  • Needs a method that accounts for dependency

Nakagawa et al.’s Solution (2022)

  • Propose a regression-based test using residuals from the multilevel model

  • Regress absolute residuals (or squared residuals) on precision

library(metafor)

# Fit the multilevel model, then regress absolute residuals on precision
res <- rma.mv(yi, vi, random = ~ 1 | author, data = dat)
dat$resid <- resid(res)
dat$precision <- 1 / sqrt(dat$vi)
summary(lm(abs(resid) ~ precision, data = dat))

Call:
lm(formula = abs(resid) ~ precision, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.3511 -0.1793 -0.0032  0.1066  0.8765 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.50069    0.08253   6.067  2.3e-07 ***
precision   -0.04108    0.01540  -2.668   0.0105 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2418 on 46 degrees of freedom
Multiple R-squared:  0.134, Adjusted R-squared:  0.1151 
F-statistic: 7.116 on 1 and 46 DF,  p-value: 0.01051

  • A significant slope indicates funnel asymmetry (small-study effects)

  • Avoids misleading inferences from clustered data

Advantages of Residual Regression

  • Compatible with multi-level models

  • Avoids false positives in funnel asymmetry

  • Reproducible and testable in R