Interpreting the Results Table
Understand what each column in the differential expression results table means and how to identify statistically significant, biologically meaningful genes.
When to Use
- Your analysis has completed and you are reviewing the results in the Results tab.
- You need to filter genes by significance or fold-change thresholds, or export a gene list for downstream analysis.
- You want to understand the meaning of each column before drawing biological conclusions.
Required Inputs
- A completed RNA-seq run with results available in the Results tab.
What to Expect
- baseMean: the average normalised count across all samples. Higher values indicate more abundant transcripts.
- log2FoldChange: the effect size on the log2 scale, comparing the test group with the reference group. A value of 1 means the gene is expressed twice as highly in the test group as in the reference; -1 means half as highly.
- lfcSE: the standard error of the log2 fold-change estimate. Smaller values indicate more precise estimates.
- pvalue: the raw p-value from the Wald test of the null hypothesis that the true log2 fold change is zero (that is, no difference in expression between the groups).
- padj: the Benjamini-Hochberg adjusted p-value that controls the false discovery rate. This is the column you should use for significance calls.
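The arithmetic linking these columns can be sketched in a few lines of Python. This is a minimal illustration, not this tool's implementation. It assumes the table comes from DESeq2's default Wald test with unshrunken estimates and no fold-change threshold, so the test statistic is log2FoldChange / lfcSE compared with a standard normal distribution. The function names are hypothetical. Recomputing Benjamini-Hochberg from the pvalue column will not exactly reproduce padj when some padj values are NA, because the adjustment is then applied only to the genes that were tested.

```python
import math

def fold_change(log2fc):
    """Convert a log2 fold change to a linear fold change (test / reference)."""
    return 2.0 ** log2fc

def wald_pvalue(log2fc, lfc_se):
    """Two-sided Wald p-value for H0: true log2 fold change = 0.

    Assumes z = log2fc / lfc_se follows a standard normal under H0;
    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    """
    z = log2fc / lfc_se
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum keeps the
    # adjusted values monotone; starting at 1.0 caps them at 1.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    print(fold_change(1.0))                  # 2.0
    print(fold_change(-1.0))                 # 0.5
    print(round(wald_pvalue(1.96, 1.0), 3))  # 0.05
    print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
    # [0.04, 0.0533, 0.0533, 0.2]
```

The running minimum is what makes BH step-up: a gene's adjusted value can never exceed that of a gene with a larger raw p-value.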
Interpretation
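- Use padj < 0.05 as the default significance threshold. Across the set of genes passing this cutoff, the expected proportion of false positives (the false discovery rate) is controlled at 5%; this is not a 5% error probability for each individual gene.
- Combine significance with a fold-change threshold (e.g., |log2FC| > 1, i.e. at least a two-fold change) to focus on genes with both statistical and biological significance.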
- Genes with large fold changes but high lfcSE are imprecisely estimated; these are often low-expression genes with noisy count data.
- The volcano plot visualises -log10(padj) against log2FoldChange, placing the most significant and largest-effect genes in the upper-left (down-regulated) and upper-right (up-regulated) corners.
- The MA plot shows log2FoldChange against baseMean, revealing whether fold-change estimates are consistent across expression levels.
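A minimal sketch of the two-threshold filter and the volcano-plot coordinates described above. It assumes each row of an exported results table has been loaded as a Python dict keyed by the column names, with NA represented as None; the "gene" key, the function names, and the example rows are hypothetical.

```python
import math

# Hypothetical example rows; None stands for NA in the exported table.
ROWS = [
    {"gene": "geneA", "log2FoldChange": 2.3, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": -1.5, "padj": 0.02},
    {"gene": "geneC", "log2FoldChange": 0.4, "padj": 0.0001},  # significant, small effect
    {"gene": "geneD", "log2FoldChange": 3.0, "padj": 0.30},    # large effect, not significant
    {"gene": "geneE", "log2FoldChange": 1.8, "padj": None},    # NA: not tested
]

def call_significant(rows, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes passing both thresholds into up- and down-regulated lists."""
    up, down = [], []
    for row in rows:
        padj = row["padj"]
        if padj is None:  # NA: the gene was not tested, so it cannot be called
            continue
        lfc = row["log2FoldChange"]
        if padj < padj_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(row["gene"])
    return up, down

def volcano_point(row):
    """Return (log2FoldChange, -log10(padj)) for a tested gene."""
    # padj can underflow to 0 for extremely significant genes; clamp it so
    # log10 stays finite.
    padj = max(row["padj"], 1e-300)
    return row["log2FoldChange"], -math.log10(padj)

if __name__ == "__main__":
    up, down = call_significant(ROWS)
    print("up:", up)      # up: ['geneA']
    print("down:", down)  # down: ['geneB']
```

Filtering on |log2FC| after testing is a common heuristic, but padj still refers to the test against a log2 fold change of zero. If the underlying pipeline is DESeq2, its `results()` function accepts an `lfcThreshold` argument that tests against a non-zero threshold directly.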
Common Pitfalls
- Use padj (not pvalue) to determine significance. Raw p-values are not corrected for the thousands of simultaneous tests and will produce many false positives.
- Genes with very low baseMean may show dramatic fold changes that are unreliable because they are driven by small count differences.
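- NA values in padj are not errors. When pvalue is present but padj is NA, the gene was removed by independent filtering because its mean normalised count was too low to test usefully. When pvalue is also NA, the gene either had zero counts in every sample or contained an extreme count outlier.
- Log2 fold-change shrinkage (e.g., apeglm) pulls noisy, imprecise log2FC estimates toward zero but does not alter padj. Because shrunken values are more stable for low-count genes, they give a better ranking for downstream analyses such as GSEA.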
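Missing values in an exported table can be sorted programmatically. This sketch assumes DESeq2's documented behaviour under default settings: rows with zero counts in every sample have baseMean 0 and NA for log2FoldChange, pvalue and padj; rows with a count outlier flagged by Cook's distance have NA in pvalue and padj; rows removed by independent filtering have NA only in padj. Rows are dicts with None standing for NA, and `classify_row` and the example rows are hypothetical.

```python
from collections import Counter

def classify_row(row):
    """Label why a results row does or does not have an adjusted p-value.

    Assumes DESeq2's default conventions; None stands for NA.
    """
    if row["baseMean"] == 0:
        return "all_zero"   # zero counts in every sample: nothing to test
    if row["pvalue"] is None:
        return "outlier"    # extreme count outlier flagged by Cook's distance
    if row["padj"] is None:
        return "filtered"   # removed by independent filtering (low mean count)
    return "tested"

# Hypothetical rows illustrating each pattern.
ROWS = [
    {"gene": "g1", "baseMean": 0.0, "pvalue": None, "padj": None},
    {"gene": "g2", "baseMean": 850.2, "pvalue": None, "padj": None},
    {"gene": "g3", "baseMean": 1.7, "pvalue": 0.41, "padj": None},
    {"gene": "g4", "baseMean": 120.0, "pvalue": 0.003, "padj": 0.04},
]

if __name__ == "__main__":
    # Tally how many genes fall into each category.
    print(Counter(classify_row(r) for r in ROWS))
```

The checks run in order because the patterns overlap: an all-zero row also has NA pvalue and padj, so it must be recognised before the outlier case.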
Citations
References
- Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57(1), 289-300.
- Zhu, A., Ibrahim, J. G., & Love, M. I. (2019). Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics, 35(12), 2084-2092. doi:10.1093/bioinformatics/bty895.