Designing Sample Metadata
Data Preparation
The metadata CSV maps each sample to its experimental conditions, batch variables, and covariates. Its first column must contain sample IDs that match the count-matrix column headers.
When to Use
- You are setting up a new RNA-seq analysis and need to define which samples belong to which experimental groups.
- You want to include additional covariates such as batch, RIN score, sequencing lane, or patient age in your model.
- Your experiment has a multi-factor design (e.g., Drug x CellLine x TimePoint) and you need each factor recorded in its own column.
Required Inputs
- First column: sample identifiers that match the column headers in the count matrix (matching trims surrounding whitespace and ignores case).
- At least one categorical factor column (e.g., Treatment, Genotype, CellLine) defining the comparison of interest.
- Optional: continuous covariates (e.g., RIN_score, age, library_size) that can be included in the model.
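As a rough illustration of the first-column requirement, the sketch below compares count-matrix column headers with metadata sample IDs after trimming whitespace and lowercasing. The file contents, column names, and helper functions (normalize, read_metadata_ids, compare_ids) are hypothetical and are not part of easyCris; the normalization only approximates the matching rule described above.

```python
import csv
import io

# Hypothetical metadata file contents; column names are illustrative only.
METADATA_CSV = """sample_id,Treatment,CellLine,RIN_score
ctrl_1,Control,HeLa,9.1
ctrl_2,Control,HeLa,8.7
drug_1,Drug,HeLa,9.0
drug_2,Drug,HeLa,8.9
"""


def normalize(sample_id):
    """Trim surrounding whitespace and lowercase the ID (assumed matching rule)."""
    return sample_id.strip().lower()


def read_metadata_ids(text):
    """Return the first-column values of a metadata CSV, skipping the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return [row[0] for row in rows[1:] if row]


def compare_ids(count_columns, metadata_ids):
    """Return (IDs missing from the metadata, IDs missing from the count matrix)."""
    counts = {normalize(c) for c in count_columns}
    meta = {normalize(m) for m in metadata_ids}
    return sorted(counts - meta), sorted(meta - counts)


# Hypothetical count-matrix headers: one differs in case, one has a trailing
# space, and one ("drug_3") has no metadata row at all.
count_columns = ["Ctrl_1", "ctrl_2 ", "drug_1", "drug_3"]
missing_meta, missing_counts = compare_ids(
    count_columns, read_metadata_ids(METADATA_CSV)
)
print(missing_meta)    # count-matrix samples with no metadata row
print(missing_counts)  # metadata rows with no count-matrix column
```

Running a check like this before upload shows which IDs differ beyond case and surrounding whitespace.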
What to Expect
- easyCris auto-detects categorical versus numeric columns from the metadata file.
- All detected factor columns appear as options in the model configuration dropdowns.
- Continuous covariates can be added to the design formula to control for confounding variables.
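The behaviour above can be approximated with a simple heuristic: treat a column as numeric when every non-empty value parses as a number, and as categorical otherwise. This is only a sketch under that assumption. The actual detection logic in easyCris is not documented here, and the helper names (is_number, classify_columns, design_formula) and the example data are hypothetical. The formula string uses the DESeq2-style convention of listing covariates first and the factor of interest last.

```python
import csv
import io

# Hypothetical metadata; column names are illustrative only.
METADATA_CSV = """sample_id,Treatment,CellLine,RIN_score
s1,Control,HeLa,9.1
s2,Control,HEK293,8.7
s3,Drug,HeLa,9.0
s4,Drug,HEK293,8.9
"""


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def classify_columns(text):
    """Label every column after the sample-ID column as numeric or categorical."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    kinds = {}
    for col in reader.fieldnames[1:]:  # skip the sample-ID column
        values = [row[col].strip() for row in rows if row[col] and row[col].strip()]
        is_numeric = bool(values) and all(is_number(v) for v in values)
        kinds[col] = "numeric" if is_numeric else "categorical"
    return kinds


def design_formula(covariates, factor_of_interest):
    """Build a formula string with covariates first and the factor of interest last."""
    return "~ " + " + ".join(list(covariates) + [factor_of_interest])


kinds = classify_columns(METADATA_CSV)
formula = design_formula(["CellLine", "RIN_score"], "Treatment")
print(kinds)
print(formula)
```

Under this heuristic, a factor coded as 1, 2, 3 would be labelled numeric, which is why string labels are the safer choice for categorical variables.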
Common Pitfalls
- Mismatched sample IDs between the count matrix and the metadata file trigger a mismatch warning and can block analysis until resolved. Matching already trims surrounding whitespace and ignores case, so check for embedded spaces, extra characters, or misspelled IDs.
- Spaces or special characters in sample IDs (e.g., "Sample #1") can cause silent matching failures. Use underscores or simple alphanumeric names.
- A factor column with only one level provides no contrast: the model has nothing to compare, so fitting fails with an error.
- Numeric-coded factors (1, 2, 3) may be interpreted as continuous covariates. Use string labels (Group_A, Group_B) to ensure correct treatment as categorical variables.
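Several of these pitfalls can be caught before upload with a small pre-flight check. The sketch below is a hypothetical linter, not an easyCris feature. It flags sample IDs containing anything other than letters, digits, and underscores, columns with a single level, and columns whose values are all integers. The function name lint_metadata, the ID pattern, and the example data are assumptions made for illustration.

```python
import csv
import io
import re

# Assumed "safe" ID pattern: letters, digits, and underscores only.
SAFE_ID = re.compile(r"[A-Za-z0-9_]+")

# Hypothetical metadata that triggers each check once.
METADATA_CSV = """sample_id,Treatment,Group,Tissue
Sample #1,Control,1,Liver
sample_2,Drug,2,Liver
sample_3,Drug,3,Liver
"""


def lint_metadata(text):
    """Return a list of human-readable warnings for a metadata CSV."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    id_col, *other_cols = reader.fieldnames
    problems = []

    # Sample IDs with spaces or special characters.
    for row in rows:
        if not SAFE_ID.fullmatch(row[id_col]):
            problems.append(f"unsafe sample ID: {row[id_col]!r}")

    for col in other_cols:
        values = [(row[col] or "").strip() for row in rows]
        if len(set(values)) == 1:
            # One level means there is no contrast to estimate.
            problems.append(f"column {col!r} has a single level")
        elif all(v.isdigit() for v in values):
            # Integer codes may be read as a continuous covariate.
            problems.append(
                f"column {col!r} looks numeric-coded; use string labels if it is a factor"
            )
    return problems


problems = lint_metadata(METADATA_CSV)
for message in problems:
    print(message)
```

The integer check is deliberately conservative: a genuine continuous covariate recorded as whole numbers (for example, age in years) would also be flagged, so each warning is a prompt to review the column, not proof of an error.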
Citations
- Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550.