A quest to chart DNA methylation changes in cancer
DNA methylation is the best-characterized means of epigenetic control, and it is widely deregulated in cancer. We develop a novel algorithm to nominate differentially methylated regions and apply it to cancer samples, creating an extensive catalog of cancer-specific and pan-cancer alterations.
Epigenetic control enables the tight regulation of gene expression, and consequently of cell phenotype, without modifying the underlying genetic code. Broadly speaking, gene regulatory mechanisms underlie many vital features of living organisms, such as the ability to adapt to new environments and respond quickly to external stimuli. Importantly, epigenetic control plays a critical role in the development of multicellular organisms, enabling the emergence of multiple cellular phenotypes in a process known as differentiation. In recent years, the dysregulation of the epigenome has been increasingly recognized as a core feature of virtually all cancers. Compared with the cancer genome and transcriptome, however, our understanding of epigenetic alterations in this context remains limited. Even the description of the best-characterized epigenetic mechanism, the addition of a methyl group to cytosines (i.e., DNA methylation), still presents numerous blind spots. In addition, DNA methylation has gained increased attention in recent studies focusing on cell-free DNA profiling, further calling for biomarker nomination strategies for the development of liquid biopsy assays.
F1: Schematic of our study. We developed Rocker-meth and applied it to multiple cancer DNA methylation datasets. We also used orthogonal omics data (gene expression data from TCGA and ENCODE chromatin states) to evaluate its ability to capture biologically relevant features.
The Rocker-meth algorithm
Our work provides a detailed census of DNA methylation changes across 13 cancer types, creating an extensive catalog that encompasses three different classes of differentially methylated regions (DMRs). To build this collection, we first developed a novel computational algorithm named Rocker-meth. Our strategy is based on a multistep procedure. First, we assess the degree of differential methylation of each CpG site through the Area Under the Curve (AUC) statistic computed between normal and tumor samples. Next, the genome-wide AUC profile is segmented through a heterogeneous Hidden Markov Model (HMM), which infers the most likely state of differential methylation in the cancer samples as hypo-methylated, neutral, or hyper-methylated. Last, the statistical robustness of each region is assessed with an orthogonal test, with the option of measuring the support for each DMR in single samples. In our benchmarks, Rocker-meth compared favorably with other tools on both synthetic and real-world datasets.
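To make the first step concrete, here is a minimal sketch of a per-CpG AUC between tumor and normal beta values. It is not taken from the Rocker-meth package: the function name, the site names, and the beta values are hypothetical, and we assume the convention that an AUC near 1 means higher methylation in tumors and an AUC near 0 means lower methylation.

```python
def auc_tumor_vs_normal(tumor, normal):
    """AUC as the probability that a tumor beta value exceeds a normal one.

    Ties count as half a win, which matches the Mann-Whitney U statistic
    divided by the number of tumor/normal pairs.
    """
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


# Hypothetical beta values (tumor samples, normal samples) for three CpG sites
sites = {
    "cpg_hyper": ([0.80, 0.90, 0.85, 0.70], [0.10, 0.20, 0.15]),
    "cpg_neutral": ([0.50, 0.40, 0.60, 0.50], [0.50, 0.60, 0.40]),
    "cpg_hypo": ([0.20, 0.10, 0.30, 0.25], [0.80, 0.90, 0.70]),
}

auc_by_site = {name: auc_tumor_vs_normal(t, n) for name, (t, n) in sites.items()}
for name, auc in auc_by_site.items():
    print(name, round(auc, 3))
```

Because the AUC depends only on the ranking of the beta values, it requires no assumption about their distribution.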
F2: The Rocker-meth method involves four main steps: 1) computation of Area Under the Curve (AUC) values of methylation levels (i.e., beta values) in tumor versus normal samples; 2) segmentation of AUC values by a tailored heterogeneous Hidden Markov Model (HMM); 3) filtering on segment features, including intra-segment homogeneity and number of CpG sites; 4) identification of sample-specific DMRs by Z-score statistics (optional).
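The segmentation idea in step 2 can be illustrated with a deliberately simplified sketch. This is a plain homogeneous three-state HMM decoded with the Viterbi algorithm, not the tailored heterogeneous model used by Rocker-meth. The Gaussian emission parameters, the probability of staying in a state, and the AUC track are all assumptions chosen for illustration.

```python
import math

STATES = ("hypo", "neutral", "hyper")
# Assumed Gaussian emission parameters (mean, sd) of the AUC in each state
EMISSION = {"hypo": (0.15, 0.10), "neutral": (0.50, 0.10), "hyper": (0.85, 0.10)}
STAY = 0.9  # assumed probability of remaining in the same state


def log_gauss(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_trans(prev, cur):
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(auc_values):
    """Most likely state path for a sequence of per-CpG AUC values."""
    # Uniform prior over the initial state
    score = {s: math.log(1 / len(STATES)) + log_gauss(auc_values[0], *EMISSION[s])
             for s in STATES}
    backpointers = []
    for x in auc_values[1:]:
        new_score, pointer = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = (score[best_prev] + log_trans(best_prev, cur)
                              + log_gauss(x, *EMISSION[cur]))
            pointer[cur] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


def to_segments(path):
    """Collapse runs of equal states into (state, first_index, last_index)."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            segments.append((path[start], start, i - 1))
            start = i
    return segments


# Hypothetical AUC track along consecutive CpG sites
auc_track = [0.52, 0.48, 0.55, 0.90, 0.88, 0.60, 0.92, 0.86,
             0.50, 0.47, 0.12, 0.10, 0.18]
segments = to_segments(viterbi(auc_track))
print(segments)
```

In this toy track the single intermediate value (0.60) inside the high-AUC run does not break the hyper-methylated segment, because leaving and re-entering a state costs more than one poorly fitting observation. A heterogeneous model would additionally let the transition probabilities vary along the genome, which this sketch does not attempt.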
Collecting a pan-cancer catalog of differentially methylated regions
Grounding our analysis on the extensive sample collection provided by The Cancer Genome Atlas (TCGA), we identified thousands of DMRs, annotating them by genomic features, degree of differential methylation, and statistical significance. As expected, the patterns of DNA methylation changes are, at least in part, tumor-specific. However, we also observed a surprisingly large fraction of DMRs that are shared across multiple tumor types, suggesting a common underlying mechanism that causes a partial loss of the organized structure observed in healthy tissues. While hyper-methylated DMRs are mainly short (a few kbp), hypo-methylated DMRs can span up to millions of base pairs. These long regions, previously reported as hypo-blocks or partially methylated regions, display a mild but highly significant loss of DNA methylation and are preferentially observed in intergenic regions. Interestingly, this class of DMRs is the one most commonly shared across tumor types. Furthermore, we observed enrichments of specific DMR categories in specific regulatory elements and repeated regions.
F3: Left, UMAP of TCGA samples based on the average beta difference, with respect to matched normal tissue, in each DMR of the consensus set. Right, fraction of shared differential methylation events across tumor types. Solid line: DMRs; dashed line: single CpG sites. Hypo DMSs and hyper DMSs (differentially methylated sites) were selected using an AUC below 0.2 and above 0.8, respectively (lenient), or below 0.1 and above 0.9 (stringent).
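The AUC thresholds in the caption can be turned into a small sketch of how sharing across tumor types might be quantified. The tumor names, site names, and AUC values are hypothetical, and the definition of the shared fraction used here (sites called in at least a given number of tumor types, out of sites called in at least one) is an assumption rather than the exact definition behind the figure.

```python
def call_site(auc, stringent=False):
    """Classify a CpG site from its AUC using the lenient or stringent cutoffs."""
    low, high = (0.1, 0.9) if stringent else (0.2, 0.8)
    if auc < low:
        return "hypo"
    if auc > high:
        return "hyper"
    return "neutral"


def shared_fraction(auc_by_tumor, direction, min_types, stringent=False):
    """Fraction of sites called `direction` in >= min_types tumor types,
    among sites called `direction` in at least one tumor type."""
    counts = {}
    for site_aucs in auc_by_tumor.values():
        for site, auc in site_aucs.items():
            if call_site(auc, stringent) == direction:
                counts[site] = counts.get(site, 0) + 1
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c >= min_types) / len(counts)


# Hypothetical per-site AUC values in three tumor types
auc_by_tumor = {
    "tumor_A": {"cg1": 0.95, "cg2": 0.15, "cg3": 0.50, "cg4": 0.85},
    "tumor_B": {"cg1": 0.92, "cg2": 0.05, "cg3": 0.55, "cg4": 0.60},
    "tumor_C": {"cg1": 0.85, "cg2": 0.08, "cg3": 0.45, "cg4": 0.30},
}

print(shared_fraction(auc_by_tumor, "hyper", min_types=2))
print(shared_fraction(auc_by_tumor, "hyper", min_types=3, stringent=True))
print(shared_fraction(auc_by_tumor, "hypo", min_types=3))
```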
Integration with gene expression data reveals context-specific effects of differential methylation
Next, an integrative analysis with matched gene expression data supports the notion that the effect of DNA methylation on transcriptional output is highly context-dependent, with DMRs displaying associations with gene deregulation only when found in specific regulatory regions. Notably, the underlying chromatin state of the tissue of origin is also associated with the degree of alteration observed in cancer. We observed that active transcription start sites and Polycomb-repressed regions both show significant enrichment in DNA hyper-methylation, but this change has opposite effects on the expression of the associated genes in the two contexts. We also noticed that genes upregulated in association with DNA hyper-methylation are enriched for homeobox proteins, which in turn are heavily involved in cell fate decisions and differentiation.
F4: Left, dot plot showing the odds ratio of the pan-cancer enrichment of hyper (red) and hypo (blue) DMRs in under-expressed and over-expressed genes, estimated by Fisher's exact test (FET) for different genic annotations. Right, bar plots showing the distribution of transcription factor (TF) families for the TFs whose expression changes in association with a hyper DMR.
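For readers unfamiliar with this kind of enrichment test, the following sketch computes the sample odds ratio and a one-sided Fisher's exact test p-value for a 2x2 table. The layout of the table and the gene counts are hypothetical and are not results from the study.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """Odds ratio and one-sided p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are gene groups (e.g. under-expressed vs other genes),
    columns are overlap status (with vs without a DMR in the annotation).
    """
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    row1, col1, total = a + b, a + c, a + b + c + d
    # P(X >= a) under the hypergeometric null with fixed margins
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return odds_ratio, tail / comb(total, row1)


# Hypothetical counts: 30 of 100 under-expressed genes carry a hyper DMR,
# versus 10 of 200 other genes
odds_ratio, p_value = fisher_enrichment(30, 70, 10, 190)
print(round(odds_ratio, 2), p_value)
```

An odds ratio above 1 indicates that DMRs are found more often in the gene group of interest than in the remaining genes; the p-value measures how unlikely such an excess is under independence.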
Applying the DMR catalog to single-cell DNA methylation data
Last, we applied our DMR catalog to single-cell DNA methylation data from colorectal cancer patients. The catalog proved highly consistent in this independent dataset. We also observed some degree of DMR heterogeneity within individual patients, suggesting that these events might be subclonal but early.
F5: Heatmap of DMR beta values in the scTRIO cohort. 50 randomly selected regions for each class are depicted. For the dendrogram, hierarchical clustering using 1 - Pearson’s correlation was applied to all cells having less than 10% of DMRs with missing values (NC: normal colon, TP: tumor primary).
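The cell filter and the distance used for the dendrogram can be sketched as follows. The cell names and beta values are hypothetical, and computing the correlation only over DMRs observed in both cells is an assumption about how missing values are handled. The clustering step itself is omitted; the resulting distances could be passed to any standard hierarchical clustering routine.

```python
import math


def keep_cell(values, max_missing=0.10):
    """Keep a cell if less than 10% of its DMR values are missing (None)."""
    return sum(v is None for v in values) / len(values) < max_missing


def pearson_distance(x, y):
    """1 - Pearson correlation over DMRs observed in both cells (assumption)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Hypothetical per-cell average beta values across 12 DMRs (None = not covered)
cells = {
    "NC_1": [0.1, 0.1, 0.2, 0.1, 0.9, 0.8, 0.9, 0.9, 0.8, 0.9, 0.8, 0.9],
    "NC_2": [0.2, 0.1, 0.1, None, 0.8, 0.9, 0.9, 0.8, 0.9, 0.8, 0.9, 0.9],
    "TP_1": [0.8, 0.9, 0.7, 0.8, 0.2, 0.1, 0.3, 0.2, 0.6, 0.5, 0.6, 0.7],
    "TP_2": [0.9, None, 0.8, None, 0.1, 0.2, 0.2, 0.3, 0.5, 0.6, 0.7, 0.6],
}

kept = [name for name, values in cells.items() if keep_cell(values)]
distances = {
    (a, b): pearson_distance(cells[a], cells[b])
    for i, a in enumerate(kept)
    for b in kept[i + 1:]
}
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```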
Conclusions and future perspectives
To conclude, we first provide the cancer epigenetics community with an algorithm to analyze extensive collections of DNA methylation profiles. Our results on multiple cancer types consistently recapitulate the findings of many years of analyses, which were often conducted on single datasets or with non-homogeneous methodologies. It is important to note that the Rocker-meth algorithm makes no assumption about the nature of the data, which makes it applicable to other settings where differential methylation analysis is of interest. Furthermore, we provide a catalog of differentially methylated regions in cancer: we hope that it will serve as a basis to interpret future results and enhance our understanding of the role of DNA methylation in cancer development and evolution.
The Rockermeth package is available at: https://github.com/cgplab/Rockermeth
Original cover artwork by Gian Marco Franceschini