Identifying mutational signatures in circulating cell-free DNA

DNA accumulates alterations over time due to a variety of endogenous and exogenous factors. A pattern of characteristic DNA changes is termed a mutational signature. We show that these signatures can be detected in cell-free DNA and may help detect cancer, or signal a risk of developing cancer or other diseases.

Early detection enables early (and effective) treatment
Early detection of disease minimizes patient suffering through earlier treatment and sometimes opens additional avenues of treatment. Cancer is an excellent example of this, as early detection increases the chance that tumors can be identified prior to spread, and treatments such as curative surgery can be performed. Thus, early detection of cancers may enable more effective treatment1.

What are liquid biopsies?
One emerging approach for cancer detection uses blood tests to detect circulating tumor DNA (ctDNA), an approach termed liquid biopsy2. ctDNA is released by tumor cells into the bloodstream or other bodily fluids, and may be identified through DNA sequencing. Over recent years, liquid biopsies have shown potential for a variety of applications, including non-invasive cancer detection3, prognostication4,5 and treatment monitoring2,6.

Circulating mutational signatures
The DNA within our cells accumulates mutations throughout life as a result of mutational processes7,8. For example, endogenous processes (such as aging) and exogenous exposures (such as smoking) both cause characteristic patterns of mutations in the human genome9,10; these patterns are termed mutational signatures.
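
To make "pattern" concrete: single-base-substitution catalogues such as that of Alexandrov et al.9 conventionally classify each substitution by the mutated base and its two neighbors, reported on the pyrimidine strand, which gives 96 channels. The sketch below illustrates that convention only. The function names are hypothetical, and the three example mutations are made up for illustration.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs_channel(context, alt):
    """Map a substitution to its pyrimidine-strand channel label.

    context: reference trinucleotide, with the mutated base in the middle.
    alt: the alternate (mutated) base.
    Returns a label such as 'A[C>T]G'.
    """
    ref = context[1]
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def all_channels():
    """Enumerate every distinct channel label."""
    channels = set()
    for trinucleotide in map("".join, product("ACGT", repeat=3)):
        for alt in "ACGT":
            if alt != trinucleotide[1]:
                channels.add(sbs_channel(trinucleotide, alt))
    return sorted(channels)


# Made-up example mutations: (reference trinucleotide, alternate base).
toy_mutations = [("ACG", "T"), ("CGT", "A"), ("TCA", "G")]
profile = Counter(sbs_channel(ctx, alt) for ctx, alt in toy_mutations)
print(len(all_channels()), dict(profile))
```

The first two toy mutations are the same event seen from opposite strands, so they fall into the same channel. A sample's mutation profile is simply the vector of counts across all channels.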

Since dying cells in the body release DNA into the bloodstream, we hypothesized that circulating DNA might reflect tissue mutational signatures (Figure 1). Using DNA sequencing, we first characterized circulating mutational signatures in blood plasma, then we developed a machine learning approach for sensitive cancer detection based on these signatures.

Figure 1. Circulating mutational signatures. (a) A pattern of characteristic DNA changes is termed a mutational signature. Both physiological and pathological mutational signatures cause mutations in the human genome, which accumulate over time in this example patient time course. Plasma DNA sequencing may identify circulating mutational signatures. (b) Early cancer detection using ctDNA is based on detection of cancer mutations in the circulation. Interrogation of physiological signatures might enable cancer risk profiling, prior to cancer development. Signatures reproduced from Alexandrov et al.9.

What did we do?
In this study, we examined mutational signatures in low-coverage whole-genome sequencing, with coverage between 0.3x and 1.5x. Low-coverage sequencing data has previously been used for copy number profiling11 and circulating DNA fragment size profiling12 (as circulating DNA tends to be shorter in patients with cancer13). However, such data are not generally used for mutation profiling; the low coverage makes it challenging to distinguish cancer-associated signals from other sources of variants, such as single nucleotide polymorphisms (SNPs).
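
A rough calculation shows why individual sites are so hard to call at this depth. The sketch below assumes read depth at a site follows a Poisson distribution with mean equal to the coverage; that model is a simplifying assumption made here for illustration, not a claim from the study, and the function name is hypothetical.

```python
import math


def prob_depth_at_least(k, mean_coverage):
    """P(depth >= k) when depth ~ Poisson(mean_coverage) (assumed model)."""
    below = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


# Fraction of sites where a mismatch could be confirmed by a second read.
for coverage in (0.3, 1.5):
    p = prob_depth_at_least(2, coverage)
    print(f"{coverage}x: P(depth >= 2) = {p:.3f}")
```

Under this model, only a few percent of sites at 0.3x, and fewer than half at 1.5x, are covered by two or more reads. Most candidate mutations therefore rest on a single read, which is why aggregate signals across the genome are more informative than any single site.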

As DNA sequencing becomes less expensive, cancer liquid biopsies are increasingly targeting broader sets of mutations or non-mutation markers using whole-genome sequencing14. Recent studies suggest that targeting a large number of markers can improve sensitivity for early-stage cancers and/or residual disease15–17. Thus, we suggest that cancer detection using signatures may provide a sensitive approach.

We developed a computational method called Pointy to study mutational signatures from low-coverage sequencing by aggregating mutations across the genome and performing signature fitting, followed by sample classification using machine learning. An outline is shown in Figure 2. 
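
Signature fitting, in general terms, asks how much each known signature must contribute to explain an observed mutation profile. This post does not describe Pointy's fitting algorithm, so the sketch below swaps in a generic technique: non-negative least squares solved by multiplicative updates. The two toy signatures use six substitution classes rather than the full set of channels, and every name and number here is an assumption for illustration.

```python
def fit_exposures(counts, signatures, iterations=500):
    """Estimate non-negative signature exposures for one sample.

    counts: observed mutation counts per channel.
    signatures: list of signatures, each a list of per-channel probabilities.
    Minimizes the squared error between counts and the exposure-weighted
    sum of signatures using multiplicative updates, which keep every
    exposure non-negative. This is a generic method, not Pointy's.
    """
    n_sig = len(signatures)
    n_ch = len(counts)
    # Gram matrix (signature-by-signature dot products).
    gram = [
        [sum(si[c] * sj[c] for c in range(n_ch)) for sj in signatures]
        for si in signatures
    ]
    # Projection of the observed counts onto each signature.
    proj = [sum(s[c] * counts[c] for c in range(n_ch)) for s in signatures]

    exposures = [1.0] * n_sig
    for _ in range(iterations):
        denom = [
            sum(gram[i][j] * exposures[j] for j in range(n_sig))
            for i in range(n_sig)
        ]
        exposures = [
            exposures[i] * proj[i] / (denom[i] + 1e-12) for i in range(n_sig)
        ]
    return exposures


# Toy signatures over six classes (C>A, C>G, C>T, T>A, T>C, T>G); made up.
toy_sig_a = [0.05, 0.05, 0.70, 0.10, 0.05, 0.05]
toy_sig_b = [0.10, 0.45, 0.35, 0.03, 0.03, 0.04]

# Synthetic sample built as 300 mutations from A plus 100 from B.
toy_counts = [300 * a + 100 * b for a, b in zip(toy_sig_a, toy_sig_b)]

fitted = fit_exposures(toy_counts, [toy_sig_a, toy_sig_b])
print([round(x, 2) for x in fitted])
```

Because the synthetic sample is an exact mixture, the fit recovers the exposures used to build it. Real low-coverage data are far noisier, and the resulting exposures would then serve as input features for the classification step.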

Figure 2. Pointy overview. We developed a method for mutational signature profiling and cancer detection using low-coverage whole-genome sequencing. This involves fitting signatures to mutations in the data and applying machine learning to look for differences between healthy individuals and individuals with cancer.

What did we find?
We applied this approach to two datasets of low-coverage plasma whole-genome sequencing data from patients with stage I-IV cancer (n = 215) and healthy individuals (n = 227). We found that mutational signatures can be identified in circulating DNA when sequenced at low coverage.

Aging signatures were the most abundant, likely because aging-related mutations accumulate in all dividing cells in the body, and cancer cells divide more rapidly. We commonly found circulating APOBEC (Apolipoprotein B mRNA Editing Catalytic Polypeptide-like) signatures in plasma across multiple cancer types. In tumor samples, Alexandrov et al.9 identified APOBEC-associated mutations in ~75% of cancer types and in over half of all cancers they analyzed.

Interestingly, we found that aging signatures in healthy individuals significantly correlated with chronological age. This suggests that physiological signatures may be observable in plasma and contribute a biological component to the background signal.
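
The post does not say which correlation statistic was used. As one possibility, the sketch below computes a Pearson correlation between age and an aging-signature contribution. The function name is hypothetical and the six data points are made up purely to show the calculation; they are not results from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
toy_ages = [25, 34, 47, 58, 66, 73]
toy_aging_fraction = [0.21, 0.24, 0.30, 0.29, 0.36, 0.40]

print(f"r = {pearson_r(toy_ages, toy_aging_fraction):.2f}")
```

A rank-based statistic such as Spearman's would be a reasonable alternative if the relationship were monotonic but not linear.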

We next applied machine learning to classify samples as healthy or cancer, based on their circulating mutation profiles (following the application of error suppression filters). We applied a Random Forest model to each cohort separately. This yielded Area Under the Curve (AUC) values of >0.96 for the detection of stage I-IV disease.
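
For readers unfamiliar with the metric, the sketch below shows how an AUC can be computed from classifier scores, assuming the curve in question is the receiver operating characteristic (ROC) curve, which the post does not state. In that case the AUC equals the probability that a randomly chosen cancer sample scores higher than a randomly chosen healthy sample. The scores are made up, the function name is hypothetical, and the Random Forest itself is not reproduced here.

```python
def roc_auc(positive_scores, negative_scores):
    """ROC AUC via pairwise comparison (ties count as half a win)."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up classifier scores (higher means "more likely cancer").
toy_cancer_scores = [0.9, 0.8, 0.4]
toy_healthy_scores = [0.1, 0.3, 0.5]

print(f"AUC = {roc_auc(toy_cancer_scores, toy_healthy_scores):.3f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so values above 0.96 indicate that the two groups were almost always ranked correctly.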

In short, we found that mutational signatures can be profiled in plasma from both patients with cancer and healthy individuals. With machine learning, these circulating signatures can be used for cancer detection.

Future directions
In future, deeper and more error-suppressed sequencing would enable further study of circulating mutational signatures. These signatures, which may be studied in both patients with cancer and healthy individuals, may provide an insight into the myriad mutational processes within somatic cells in the body.

We suggest that the study of circulating mutational signatures might enable the next generation of cancer liquid biopsy approaches. Mutational signature-based liquid biopsies could profile such signatures for the early detection of cancer (in the pre-symptomatic phase), potentially for profiling personalized cancer risk (in the pre-cancer phase), or even for identifying the risk of other diseases.

In future, plasma signature profiling might be performed across multiple non-cancer diseases and in apparently healthy individuals. For example, in ulcerative colitis, a signature (SBS17) has been identified in tissue and may be associated with inflammatory processes18. Furthermore, aside from aging, other signatures have been identified in healthy tissues, such as signatures related to alcohol consumption19 and tobacco smoking20. We can imagine a future in which methods similar to ours non-invasively detect the effects of these exogenous factors in healthy individuals, in the absence of cancer.

In summary, our method provides a non-invasive approach for identifying mutational signatures through plasma cell-free DNA sequencing, and it identifies cancer-affected individuals using machine learning.



  1. World Health Organization. Guide to cancer early diagnosis. World Health Organization (2017).
  2. Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nature Reviews Cancer 17, 223–238 (2017).
  3. Forshew, T. et al. Noninvasive Identification and Monitoring of Cancer Mutations by Targeted Deep Sequencing of Plasma DNA. Science Translational Medicine 4, 136ra68 (2012).
  4. Kurtz, D. M. et al. Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction. Cell 178, 699-713.e19 (2019).
  5. Wan, J. C. M., White, J. R. & Diaz, L. A. “Hey CIRI, What’s My Prognosis?” Cell 178, 518–520 (2019).
  6. Wan, J. C. M. et al. Liquid biopsies for residual disease and recurrence. Med 2, 1292–1313 (2021).
  7. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
  8. Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature 597, 381–386 (2021).
  9. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
  10. Li, R. et al. A body map of somatic mutagenesis in morphologically normal human tissues. Nature 597, 398–403 (2021). doi:10.1038/s41586-021-03836-1.
  11. Adalsteinsson, V. A. et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nature Communications 8, 1324 (2017).
  12. Cristiano, S. et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570, 385–389 (2019).
  13. Mouliere, F. et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Science Translational Medicine 10, 1–14 (2018).
  14. Im, Y. R., Tsui, D. W. Y., Diaz, L. A. & Wan, J. C. M. Next-Generation Liquid Biopsies: Embracing Data Science in Oncology. Trends in Cancer 7, 283–292 (2021).
  15. Wan, J. C. M. et al. ctDNA monitoring using patient-specific sequencing and integration of variant reads. Science Translational Medicine 12, eaaz8084 (2020).
  16. Newman, A. M. et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nature Biotechnology 34, 547–55 (2016).
  17. Zviran, A. et al. Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring. Nature Medicine 26, 1114–1124 (2020).
  18. Kakiuchi, N. & Ogawa, S. Clonal expansion in non-cancer tissues. Nature Reviews Cancer 21, 239–256 (2021).
  19. Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).
  20. Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).

Cover photo credit: Phillip Jeffrey. License: CC BY-SA 2.0.
