Association of mutation signature effectuating processes with mutation hotspots in driver genes and non-coding regions

Mutations are mainly responsible for cancer development and growth. In recent years, there has been growing interest in deciphering mutational signatures, which use every somatic point mutation in the cancer genome to explore tumor mutational processes and classify tumors. This project began with an interest in identifying an alternative signature set that exploits the nearby nucleotide information ignored by the COSMIC SBS mutational signatures; in our publication we refer to this set as the extended NxSxN SBS-signatures. The extended NxSxN SBS-signatures use the nucleotides one position beyond the trinucleotide context (NxSxN vs. NSN), where S is the single base substitution, x marks the neglected bases, and N marks the nucleotides considered around the substitution. Using this motif, we identified a new set of mutational signatures.
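To make the difference between the two motifs concrete, here is a minimal sketch that labels one substitution in both contexts. It is an illustration only, not code from sigDriver: the function name `contexts`, the label format, and the choice to report substitutions on the pyrimidine strand (the convention used for COSMIC SBS channels) are assumptions made for this example.

```python
# Hypothetical helper, for illustration only (not part of sigDriver).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def contexts(ref_window, alt):
    """Label a single base substitution in NSN and NxSxN context.

    ref_window: five reference bases centred on the mutated position.
    alt: the alternate base observed at the centre position.
    Returns a tuple (nsn_label, nxsxn_label).
    """
    ref_window = ref_window.upper()
    alt = alt.upper()
    if len(ref_window) != 5 or set(ref_window + alt) - set("ACGT"):
        raise ValueError("expected a 5-base ACGT window and an ACGT alt base")
    if ref_window[2] == alt:
        raise ValueError("alt base must differ from the reference base")

    # Assumption: report on the strand where the mutated base is a
    # pyrimidine (C or T), as is done for COSMIC SBS channels.
    if ref_window[2] in "AG":
        ref_window = revcomp(ref_window)
        alt = alt.translate(COMPLEMENT)

    sub = f"{ref_window[2]}>{alt}"
    # NSN: the immediately adjacent bases (positions -1 and +1).
    nsn = f"{ref_window[1]}[{sub}]{ref_window[3]}"
    # NxSxN: skip the adjacent bases (x) and use positions -2 and +2.
    nxsxn = f"{ref_window[0]}x[{sub}]x{ref_window[4]}"
    return nsn, nxsxn


print(contexts("ATCGA", "T"))
```

Under this reading, both schemes have 6 substitution types times 4 times 4 flanking-base combinations, so 96 channels each; they differ only in which flanking positions define the channel.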

Having a brand-new set of mutational signatures left us with difficult questions: How do they compare to the COSMIC signatures? What can we learn about tumor mutational processes by applying them?

We addressed these questions by developing tools to map each signature to the known COSMIC SBS signature set and to find the mutations associated with each mutational signature. We discovered that our NxSxN SBS-signature set contains both surrogate signatures and novel signatures when compared to the COSMIC SBS signature definitions. Our NxSxN SBS-signatures include a slightly improved version of the homologous recombination deficiency signature (SBS-EX3), and combining the COSMIC SBS signature set with our novel signatures can increase classification power on datasets. Meanwhile, we noticed that some well-known cancer drivers are strongly associated with mutational signatures. We then began to ask whether the method we developed for mutational signature analysis could be applied to distinguish driver mutations from passenger mutations.

After modelling these dependencies, we created sigDriver, a tool specialized in using mutational signatures to discover driver mutations. Notably, our tool reproduces known drivers such as TERT, PIK3CA, and PTEN while reporting which mutational signature(s) each is most strongly associated with. Our tool also provides a different perspective on driver discovery in the non-coding region. sigDriver takes a search-then-annotate approach: the driver search is not bounded by databases of (non-)coding elements but runs across the whole genome in an unbiased fashion. Our analysis of a cohort of 3813 whole-genome-sequenced tumors revealed putative driver mutations in non-coding regions affecting genes such as MAPKAPK2, PLEKHS1, and ADGRG6.

If you are curious whether sigDriver or our mutational signatures can provide new insights into your dataset, try it out: https://github.com/wkljohn/sigDriver. For the full manuscript presenting the candidate driver mutations and a detailed description of our approach, please read our publication.