At the 2017 annual ASCO meeting, I heard Dr. Rodrigo Dienstmann present recent developments by a consortium of researchers attempting to organize the molecular heterogeneity of colorectal cancer (CRC). This consortium established four consensus molecular subtypes (CMS1-4) of CRC based on whole-transcriptome data. The CMS subtypes exhibit distinct biological features, and evidence is mounting that tumors in different subtypes may respond differently to standard chemotherapies, offering hope of personalizing adjuvant therapy for CRC patients. However, Dr. Dienstmann reported that there was no clinical diagnostic for identifying a patient’s CMS subtype. This seemed like a great opportunity.
Classification, as typically done in machine learning, assigns each sample to exactly one of a predetermined set of boxes. Applying such a system to tumors would almost certainly be an oversimplification. Indeed, the initial CMS subtyping paper allows for a small set of unclassified samples. Moreover, subsequent research has revealed that, due to intratumoral heterogeneity, some tumors have characteristics of multiple subtypes in varying proportions.
To take these complexities into account, we treated each subtype as a class of its own and defined a continuous score, based on expression of a set of genes, that measures the likelihood that a sample belongs to that subtype. This allows a sample to have high values of multiple scores, suggesting that it has features of multiple subtypes. A sample that does not have a high value of any score is called unclassified. When we applied this method to CRC and the CMS1-4 subtypes, we found that 65% of samples were classified into a unique subtype, 15% were unclassified, and 20% had features of mixed types. This result supports defining CMS1-4 as distinct subtypes, but argues that it is best not to think of them as mutually exclusive.
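The scoring idea can be illustrated with a minimal Python sketch. It is not the method we actually used: the gene names are placeholders, the score is assumed to be a simple mean of normalized marker-gene expression, and the 0.5 cutoff is arbitrary. It only shows how independent per-subtype scores naturally produce unique, mixed, and unclassified calls.

```python
from statistics import mean

# Hypothetical marker genes per subtype (placeholders, not real CMS signatures).
MARKERS = {
    "CMS1": ["geneA", "geneB"],
    "CMS2": ["geneC", "geneD"],
    "CMS3": ["geneE", "geneF"],
    "CMS4": ["geneG", "geneH"],
}
THRESHOLD = 0.5  # assumed cutoff for calling a score "high"


def subtype_scores(expression):
    """Score each subtype as the mean normalized expression of its markers."""
    return {
        subtype: mean(expression[gene] for gene in genes)
        for subtype, genes in MARKERS.items()
    }


def call_subtypes(expression, threshold=THRESHOLD):
    """Return ("unique" | "mixed" | "unclassified", list of high-scoring subtypes)."""
    scores = subtype_scores(expression)
    hits = sorted(s for s, value in scores.items() if value >= threshold)
    if not hits:
        return "unclassified", hits
    if len(hits) == 1:
        return "unique", hits
    return "mixed", hits


# Toy samples with expression values already scaled to [0, 1].
low = {gene: 0.1 for genes in MARKERS.values() for gene in genes}
unique_sample = {**low, "geneA": 0.9, "geneB": 0.8}
mixed_sample = {**low, "geneC": 0.7, "geneD": 0.9, "geneG": 0.8, "geneH": 0.6}

print(call_subtypes(unique_sample))
print(call_subtypes(mixed_sample))
print(call_subtypes(low))
```

Because each subtype is scored on its own rather than forced to compete with the others, nothing in the procedure requires exactly one subtype to win, which is what lets mixed and unclassified samples appear.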
If a sample was reported to have a mixed type, it is natural to ask whether this was because of some numerical uncertainty (noise) in the algorithm or because of a biological reality. Unpublished analyses support the latter. Indeed, as the attached graphic representation of the classifications shows, samples of mixed CMS1 and CMS2 almost never occur, and samples of mixed CMS3 and CMS4 almost never occur. Perhaps the frequencies of mixed types reflect distinct possible evolutionary paths for the tumors. For example, one research project into precancerous lesions showed that they almost always arise as either CMS1 or CMS2. Deeper understanding of the events leading to each of these subtypes is needed.