Patient-level proteomic network prediction by explainable artificial intelligence

In this study we used explainable artificial intelligence to predict proteomic networks for individual tumors based on proteomic profiling data.

Like Comment
Read the paper

Proteomic networks for personalized medicine

The development of molecular profiling techniques including proteomics has not only led to extensive progress in the characterisation of different tumors, but has also allowed for the discovery of functional relationships among genes and proteins. Dysregulations of such networks are critical in tumorigenesis [1]. Precise knowledge about the underlying mechanisms  is therefore of particular importance in precision oncology, but the acquisition of functional data on individual patients’ molecular networks is challenging, because most network prediction methods require hundreds of samples for the prediction of one ‘average’ network.

Explainable artificial intelligence

To circumvent this problem, and to infer functional properties in tumors from molecular profiling data for individual patients, we applied an explainable artificial intelligence method, namely layer-wise relevance propagation (LRP) [2]. This method was, among others, an answer to the criticism of deep neural networks being ‘black boxes’. LRP can identify those features most relevant for a neural network’s prediction. Importantly, it does so for individual samples.

What we did

To infer the proteomic networks of tumors in individual patients, we trained a deep neural network on a proteomic data set to predict the abundance of a target protein based on the expression of all other proteins. Layer-wise relevance propagation was then applied to compute the importance of each protein for the prediction. By repeating this procedure for every protein as target protein, a complete network can be reconstructed and the LRP scores can be used as a measure of protein interaction strength.

What we found

The explainable AI method LRP showed state-of-the art performance in predicting one average network over many samples. For this purpose, we constructed a synthetic protein dataset in which, for each sample, the same groups of proteins interacted with each other. LRP was able to almost perfectly reconstruct the underlying network (AUC = 0.99).

We then increased the difficulty of the task with a second dataset. Here, in each synthetic proteomic sample, one of four different groups of proteins interacted and LRP was tasked to predict the interaction network for each sample without previous knowledge about the group membership. We could show that these individual networks can still be reconstructed with high accuracy (AUC = 0.93).

We next used a real proteomic dataset from The Cancer Proteome Atlas (TCPA)  consisting of 5114 tumor samples of 19 different cancer types. With our explainable AI approach, we first trained a neural network on half of the samples and then reengineered the proteomic networks of all other tumor samples. When we compared the predicted interactions to interactions registered in the database REACTOME [3], interactions with high LRP scores were significantly overrepresented in this database

A t-SNE analysis based on the predicted proteomic networks for every patient’s tumor showed that the majority of tumors of the same cancer type cluster together suggesting that they have similar protein interactions. However, our explainable AI approach shows, first, that network patterns vary even among tumors of the same cancer type and second, that some highly conserved network properties exist across tumor types. 

An interesting example of network patterns observed across tumor types was a cluster that consisted of pancreatic, gastric, colon and rectal cancer as well as lung adenocarcinoma, but the tumors’ proteomic networks showed a highly conserved pattern. Remarkably, this subnetwork of strongest interactions that comprised the proteins PARP, c-Met, Caspase 8, SNAIL, ERCC1 and RB had been described before for a lung adenocarcinoma cohort [4]. The prediction of interaction networks for individual samples in our study allowed us to conclude that even among patients with the same tumor type, this pattern is present only in a proportion of patients. 

Several of these proteins have been associated with tumor relapse or drug resistance, therefore, the discovery of a common underlying mechanism in these different cancers may be relevant for the disease’s treatment and prognosis, a hypothesis that needs to be investigated further.

Our study highlights the unique potential of explainable artificial intelligence for the prediction of proteomic networks in cancer. While we used proteomic profiling data here, our method can in principle be applied to any kind of profiling data as long as sufficient data are available. In future, we plan to investigate the relevance of these findings for clinical decision making by linking tumor proteomic networks to the occurrence of relapse or drug resistance.

[1]    D. Hanahan and R. A. Weinberg, ‘Hallmarks of Cancer: The Next Generation’, Cell, vol. 144, no. 5, pp. 646–674, Mar. 2011, doi: 10.1016/j.cell.2011.02.013.

[2]    S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, ‘On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation’, PLOS ONE, vol. 10, no. 7, p. e0130140, Jul. 2015, doi: 10.1371/journal.pone.0130140.

[3]    B. Jassal et al., ‘The reactome pathway knowledgebase’, Nucleic Acids Res., p. gkz1031, Nov. 2019, doi: 10.1093/nar/gkz1031.

[4]    A. Datta, S. Sikdar, and R. Gill, ‘Differences in protein-protein association networks for lung adenocarcinoma: A retrospective study’, Bioinformation, vol. 10, no. 10, pp. 647–651, Oct. 2014, doi: 10.6026/97320630010647.

Philipp Keyl