Data Dimension and Scaling Law: the Extrapolatability of Modeling Outputs


Modern biomedical research is basically a process of modeling. First, the researcher constructs an experimental system, hypothesizing that it contains the factors driving the biological or pathological phenotypes of interest. Second, she/he manipulates the input to the system and measures its outputs. Third, by comparing the modeling output with data from the real system, the researcher evaluates the predictivity of the model, that is, the extent to which the hypothesized driving factors explain the phenotype. Finally, based on this result, she/he improves the model by removing irrelevant factors and adding new ones.

Let’s take the conventional approach to studying drug resistance in cancer as an example. A researcher treats a cancer cell line continuously with a chemotherapeutic drug until the cells recover from the treatment. She/he then compares the parental and resistant lines in gene expression and signaling pathways. Any identified biomarker is validated in tumor samples by pathological or genomic analyses, or even tested in a clinical study. In this example, the researcher hypothesizes that the cell line contains the essential factors causing resistance (e.g., pre-existing gene mutations). After drug treatment at clinically relevant doses (“input”), a subclone carrying the essential genetic alteration is selected (“output”). The predictivity of the identified biomarker is then tested in human cases (“real system”). If the model is validated, the signaling pathway can be studied further to identify therapeutic targets for overcoming the drug resistance.

Understandably, the choice of model system is critical to the success of such research. If the research starts with an irrelevant model, it will still be very hard to get meaningful results, even after many cycles of improvement. So here is the question: how do we know that the chosen model system is relevant? Put another way, how can we know whether the modeling output can be extrapolated to the real system?

Let’s consider the nature of a model again. In a way, modeling is the act of sampling the real system. From this perspective, a model is a subsystem of the real world. In this sense, the difference between a model and the real system is one of scale, which leads us to the concept of a “scaling law”.

A “scaling law” is sometimes called a “power law”. So what is it? Here is the definition quoted from Wikipedia:

“In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: one quantity varies as a power of another.” In plain words, if the relation among variables does not change across different sizes of the system, the relation is “scale-invariant” and thus follows a scaling law.
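To make the definition concrete, a power law can be written as y = a·x^k, where a and k are constants. The short Python sketch below is only an illustration (the helper names and constants are hypothetical, not from any particular study). It shows the key property: multiplying x by a factor c multiplies y by c^k, whatever the starting value of x.

```python
def power_law(x: float, a: float, k: float) -> float:
    """Hypothetical helper: return y = a * x**k."""
    return a * x ** k


def relative_change(x: float, c: float, a: float, k: float) -> float:
    """Ratio f(c*x) / f(x); for a power law this equals c**k for any x."""
    return power_law(c * x, a, k) / power_law(x, a, k)


if __name__ == "__main__":
    a, k, c = 2.0, 0.75, 10.0  # arbitrary illustrative constants
    for x in (0.01, 1.0, 100.0, 1e6):
        # The ratio is the same at every starting size: scale invariance.
        print(f"x={x:>10g}  f(cx)/f(x)={relative_change(x, c, a, k):.4f}")
    print(f"c**k = {c ** k:.4f}")
```

The prefactor a cancels out of the ratio, which is why only the exponent k matters when comparing systems of different sizes.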

One famous example of a scaling law in biology is Kleiber's law: across a wide range of animal sizes, an animal's metabolic rate scales to the ¾ power of its mass. West, Enquist, and Brown proposed (the WEB model) that the physiological basis of Kleiber’s law is that (1) metabolism scales proportionally to nutrient flow through the circulatory system and (2) blood volume is a fixed fraction of body mass. Whether or not this is the best explanation, Kleiber's law allows results observed in laboratory animals to be extrapolated to other animals. Scaling laws, therefore, are the foundation of modeling in biomedical research.
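The extrapolation Kleiber's law permits can be sketched in a few lines of Python. This is a minimal illustration that assumes the pure power law B = B0·M^(3/4) holds over the whole mass range. Under that assumption the unknown constant B0 cancels, so a rate measured in a reference animal can be scaled to another mass. The function name and the numbers are hypothetical and are not measured data.

```python
KLEIBER_EXPONENT = 0.75  # metabolic rate proportional to mass ** (3/4)


def extrapolate_metabolic_rate(rate_ref: float, mass_ref: float,
                               mass_target: float,
                               exponent: float = KLEIBER_EXPONENT) -> float:
    """Scale a metabolic rate measured at mass_ref to an animal of mass_target.

    Assumes B = B0 * M**exponent over the whole range, so B0 cancels out.
    """
    if mass_ref <= 0 or mass_target <= 0:
        raise ValueError("masses must be positive")
    return rate_ref * (mass_target / mass_ref) ** exponent


if __name__ == "__main__":
    # Illustrative numbers only: a reference rate of 1.0 (arbitrary units)
    # in an animal 10,000 times lighter than the target animal.
    whole_body = extrapolate_metabolic_rate(1.0, mass_ref=1.0, mass_target=1e4)
    per_unit_mass = whole_body / 1e4  # rate per unit mass, relative to reference
    print(f"whole-body rate ratio: {whole_body:.1f}")       # (10**4) ** 0.75
    print(f"per-unit-mass rate ratio: {per_unit_mass:.2f}")  # (10**4) ** -0.25
```

A 10,000-fold increase in mass predicts only about a 1,000-fold increase in whole-body metabolic rate. Per unit mass, the rate falls as M^(-1/4), which is why smaller animals have higher mass-specific metabolic rates.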

We can also say that a model system is one point in the sampling space. However, if biological systems mostly follow scaling laws, we can observe the same relation among variables at different scales and investigate the driving factors behind it. This is very much like a fractal: the self-similar pattern reappears no matter how you change the scale of observation. Here the fractal is not merely an analogy, because fractals are themselves described by power laws. At the stage of hypothesis generation, it is hard to identify a relevant model system. Even without a preliminary study, what we can do is observe the same model system at different dimensions, collecting data from its subsystems.
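In practice, "observing the same relation at different scales" can mean estimating the power-law exponent separately within each scale and checking whether the estimates agree. The Python sketch below does this on synthetic data only, assuming y = 3·x^0.75 with small multiplicative noise. It fits the exponent as the slope of a least-squares regression of log(y) on log(x) within three widely separated ranges. The helper names, constants, and data are hypothetical.

```python
import math
import random


def fit_exponent(xs, ys):
    """Least-squares slope of log(y) versus log(x): the power-law exponent k."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    var = sum((u - mx) ** 2 for u in lx)
    return cov / var


def synthetic_power_law(lo, hi, n, a=3.0, k=0.75, noise=0.05, seed=0):
    """Hypothetical data: y = a * x**k times log-normal noise,
    with x spread log-uniformly over [lo, hi]."""
    rng = random.Random(seed)
    xs = [lo * (hi / lo) ** rng.random() for _ in range(n)]
    ys = [a * x ** k * math.exp(rng.gauss(0.0, noise)) for x in xs]
    return xs, ys


if __name__ == "__main__":
    # Three "subsystems" sampled at very different scales.
    for i, (lo, hi) in enumerate([(1, 10), (1e3, 1e4), (1e6, 1e7)]):
        xs, ys = synthetic_power_law(lo, hi, n=50, seed=i)
        print(f"range [{lo:g}, {hi:g}]: fitted k = {fit_exponent(xs, ys):.3f}")
```

If the fitted exponents diverge between ranges, the relation is not scale-invariant over that span, and extrapolating from one scale to another would be unsafe.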

I’d like to use my own research to explain this concept. Tumor heterogeneity exists at various scales: (1) population level, tumors in all patients; (2) cohort level, tumors in a small group of patients; (3) personal level, tumors at different sites in one patient; (4) intratumor level, different regions of a tumor; and (5) single-cell level. In a study we will publish soon, we built four mouse models of melanoma. Through a comparative study, we found that the differential responses of these models to immunotherapy can be explained by their differentiation status. We validated the results in human melanoma at the cohort level, using data from a small group of forty-two patients in a clinical study. Recently, we performed single-cell RNA sequencing on one of the four models and found that its cell subpopulations can also be categorized by differentiation status, which likely also causes differential responses to immunotherapy.

Taken together, these findings suggest that developmental heterogeneity (variation in differentiation status) exists across scales throughout melanoma progression. Speculatively, this could lead to a “scaling law” in which the population size of tumors is proportionally related to overall differentiation status. Whether such a relation exists needs further study. Importantly, however, the universality of developmental heterogeneity in melanoma allows us to extrapolate discoveries from our models to the cohort of melanoma patients.

Experimentally, at any given dimension, we have to collect paired genotype and phenotype data over a range of heterogeneity. For example, a genetically engineered mouse (GEM) model of melanoma can generate multiple tumors. Rather than selecting one tumor and transplanting it into multiple hosts for further study, comparative analysis of multiple tumors yields data that can be mapped onto patient datasets. Even when one tumor is chosen for preclinical studies, analyzing data from the whole tumor cohort, instead of only average behaviors, allows extrapolation. This is why single-cell sequencing now prevails in the field: one model system can generate massive data over a wide range of heterogeneity. We can say that it exhibits high extrapolatability.

In summary, the extrapolatability of modeling originates from the scaling laws of biological systems. When planning a study, we have to choose or design the model system based on the extrapolatability of its output, which in turn determines the methods and tools used in the study.
