Combining nonlinear microscopy and deep learning classification for glioma diagnosis

Complementing real-time cellular image acquisition with real-time AI image classification for clinical applications.

Gliomas represent about 80% of all malignant primary central nervous system tumors diagnosed in the Western world. Resection is challenging, especially for diffuse gliomas, which are characterized by progressive, infiltrative growth. How can we assist the neurosurgeon in improving resection, and with it patient outcome?

Portable microscopy

Earlier, we performed qualitative and quantitative research into glioma visualization using third harmonic generation (THG) microscopy on lab-based setups (Kuzmin et al., 2016; Zhang et al., 2019). Now, a couple of years later, we have two portable multiphoton setups available through FlashPathology for imaging ex-vivo brain tissue in the hospital.

Colleague Laura working on the portable imaging setup inside the hospital.

Transporting these setups inside or outside the hospital is a breeze, and imaging is as quick as touch-and-go. Another advantage of higher harmonic generation microscopy is that no tissue preparation is required other than putting a piece of tissue in a dish. Label-free imaging of freshly resected tissue with sub-cellular resolution takes only seconds, whereas standard histopathological staining takes hours.

With on-site real-time image acquisition in place, real-time image analysis is a must. Given the rise of deep learning in computer science, we wanted to investigate whether such an algorithm could classify images accurately in real time. This would help identify regions of interest and give the neurosurgeon intraoperative feedback on the presence or absence of tumor in the resected tissue margins.

Deep learning classification

Deep learning algorithms are data hungry. It is difficult to determine beforehand how many patients a dataset should include before pattern recognition becomes achievable. We were in the midst of COVID-19 lockdowns and were limited to a small dataset of 23 patients. To distinguish normal from tumor cases, image data from epilepsy tissue served as the normal category.

We built a fully convolutional network and started training. During this process we came across several challenges: what kind of deep learning network is best for our case? How do we deal with data imbalance, weak labeling, and image noise? What we learned along the way is that a successful first outcome depends about 90% on the data.

Data challenges

We battled data imbalance by oversampling the minority class during training. To deal with the small dataset, we included every available image acquired from each patient. This, however, resulted in subpar average classification performance. It turned out that some form of out-of-distribution detection was necessary to identify ‘noisy’ data that needed to be excluded. Koho et al. (2016) put us on the right track: using ‘kurtosis’ as a measure of noise in the image’s frequency domain allowed us to identify noisy images and exclude them. Just by excluding this noisy data, 4% of the original training set, validation binary accuracy improved from 64% to 84%.
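To make the two data steps above more concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the code we used: the function names `oversample_minority` and `spectral_kurtosis` are hypothetical, the kurtosis is taken over the full 2D power spectrum (the statistic described by Koho et al. may be defined differently), and the images are synthetic.

```python
import numpy as np


def oversample_minority(labels, rng):
    """Return shuffled sample indices in which every class is equally frequent."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = int(counts.max())
    parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        # Draw extra samples with replacement until the class reaches the target size.
        extra = rng.choice(idx, size=target - int(count), replace=True)
        parts.append(np.concatenate([idx, extra]))
    return rng.permutation(np.concatenate(parts))


def spectral_kurtosis(image):
    """Excess kurtosis of the 2D power spectrum of a grayscale image (assumed definition)."""
    image = np.asarray(image, dtype=float)
    # Subtract the mean so the zero-frequency (DC) term does not dominate.
    power = np.abs(np.fft.fft2(image - image.mean())) ** 2
    values = power.ravel()
    centered = values - values.mean()
    variance = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / variance ** 2 - 3.0)


rng = np.random.default_rng(0)

# Toy labels: 8 samples of class 0 and 2 samples of class 1.
labels = np.array([0] * 8 + [1] * 2)
balanced = oversample_minority(labels, rng)

# Synthetic images: a smooth Gaussian blob versus pure white noise.
yy, xx = np.mgrid[:64, :64]
smooth = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 8.0 ** 2))
noise = rng.normal(size=(64, 64))
scores = {"smooth": spectral_kurtosis(smooth), "noise": spectral_kurtosis(noise)}
```

In this synthetic example the smooth image scores far higher than white noise, because its power sits in a handful of low frequencies while noise spreads power over the whole spectrum. Which score range counts as ‘noisy’ for real microscopy images, and where to put the cut-off, would have to be calibrated on the data itself.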

Test set performance

Evaluating our network on an independent test set, annotated at the image level by three pathologists, resulted in 79% binary accuracy. The average agreement between any two pathologists was 84%. With 96% specificity and 60% sensitivity there is room for improvement, but overall we were very pleased with the results. Images are classified in only 35 ms! Given the challenges we had to overcome, this is an important first step that allows us to improve over time. We learned a lot from this first serious application of deep learning to a classification problem.
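For readers less familiar with these metrics, the sketch below shows how binary accuracy, sensitivity, specificity and average pairwise rater agreement can be computed. The function names and the toy labels are made up for illustration, with 1 standing for tumor and 0 for normal. How the three pathologists' annotations were combined into a reference label is not described here, so the sketch simply takes a reference label list as given.

```python
from itertools import combinations


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels coded as 1 (tumor) / 0 (normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of tumor images detected
        "specificity": tn / (tn + fp),  # fraction of normal images recognized
    }


def mean_pairwise_agreement(ratings):
    """Average fraction of identical labels over every pair of raters."""
    rater_pairs = list(combinations(ratings, 2))
    per_pair = [
        sum(a == b for a, b in zip(r1, r2)) / len(r1) for r1, r2 in rater_pairs
    ]
    return sum(per_pair) / len(per_pair)


# Toy data, not our test set.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)

ratings = [[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
agreement = mean_pairwise_agreement(ratings)
```

A classifier with high specificity and lower sensitivity, as in our results, rarely flags normal tissue as tumor but misses part of the tumor images.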

All our data, code and trained models are publicly available for you to check out.

Weak labeling of data is something to deal with in the next step, where we can explore methods like multi-instance learning. The same goes for the inherent noise that is systematically present in each image; early experiments with deconvolution show promising results. In the end, a drastic expansion of the dataset is the most important factor.
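As a rough idea of what multi-instance learning could look like here: an image is treated as a ‘bag’ of tiles, and only the bag carries a label. The sketch below shows two common ways to pool per-tile tumor probabilities into one image-level score. It is hypothetical and not part of our current pipeline; the function name `bag_probability` and the tile probabilities are invented for illustration.

```python
import numpy as np


def bag_probability(instance_probs, pooling="max"):
    """Pool per-tile tumor probabilities into a single image-level probability."""
    probs = np.asarray(instance_probs, dtype=float)
    if pooling == "max":
        # The image is as suspicious as its most suspicious tile.
        return float(probs.max())
    if pooling == "noisy_or":
        # The image is positive unless every tile is independently negative.
        return float(1.0 - np.prod(1.0 - probs))
    raise ValueError(f"unknown pooling: {pooling!r}")


# Made-up tile probabilities for one image.
tile_probs = [0.1, 0.2, 0.9]
image_level_max = bag_probability(tile_probs)
image_level_noisy_or = bag_probability(tile_probs, pooling="noisy_or")
```

The appeal for weakly labeled data is that a tumor label on the whole image no longer forces every tile in it to be treated as tumor during training.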

The current results act as a baseline and will allow us to improve over time. More clinical data acquisition and analysis method development are planned. Stay tuned!