Progress in Medical Physics 2019; 30(2): 39-48
Published online June 30, 2019
https://doi.org/10.14316/pmp.2019.30.2.39
Copyright © Korean Society of Medical Physics.
Hongyoon Choi
Correspondence to: Hongyoon Choi (chy1000@gmail.com)
Tel: 82-2-2072-3347
Fax: 82-2-745-7690
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Deep learning has been applied to various medical data. In particular, current deep learning models exhibit remarkable performance at specific tasks, sometimes offering higher accuracy than that of experts for discriminating specific diseases from medical images. The current status of deep learning applications to molecular imaging can be divided into a few subtypes in terms of their purposes: differential diagnostic classification, enhancement of image acquisition, and image-based quantification. As functional and pathophysiologic information is key to molecular imaging, this review will emphasize the need for accurate biomarker acquisition by deep learning in molecular imaging. Furthermore, this review addresses practical issues that include clinical validation, data distribution, labeling issues, and harmonization to achieve clinically feasible deep learning models. Eventually, deep learning will enhance the role of theranostics, which aims at precision targeting of pathophysiology by maximizing molecular imaging functional information.
Keywords: Deep learning, Molecular imaging, Theranostics, Medical imaging, Imaging biomarker
Deep learning is rapidly being adopted in the medical field. Recently, several deep learning-based medical devices and software products have been developed and have begun to be used in clinical practice.1) The major contribution of deep learning to medicine has been to evaluate high-dimensional medical data objectively and to markedly reduce laborious tasks such as segmentation and object detection in high-resolution images. Medical imaging is the main area of application because the deep learning boom began in computer vision, driven by the ImageNet Challenge.2,3) The methods and neural network architectures developed for the ImageNet Challenge have been applied to medical images, including radiologic and pathologic exams, as well as to natural photographic images. These computer vision-based approaches have shown remarkable performance in differential diagnosis. For images that resemble natural photographs, such as skin images and fundoscopy, deep learning was adopted relatively easily because convolutional neural network (CNN) models developed for the ImageNet Challenge could be transferred directly.4,5) Moreover, CNNs, which perform well in image classification and processing, have been applied to radiologic exams such as chest X-ray and mammography.6-8) Subsequently, CNN models have been used for image-based diagnosis as well as image processing.9) Applications have extended from 2-dimensional radiologic exams to 3-dimensional data such as computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI). The purposes of clinical use have also expanded to include image-based differential diagnosis, segmentation, and image enhancement.
Because the features of molecular imaging, including PET and single-photon emission computed tomography (SPECT), differ substantially from those of natural images, there have been various concerns about applying deep learning to it. Nonetheless, various deep learning techniques have suggested feasible applications that enhance molecular imaging and address limitations such as image resolution and sensitivity.10) In this review, current deep learning models for nuclear medicine and molecular imaging are summarized according to their clinical purposes. Practical issues of current deep learning are also introduced, in order to support the development of robust models and guide them toward appropriate clinical use.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Current deep learning models for molecular imaging have focused on three types of application: image-based diagnosis, enhancement of image reconstruction and image quality, and image-based quantification (Table 1).11-44)
Intuitively, one of the most important applications of deep learning in medicine is differential diagnosis. Because deep learning models generally require a large training dataset, several molecular imaging models have used PET or SPECT images that are routinely acquired in the clinical setting. One major application is differentiating disorders from normal status. Recently, a few deep CNN models for differential diagnosis using fluorodeoxyglucose (FDG) PET images have been suggested. For example, a deep learning model was developed to differentiate metastatic from benign mediastinal lymph nodes in lung cancer on FDG PET.11) The deep CNN achieved a diagnostic accuracy of 86% for identifying metastatic lymph nodes, which was higher than that of conventional machine learning algorithms.11) Another CNN model, developed to differentiate T-stages of lung cancer, showed comparable results for identifying pathologic T-staging.12) The area under the receiver operating characteristic curve was 0.68 for differentiating advanced T-stage tumors in an independent test set. Deep CNN models have also been developed for the differential diagnosis of brain disorders using brain SPECT or PET images. Dopamine transporter imaging, which has been interpreted by expert reading as a binary classification problem, was a good candidate for deep CNN application. A 3-dimensional CNN model showed high accuracy for differentiating 123I-FP-CIT SPECT images of Parkinson's disease from those of controls.19) Because accurate image-based diagnosis and the prediction of future cognitive decline in patients with Alzheimer's disease (AD) and mild cognitive impairment (MCI) are clinically important issues, several deep learning models using MRI and PET have been suggested. One of the first studies applying deep learning to medical images was representation learning from PET and MRI for diagnosing AD.17,18) Though these pioneer studies did not use CNN, regarded as a
Another important application is the enhancement of image reconstruction and image quality. For example, CNN models were incorporated into an iterative reconstruction framework and showed better performance than conventional denoising algorithms.27) As a generalized approach, deep learning was used to learn the inverse mapping from signals encoded by sensors, including MRI and PET, to reconstructed images, which resulted in a fully automated and flexible reconstruction framework.28) Furthermore, attenuation correction, a crucial step of PET image reconstruction, has been aided by deep learning-based attenuation maps. While the CT incorporated in PET/CT scanners provides attenuation information, PET/MR requires synthetic CT attenuation maps. Because the attenuation map is difficult to estimate without CT, there have been various issues regarding PET quantification.45,46) Recently suggested deep learning-based CT synthesis from MR or PET images is a promising way to solve the quantification issues caused by attenuation correction.30-34) Additionally, deep learning has been used to enhance the quality of low-dose PET images.35-37) Combining algorithms for image reconstruction from low-dose radiotracer scans with PET- or MR-based attenuation correction could dramatically reduce radiation exposure in the future. Such ultra-low-dose PET may be used for new clinical purposes, including disease screening, where radiation hazards have so far made a net benefit difficult to achieve.
As molecular imaging provides quantitative values related to pathophysiology, studies have focused on applying deep learning to obtain accurate quantification. The most common application of deep learning to medical images is segmentation.9) Segmentation methods are usually based on anatomical images such as CT and MRI. As recent clinical molecular imaging modalities provide fusion images such as PET/CT, PET/MR, and SPECT/CT, deep learning-based segmentation can be used to calculate quantitative values such as the accumulation of radiotracer in a specific tissue delineated on anatomical imaging.39,47) Quantification can also be improved by generative models such as generative adversarial networks (GAN). For example, pseudo-MR images were generated from AV-45 PET using a GAN so that cortical radiotracer uptake could be quantified without structural MR acquisition.43)
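The region-based quantification described above can be sketched in a few lines. This is a minimal illustration, assuming a PET volume and a segmentation label map that are already co-registered on the same voxel grid. The function name `regional_uptake`, the toy arrays, and the region numbering are hypothetical and are not taken from the cited studies.

```python
import numpy as np

def regional_uptake(pet, labels, region_id):
    """Summarize tracer uptake inside one segmented region.

    pet:       array of voxel-wise uptake values (hypothetical units)
    labels:    integer label map on the same grid, e.g. from a segmentation model
    region_id: integer label of the region of interest
    """
    pet = np.asarray(pet, dtype=float)
    labels = np.asarray(labels)
    if pet.shape != labels.shape:
        raise ValueError("PET volume and label map must share the same voxel grid")
    mask = labels == region_id
    if not mask.any():
        raise ValueError(f"region {region_id} contains no voxels")
    values = pet[mask]
    return {
        "mean": float(values.mean()),
        "max": float(values.max()),
        "n_voxels": int(mask.sum()),
    }

# Toy 2x2x2 example (synthetic numbers, not real image data).
pet = np.array([[[1.0, 2.0], [3.0, 4.0]],
                [[5.0, 6.0], [7.0, 8.0]]])
labels = np.array([[[0, 0], [1, 1]],
                   [[1, 1], [2, 2]]])

target = regional_uptake(pet, labels, region_id=1)
reference = regional_uptake(pet, labels, region_id=2)
# Ratio of target to reference uptake, a common way to normalize regional values.
uptake_ratio = target["mean"] / reference["mean"]
```

In practice the label map would come from a segmentation model applied to the anatomical image, and the accuracy of the resulting values depends on the quality of both the segmentation and the registration.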
Even though various deep learning techniques have been applied to molecular imaging for differential diagnosis, image enhancement, and accurate quantification, many issues need to be solved before they can be used clinically. One gap between deep learning for natural image recognition and for medical images, particularly molecular imaging, lies in the purpose of imaging. While image recognition tasks have simple labels, clinicians often require various types of information from medical images, including prediction of prognostic outcome and treatment response as well as differential diagnosis.10) In a narrow sense, differential diagnosis resembles the labeling of natural images; however, many diagnostic classifications are not simple classifications. Because many disorders form a spectrum ranging from healthy to full-blown disease, the ground-truth labels widely used in deep learning training are ambiguous in medical images. Furthermore, the gold standard for diagnostic classification varies with disease type and clinical situation.48) Thus, the eventual purpose of applying deep learning to medicine is not simple diagnosis but a critical role in clinical decision-making.49) As molecular imaging intrinsically provides molecular and pathophysiologic information in a noninvasive manner, deep learning algorithms should place more emphasis on acquiring objective quantitative values that can predict future outcome and treatment response. Instead of pursuing state-of-the-art classification accuracy, we should find appropriate clinical applications for the output of deep learning.
For example, a deep learning model was developed to discriminate patients with Alzheimer's disease from normal aged subjects; however, the important application of this model was its transfer to MCI subjects, to identify those who would rapidly progress to full-blown dementia.13) The output of the CNN model represents the probability of Alzheimer's disease in a cohort consisting of Alzheimer's disease patients and normal subjects. As the output of the CNN was estimated from patterns of FDG uptake and amyloid deposition in the brain, these patterns could serve as a predictive biomarker for the outcome of MCI subjects (Fig. 1).
Even though many deep learning models show remarkable performance on classification problems, such as discriminating fundoscopy images or brain PET images, most models have not been validated in real-world clinical settings. The question is how a proposed deep learning model performs when it is actually used in the clinic. To address this validation issue, deep learning models should be tested on a test set independent of the training and internal validation data. The most commonly used method is application to datasets obtained from different centers.50) However, even when deep learning models are validated on an external dataset and show good performance in diagnostic classification or prediction of clinical outcome, the same performance can hardly be guaranteed in the heterogeneous clinical environment. This is because the cohorts used to develop deep learning models differ from those of clinical trials, in which subjects are recruited with specific criteria defined for a clinical setting.51) The underlying problem is that patients in the clinical setting are highly heterogeneous and clinical decisions must be made under various situations. For example, most deep learning models were developed with a training cohort consisting of patients with a particular disorder and healthy controls. Training cohorts, and even validation cohorts, usually include similar numbers of patients and controls. In the clinical situation, however, differential diagnosis or clinical decision-making is based on the patient's symptoms and signs rather than on a simple classification. There are other disorders that resemble the disease targeted by a deep learning model, including several types of rare disorders. The ratio of diseased to healthy subjects can also differ considerably from that of the training cohort.
The problem of data distribution becomes even more important when a deep learning model is used for disease screening in the general population (Fig. 2). This is why deep learning models should be subjected to clinical trials despite their high accuracy, and why appropriate use criteria are needed so that the models are used clinically only under defined clinical situations.
The issues regarding data distribution and ‘unseen data’ in training cohorts can be extended to uncertainty. Under current approaches of supervised learning from big data and their labels, deep learning-based diagnosis and clinical outcome prediction need an estimate of diagnostic uncertainty because of unseen and rare cases. Furthermore, clinical decisions are often driven not by the most probable diagnosis but by the exclusion of critical, life-threatening diagnoses. Lowering the uncertainty about a fatal disease is one of the most important goals of diagnostic testing and one of the most important elements of clinical decision-making to be achieved through biomarkers.52) Thus, deep learning models should report the uncertainty of their decisions so that it can be determined whether subjects need additional diagnostic tests. Bayesian approximation with deep learning for uncertainty measurement is a good example for supervised learning models.53) Another way to address uncertainty and unseen data, particularly rare disorders, is to employ unsupervised learning for anomaly detection. As deep learning is representation learning, latent features of imaging data follow a distribution determined by the training dataset. Once the distribution of latent features in the training data has been defined, unseen data can be identified as data that fall outside this distribution in the latent space.54,55) As conditional generative models such as conditional GANs or variational autoencoders synthesize virtual data for specific conditions, they can be used to define a population distribution for those conditions. For example, by training a generative model of normal aging changes in brain metabolism, a pseudo-population distribution of brain metabolism at each age can be generated.56) This generated population distribution can then be used to find abnormal patterns in a given brain image while taking age into consideration.
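The effect of disease prevalence on a screening application can be made concrete with Bayes' rule. The sketch below is illustrative only: the sensitivity and specificity of 0.9 and the prevalences of 50% (a balanced development cohort) and 1% (a screening population) are hypothetical values chosen for the example, not results from any cited study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' rule.

    All three inputs are proportions in [0, 1]; prevalence must lie
    strictly between 0 and 1 so that both groups exist.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Same hypothetical model, two populations.
ppv_balanced, npv_balanced = predictive_values(0.9, 0.9, prevalence=0.5)
ppv_screening, npv_screening = predictive_values(0.9, 0.9, prevalence=0.01)

print(f"balanced cohort:      PPV={ppv_balanced:.3f}  NPV={npv_balanced:.3f}")
print(f"screening population: PPV={ppv_screening:.3f}  NPV={npv_screening:.3f}")
```

Under these assumed numbers, a positive result is correct 90% of the time in the balanced cohort but only about 8% of the time in the screening population, even though the model itself is unchanged.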
This type of anomaly detection can bypass the issue related to deep learning models for heterogeneous disorders.
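One simple way to identify unseen data from the distribution of latent features is to fit a Gaussian to the training latents and score new cases by their Mahalanobis distance. This is a sketch of that idea only, not the method of the cited studies: the latent vectors are simulated with random numbers in place of the output of a trained encoder, and the 99th-percentile threshold is an arbitrary assumption.

```python
import numpy as np

def fit_latent_distribution(train_latents):
    """Estimate mean and inverse covariance of training latent features."""
    z = np.asarray(train_latents, dtype=float)
    mean = z.mean(axis=0)
    # Small ridge term keeps the covariance invertible.
    cov = np.cov(z, rowvar=False) + 1e-6 * np.eye(z.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis(latent, mean, cov_inv):
    """Distance of one latent vector from the training distribution."""
    d = np.asarray(latent, dtype=float) - mean
    return float(np.sqrt(d @ cov_inv @ d))

# Simulated 4-dimensional latents standing in for encoder outputs (assumption).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))
mean, cov_inv = fit_latent_distribution(train)

# Threshold chosen as the 99th percentile of training distances (assumption).
train_scores = np.array([mahalanobis(z, mean, cov_inv) for z in train])
threshold = float(np.percentile(train_scores, 99))

typical_score = mahalanobis(np.zeros(4), mean, cov_inv)      # near the training mean
unseen_score = mahalanobis(np.full(4, 6.0), mean, cov_inv)   # far from all training data

is_anomaly = unseen_score > threshold
```

A single Gaussian is a strong simplification; latent distributions of real imaging data may be multimodal, in which case a mixture model or a density estimate from a generative model would be a closer match to the approach described in the text.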
Unsupervised learning is an important approach to solving practical issues with the labels of imaging data. Labeling image data, particularly medical images, is expensive as well as time-consuming. It requires experts to interpret the images or to decide the clinical diagnosis. To obtain a ‘gold standard’ diagnosis, many cases require clinical follow-up, which entails a complex professional review of medical records. Ethical issues regarding the acquisition of large datasets and their labels are also inevitable. The limited availability of labeled data, and the difficulty of labeling at a large scale, is a major obstacle to deep learning applications. In addition, large labeled datasets are even more difficult to obtain in nuclear medicine and molecular imaging because various imaging techniques are used according to the clinical purpose.
One way to overcome this labeling issue lies in a property of medical imaging data: heterogeneous image data acquired in clinical routine are relatively easy to collect. By applying unsupervised learning methods to these routine clinical data, representative features can be obtained. These features can be visualized with dimension reduction methods to intuitively identify patterns in large imaging datasets. Furthermore, features obtained by unsupervised learning can be transferred to relatively small datasets that contain both labels and images. This transfer learning can produce a robust deep learning model even when the well-labeled dataset is relatively small (Fig. 3).57,58) The flexible application of unsupervised learning and transfer learning can be extended to semi-supervised learning. As mentioned above, a large database of routinely acquired clinical data is relatively easy to obtain, and a small portion of it can be labeled with the clinical outcome or diagnosis. Even with few labeled samples, various deep learning approaches employ the unlabeled data to find discriminative representations for the labeled samples.59,60) For example, one study aimed to predict FDG uptake measured by PET from gene expression data in lung cancer, although only a small number of subjects had both PET and gene expression data. By employing a larger gene expression dataset without PET data, a prediction model of FDG uptake could be developed.61) As many clinical datasets are in the situation of ‘large unlabeled data and small labeled data’, deep learning models that improve performance through unsupervised learning on unlabeled data will be widely used in future molecular imaging and medical data research.
Another feasible way to overcome the labeling issue is to employ unstructured data that accompany the imaging data. For example, clinical imaging data come with text reports that contain human interpretations written in natural language. Even though these reports are mostly unstructured, they contain much information that can serve as image labels, including differential diagnoses, abnormal findings, and disease locations. Mining the semantic relationships between medical images and texts will be a feasible approach to developing deep learning models based on real-world clinical data.62) As self-supervised learning of image representations from semantic context is already used for natural images, representations of medical images could similarly be trained using representations of text reports.63) Learning representations of imaging data and finding their clinical significance can be a data-driven approach to developing biomarkers without a priori knowledge. Self-supervised learning will be one of the future directions of this data-driven approach and can be achieved by using text reports or intrinsic information, such as age and gender, matched with the image data.
One of the overlooked practical issues is data harmonization. Molecular imaging routinely used in the clinical setting is of various types, and numerous tracers are used according to the clinical purpose. Furthermore, image acquisition protocols vary between centers, which may reduce the accuracy of deep learning models intended for generalized multicenter application. Different image textures related to detector types and image reconstruction algorithms can affect the performance of deep learning. In addition, because the distribution of a tracer changes over time, image acquisition at different time points may influence deep learning-based biomarkers. Recently, deep learning has been used to analyze the kinetics of dynamic imaging data;64) however, most imaging data routinely obtained in the clinic are static images, which require harmonization across centers. Different tracers that target the same molecule also cause a harmonization problem. For example, several radiotracers are available for imaging brain amyloid deposits, e.g., 11C-PIB, 18F-Florbetapir, 18F-Florbetaben, and 18F-Flutemetamol. PET images obtained with these tracers show similar patterns but different quantification results.65,66) While differences in classical amyloid quantification can be overcome by linear correction, developing deep learning models on heterogeneous image data from these different tracers remains challenging.
In this review, current deep learning models developed for molecular imaging have been briefly introduced in terms of their purposes. As molecular imaging carries information on molecular changes related to pathophysiology, accurate and objective quantification is a critical step toward clinical use. This quantitative information is linked to clinical decision-making and outcome prediction as well as differential diagnosis. Thus, instead of simple diagnostic classification, we should focus on discovering biomarkers by using deep learning to extract the functional information of molecular imaging. This information can contribute to theranostic approaches, which combine diagnostics and therapeutics using the same molecular targets. Deep learning models will summarize the status of patients as quantitative values. The models should be clinically validated under real clinical situations with unbiased data instead of limited datasets. Clinically validated molecular imaging-based biomarkers can be used to monitor disease status in terms of functional information. By predicting patient outcome at the individual level from imaging data, therapeutic plans, including dose and schedule as well as treatment methods, can be personalized. To make deep learning models clinically feasible, leveraging unlabeled data and unsupervised learning is promising. This approach can help untangle the issues arising from the supervised learning approaches employed by most deep learning models for imaging data, including heterogeneous data distribution, unseen data, and uncertainty of decisions. Furthermore, unsupervised learning followed by transfer learning can be used to develop various types of deep learning models with relatively small samples.
Because of the distinctiveness of the medical field and the various purposes of molecular imaging, deep learning models will need to be developed to meet particular clinical goals, and the result will be an objective biomarker that plays an important role in clinical decision-making.
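The workflow of learning features from a large unlabeled dataset and transferring them to a small labeled one can be illustrated with a deliberately simple stand-in. In the sketch below, principal component analysis replaces the deep unsupervised model and a nearest-centroid rule replaces the fine-tuned classifier; both substitutions, the function names, and the simulated data are assumptions made for illustration and do not reproduce any cited method.

```python
import numpy as np

def fit_pca(unlabeled, n_components):
    """Unsupervised step: learn a low-dimensional representation from unlabeled data."""
    x = np.asarray(unlabeled, dtype=float)
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(x, mean, components):
    """Project data onto the learned representation."""
    return (np.asarray(x, dtype=float) - mean) @ components.T

def fit_centroids(features, labels):
    """Supervised step on the small labeled set: one centroid per class."""
    labels = np.asarray(labels)
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(features, centroids):
    """Assign each sample to the class with the nearest centroid."""
    classes = list(centroids)
    dists = np.stack(
        [np.linalg.norm(features - centroids[c], axis=1) for c in classes], axis=1
    )
    return np.array(classes)[dists.argmin(axis=1)]

# Simulated data: 20 features, two groups that differ in the first 5 features.
rng = np.random.default_rng(0)
n_features = 20
shift = np.zeros(n_features)
shift[:5] = 3.0

def simulate(n_per_class):
    x0 = rng.normal(size=(n_per_class, n_features))
    x1 = rng.normal(size=(n_per_class, n_features)) + shift
    return np.vstack([x0, x1]), np.array([0] * n_per_class + [1] * n_per_class)

unlabeled, _ = simulate(200)        # large set, labels discarded
x_small, y_small = simulate(5)      # small labeled set: 5 samples per class
x_test, y_test = simulate(200)      # held-out evaluation set

mean, components = fit_pca(unlabeled, n_components=2)
centroids = fit_centroids(encode(x_small, mean, components), y_small)
predictions = predict(encode(x_test, mean, components), centroids)
accuracy = float((predictions == y_test).mean())
```

The design point is that the representation is estimated from 400 unlabeled samples while only 10 labeled samples are needed to place the class centroids. With real imaging data the separation between groups is rarely this clean, so the accuracy here says nothing about clinical performance.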
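The linear correction mentioned for classical amyloid quantification can be sketched as an ordinary least-squares fit between paired measurements. The example assumes that the same subjects were quantified with two tracers and that the relationship is approximately linear. The numbers are synthetic, and the function names `fit_linear_harmonization` and `harmonize` are hypothetical.

```python
import numpy as np

def fit_linear_harmonization(source_values, reference_values):
    """Fit reference = slope * source + intercept from paired measurements."""
    slope, intercept = np.polyfit(source_values, reference_values, deg=1)
    return float(slope), float(intercept)

def harmonize(values, slope, intercept):
    """Map source-tracer values onto the reference-tracer scale."""
    return slope * np.asarray(values, dtype=float) + intercept

# Synthetic paired uptake ratios for the same subjects (not real data).
tracer_b = np.array([1.0, 1.2, 1.4, 1.6, 1.8])   # source tracer
tracer_a = np.array([1.1, 1.4, 1.7, 2.0, 2.3])   # reference tracer

slope, intercept = fit_linear_harmonization(tracer_b, tracer_a)

# Convert a new source-tracer measurement to the reference scale.
converted = harmonize([2.0], slope, intercept)
```

A correction of this kind works on a single summary value per subject. It does not carry over directly to deep learning models that take the whole image as input, which is why heterogeneous tracer data remain a harder problem for such models.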
The author has nothing to disclose.
All relevant data are within the paper and its Supporting Information files.
Table 1. Types of current deep learning applications for nuclear medicine and molecular imaging
| Types of applications | Examples | References |
|---|---|---|
| Image-based diagnosis | Cancer staging (T- and N-staging) | 11,12 |
| | Diagnosis of Alzheimer’s disease using PET and/or MRI | 13–18 |
| | Diagnosis of Parkinson’s disease using dopamine transporter imaging | 19–21 |
| | Prediction of coronary heart disease | 22–24 |
| Enhancement of image reconstruction and image quality | Image reconstruction | 25–29 |
| | Attenuation correction | 30–34 |
| | Recovery of low-dose PET images | 35–37 |
| Image-based quantification | Segmentation | 38–42 |
| | Image generation for quantification | 43,44 |
Department of Nuclear Medicine, Seoul National University Hospital, Seoul, Korea
Correspondence to:Hongyoon Choi (chy1000@gmail.com)
Tel: 82-2-2072-3347
Fax: 82-2-745-7690
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Deep learning has been applied to various medical data. In particular, current deep learning models exhibit remarkable performance at specific tasks, sometimes offering higher accuracy than that of experts for discriminating specific diseases from medical images. The current status of deep learning applications to molecular imaging can be divided into a few subtypes in terms of their purposes: differential diagnostic classification, enhancement of image acquisition, and image-based quantification. As functional and pathophysiologic information is key to molecular imaging, this review will emphasize the need for accurate biomarker acquisition by deep learning in molecular imaging. Furthermore, this review addresses practical issues that include clinical validation, data distribution, labeling issues, and harmonization to achieve clinically feasible deep learning models. Eventually, deep learning will enhance the role of theranostics, which aims at precision targeting of pathophysiology by maximizing molecular imaging functional information.
Keywords: Deep learning, Molecular imaging, Theranostics, Medical imaging, Imaging biomarker
Deep learning rapidly begins to be applied in the medical field. Recently, several deep learning-related medical devices and softwares have been developed and started to be applied in the clinical fields.1) The major contribution of deep learning to medical data was to objectively evaluate high-dimensional medical data and remarkably reduce laborious works such as segmentation and object detection from high-resolution images. The major medical application is medical imaging fields as a boom of deep learning was started from the computer vision field initiated by ImageNet Challenge.2,3) The methods and neural network architectures developed for ImageNet Challenge have been applied to medial images including radiologic and pathologic exams as well as natural photographic images. These approaches based on computer vision fields have showed remarkable performance in differential diagnosis. For natural photographic images such as skin images and fundoscopy deep learning techniques were relatively easily adopted as convolutional neural network (CNN) models developed for ImageNet Challenge were directly transferred to such images.4,5) Moreover, CNN which show good performance on image classification and processing have been applied to radiologic exams such as chest X-ray and mammography.6-8) Subsequently, CNN models have been used for image-based diagnosis as well as image processing.9) The application of deep learning included 3-dimensional images such as computed tomography (CT), positron emission tomography (PET) and magnetic resonance imaging (MRI) data as well as 2-dimensional radiologic exams. The purpose of clinical use was also expanded to include various applications such as image-based differential diagnosis, segmentation, and image enhancement. 
Because of the substantial different features of molecular imaging including PET and single-photon emission computed tomography (SPECT) from natural images, there have been various concerns with regard to application of deep learning. Nonetheless, various deep learning techniques have suggested feasible applications to enhance molecular imaging and solved problems such as image resolution and sensitivity.10) In this review, current deep learning models for nuclear medicine and molecular imaging are summarized according to the clinical purposes. In order to develop robust deep learning models and guide their appropriate direction for clinical use, practical issues of current deep learning are introduced in this review.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Current deep learning models particularly for molecular imaging have focused on various different applications: Image-based diagnosis, enhancing image reconstruction and image quality, and deep learning application for image-based quantification (Table 1).11-44)
Intuitively, one of the most important applications of deep learning in medical fields was differential diagnosis. For molecular imaging studies, as deep learning models generally require a large dataset for the training, several models have used PET or SPECT images which routinely acquired in the clinical setting. One of the major applications was differentiating disorders from normal status. Recently, using fluorodeoxyglucose (FDG) PET images, a few deep CNN models for the differential diagnosis were suggested. For example, using FDG PET images, a deep learning model was developed to differentiate metastatic mediastinal lymph nodes from benign lymph nodes in lung cancer.11) Using a deep CNN, diagnostic accuracy for differentiating metastatic lymph nodes was 86%, which was higher than conventional machine learning algorithms.11) Another CNN model to differentiate T-stages from lung cancer showed comparable results to identify pathologic T-staging.12) Area of receiver-operating-characteristic curve was 0.68 for differentiating advanced T-stage tumors in an independent test set. Deep CNN models have been developed for differential diagnosis of brain disorders using brain SPECT or PET images. As a binary classification problem, dopamine transporter imaging has been interpreted by experts' reading, thus, it was a good candidate for the deep CNN application. A 3-dimensional CNN model showed high accuracy for differentiating 123I-FP-CIT SPECT images of Parkinson's disease from those of controls.19) As accurate image-based diagnosis and the prediction of future cognitive decline in Alzheimer's disease (AD) and mild cognitive impairment (MCI) patients have been clinically important issues, several deep learning models using MRI and PET have been suggested. One of the first research of deep learning application to medical images was representation learning for PET and MRI images for diagnosing AD.17,18) Though these pioneer studies did not use CNN, regarded as a
Another important application is enhancement of image reconstruction and image quality. For example, CNN models were incorporated into iterative reconstruction framework and showed better performance than conventional denoising algorithms.27) As a generalized approach, deep learning was used to solve the inverse function of signals encoded by sensors including MRI and PET with regard to the image reconstruction, which resulted in fully-automated and flexible reconstruction framework.28) Furthermore, attenuation correction, a crucial step of PET image reconstruction, was aided by deep learning-based attenuation maps. While CT incorporated in fusion PET/CT scanners can provide attenuation information, recent PET/MR requires synthetic CT attenuation maps. Because of the difficulty in the estimation of attenuation map without CT, there have been various issues regarding PET quantification.45,46) Recently suggested deep learning-based CT image synthesis using MR or PET images is promising to solve the quantification issues caused by attenuation correction.30-34) Additionally, deep learning has been used to enhance image quality for low dose PET images.35-37) By combining the algorithms for image reconstruction with low-dose radiotracers and PET- or MR-based attenuation correction can dramatically reduce radiation exposure in the future. Such an ultra-low dose PET may be used for new clinical purposes including disease screening which has been difficult to obtain benefits due to radiation hazards.
As molecular imaging provides quantitative values related to pathophysiology, studies have focused on applying deep learning to obtain accurate quantification. The most common application of deep learning to medical images is segmentation.9) Segmentation methods are usually based on anatomical images such as CT and MRI. As recent clinical molecular imaging modalities provide fusion images such as PET/CT, PET/MR, and SPECT/CT, deep learning-based segmentation methods can be used to calculate quantitative values such as the accumulation of a radiotracer in a specific tissue delineated by anatomical imaging.39,47) Quantification can also be improved by generative models such as generative adversarial networks (GAN). For example, pseudo-MR images were generated from AV-45 PET using a GAN for the quantification of cortical radiotracer uptake without structural MR acquisition.43)
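The segmentation-based quantification described above can be illustrated with a minimal sketch. It assumes that a segmentation model has already produced an integer label for every voxel and that the functional image has been resampled to the same grid; both images are flattened to plain lists here. The function name `regional_mean_uptake`, the label values, and the toy uptake numbers are hypothetical and are not taken from the cited studies.

```python
from collections import defaultdict


def regional_mean_uptake(uptake, labels, background=0):
    """Return the mean uptake value for each labeled region.

    uptake: flattened voxel values of the functional image (e.g. PET).
    labels: flattened integer region labels from an anatomical segmentation,
            on the same voxel grid as `uptake` (assumed, not checked beyond length).
    """
    if len(uptake) != len(labels):
        raise ValueError("uptake and labels must have the same number of voxels")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(uptake, labels):
        if label == background:
            continue  # voxels outside every region are ignored
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Toy example: six voxels, two hypothetical regions (1 and 2), two background voxels.
uptake = [1.0, 1.2, 2.4, 2.6, 1.1, 0.9]
labels = [1, 1, 2, 2, 0, 0]
region_means = regional_mean_uptake(uptake, labels)
```

In this toy case each region mean is simply the average of the two voxels carrying that label; with real data the same idea applies to 3-dimensional arrays after registration of the anatomical and functional images.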
Even though various deep learning techniques have been applied to molecular imaging for differential diagnosis, image enhancement, and accurate quantification, many issues need to be solved before they can be used clinically. One of the gaps between deep learning approaches for natural image recognition and those for medical images, particularly molecular imaging, lies in the purpose of imaging. While image recognition tasks have simple labels, clinicians often require various types of information from medical images, including the prediction of prognostic outcome and treatment response as well as differential diagnosis.10) In a narrow sense, differential diagnosis is similar to labeling natural images; however, many diagnostic classifications are not simple classification tasks. Because many disorders span a spectrum from healthy to full-blown disease status, the ground-truth labels widely used in deep learning training are ambiguous in medical images. Furthermore, the gold standard of diagnostic classification varies according to disease type as well as clinical situation.48) Thus, the eventual purpose of applying deep learning to the medical field is not simple diagnosis but a critical role in clinical decision-making.49) As molecular imaging intrinsically provides molecular and pathophysiologic properties in a noninvasive manner, deep learning algorithms should place more emphasis on acquiring objective quantitative values that can predict future outcome and treatment response. Instead of pursuing state-of-the-art classification accuracy, we should find appropriate clinical applications for the output of deep learning.
For example, a deep learning model was developed to discriminate between patients with Alzheimer's disease and normal aged subjects; however, the important application of this model was its transfer to MCI subjects to identify those who would rapidly progress to full-blown dementia.13) The output of the CNN model represents the probability of Alzheimer's disease in a cohort consisting of Alzheimer's disease and normal subjects. As the output of the CNN was estimated from patterns of FDG uptake and amyloid deposition in the brain, these patterns could serve as a predictive biomarker for the outcome of MCI subjects (Fig. 1).
Even though many deep learning models show remarkable performance on classification problems, such as discriminating fundoscopy images or brain PET images, most models have not been validated in real-world clinical settings. This issue concerns how performance is evaluated when a proposed deep learning model is to be used in the clinical setting. To address it, deep learning models should be tested on a test set that is independent of the training and internal validation data. The most commonly used method is application to datasets obtained from different centers.50) Even when deep learning models are validated on an external dataset and show good performance in diagnostic classification or prediction of clinical outcome, the same performance can hardly be guaranteed in a heterogeneous clinical environment. This is because the cohorts used for developing deep learning models differ from those of clinical trials, in which subjects are recruited with specific criteria defined for a clinical setting.51) The problem lies in the fact that patients in the clinical setting are highly heterogeneous and clinical decisions must be made under various situations. For example, deep learning models have mostly been developed with a training cohort consisting of patients with a particular disorder and healthy controls. Training cohorts, and even validation cohorts, usually include similar numbers of patients and controls. However, in the clinical situation, differential diagnosis or clinical decision is made on the basis of the patient's symptoms and signs rather than simple classification. There are different disorders that resemble the disease targeted by a deep learning model, including a few types of rare disorders. The ratio of disease status to healthy status can also differ considerably from that of the training cohort.
The problem of data distribution becomes a bigger factor when a deep learning model is used for disease screening in the general population (Fig. 2). This is why deep learning models should be subjected to clinical trials despite their high accuracy, and why it is necessary to establish appropriate use criteria and to use the models clinically under limited clinical situations.
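The effect of the disease-to-healthy ratio on screening can be made concrete with a short worked calculation based on Bayes' rule. The sketch below is illustrative only: the sensitivity, specificity, and prevalence values are hypothetical and are not results from any cited study, and the function name `positive_predictive_value` is introduced here for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive model output is a true positive (Bayes' rule)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    if true_positive_rate + false_positive_rate == 0.0:
        raise ValueError("no positive outputs are possible with these inputs")
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical model with 90% sensitivity and 90% specificity.
# Balanced development cohort: half patients, half controls.
ppv_balanced = positive_predictive_value(0.9, 0.9, prevalence=0.5)
# Hypothetical screening population: 1% of subjects have the disease.
ppv_screening = positive_predictive_value(0.9, 0.9, prevalence=0.01)
```

Under these assumed numbers, the same model yields a positive predictive value of 0.9 in the balanced cohort but only about 0.083 in the screening population, so roughly eleven of every twelve positive outputs would be false positives. This is one way to see why accuracy measured in a balanced cohort does not carry over to the general population.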
The issues regarding data distribution and ‘unseen data’ in training cohorts can be extended to uncertainty. Under the current approach of supervised learning from big data and their labels, deep learning-based diagnosis and clinical outcome prediction require an estimate of diagnostic uncertainty because of unseen and rare cases. Furthermore, a clinical decision is made not by selecting the diagnosis of highest probability but by excluding critical, life-threatening diagnoses. Lowering the uncertainty about a fatal disease is one of the most important factors in diagnostic testing and one of the most important elements of clinical decision-making to be achieved through biomarkers.52) Thus, deep learning models should report the uncertainty of their decisions so that it can be determined whether subjects need additional diagnostic tests. Bayesian approximation with deep learning for uncertainty measurement is a good example for supervised learning models.53) Another way to bypass the issue of uncertainty and unseen data, particularly rare disorders, is to employ unsupervised learning for anomaly detection. As deep learning is representation learning, the latent features of imaging data follow a distribution determined by the training dataset. Once the distribution of latent features in the training data is defined, unseen data can be identified by their position relative to this distribution in the latent space.54,55) As conditional generative models such as conditional GAN or variational autoencoders synthesize virtual data for specific conditions, they can be used to define a population distribution for those conditions. For example, by training a generative model of normal aging changes in brain metabolism, a pseudo-population distribution of brain metabolism at each age can be generated.56) This generated population distribution can then be used to find abnormal patterns in a given brain image while taking age into consideration.
This type of anomaly detection can bypass the issue related to deep learning models for heterogeneous disorders.
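The idea of identifying unseen data by its position in the latent space can be sketched in a few lines. This is a deliberately simplified illustration, not the method of the cited studies: it assumes that latent vectors have already been produced by some trained encoder, and it summarizes the training distribution with an independent mean and standard deviation per latent dimension instead of a learned generative model. The latent values, the threshold, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def fit_latent_distribution(train_latents):
    """Summarize training latents by a per-dimension mean and standard deviation."""
    dims = list(zip(*train_latents))  # transpose: one tuple per latent dimension
    return [mean(d) for d in dims], [stdev(d) for d in dims]


def anomaly_score(latent, means, stds):
    """Root-mean-square z-score of a latent vector against the training summary."""
    squared_z = [((x - m) / s) ** 2 for x, m, s in zip(latent, means, stds)]
    return math.sqrt(sum(squared_z) / len(squared_z))


# Hypothetical 2-dimensional latent vectors from a training cohort.
train_latents = [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1], [0.1, 0.9], [-0.2, 1.2]]
means, stds = fit_latent_distribution(train_latents)

THRESHOLD = 3.0  # assumed cut-off; in practice it would be tuned on held-out data
typical_score = anomaly_score([0.05, 1.0], means, stds)
unusual_score = anomaly_score([2.0, -1.0], means, stds)
is_unusual_flagged = unusual_score > THRESHOLD
```

A sample whose score exceeds the threshold lies far from anything seen during training and could be referred for additional review instead of being forced into one of the trained classes. An age-conditioned variant would fit a separate summary for each age group, in the spirit of the pseudo-population distribution described above.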
Unsupervised learning is an important approach for solving practical issues in the labeling of imaging data. Labeling image data, particularly medical images, is expensive as well as time-consuming. It requires experts to interpret the images or to decide the clinical diagnosis. To obtain a ‘gold standard’ diagnosis, many cases require clinical follow-up, which entails a complex professional review of medical records. Ethical issues regarding the acquisition of large datasets and their labels are also inevitable. That labeled data are limited and that large-scale labeling is much more difficult are major obstacles to deep learning applications. In addition, many nuclear medicine and molecular imaging data are even more difficult to obtain with labels on a large scale, as various imaging techniques are used according to the clinical purpose.
One way to overcome this labeling issue lies in a property of medical imaging data: it is relatively easy to collect heterogeneous image data obtained in clinical routine. By applying unsupervised learning methods to these clinical routine data, representative features can be obtained. These features can be visualized by dimension reduction methods to intuitively identify patterns in large imaging datasets. Furthermore, the features obtained by unsupervised learning can be transferred to relatively small datasets that contain both labels and images. This transfer learning can produce a robust deep learning model even if the well-labeled dataset is relatively small (Fig. 3).57,58) The flexible application of unsupervised learning and transfer learning can be extended to semi-supervised learning. As mentioned above, a database acquired in clinical routine is relatively easy to obtain, and a small portion of the large unlabeled dataset can be labeled with the clinical outcome or diagnosis. Despite the small number of labeled samples, various deep learning approaches employ unlabeled data to find discriminative representations for them.59,60) For example, one study aimed to predict FDG uptake measured by PET from gene expression data in lung cancer, although only a small number of subjects had both PET and gene expression data. By employing a larger gene expression dataset without PET data, a prediction model of FDG uptake could be developed.61) As many clinical datasets fall into the situation of ‘large unlabeled data and small labeled data’, deep learning models that can enhance performance through unsupervised learning and unlabeled data will be widely used in future molecular imaging and medical data research.
Another feasible way to overcome the labeling issue is to employ multiple types of unstructured data that accompany imaging data. For example, clinical imaging data include text reports containing human interpretations written in natural language. Even though these reports are mostly unstructured, they hold a great deal of information about image labels, including differential diagnosis, abnormal findings, and disease locations. Data mining of the semantic interactions between medical images and texts will be a feasible approach to developing deep learning models based on real-world clinical data.62) As self-supervised learning of imaging representations using a deep learning model for semantic context is already used for natural image data, representations of medical imaging data could be trained using representations of text reports.63) Learning representations of imaging data and finding their clinical significance can be a data-driven approach to developing biomarkers without a priori knowledge. Self-supervised learning will be one of the future directions of this data-driven approach and will be achieved by using text reports or intrinsic information, such as age and gender, matched with the image data.
One of the overlooked practical issues is data harmonization. Molecular imaging routinely used in the clinical setting is of various types. Numerous tracers can be used to obtain imaging data according to the clinical purpose. Furthermore, image acquisition protocols vary across centers, which may reduce the accuracy of deep learning models when they aim at generalized application to multiple centers. Different image textures related to different detector types and image reconstruction algorithms can affect the performance of deep learning. In addition, because the distribution of a tracer has temporal dynamics, image acquisition at different time points may influence deep learning-based biomarkers. Recently, deep learning has been used to analyze the kinetics of dynamic imaging data;64) however, most imaging data routinely obtained in the clinic are static images, which require harmonization across centers. Different tracers that aim at the same molecular target also cause a harmonization problem. For example, several radiotracers are available for obtaining information on brain amyloid deposits, e.g., 11C-PIB, 18F-Florbetapir, 18F-Florbetaben, and 18F-Flutemetamol. PET images with these tracers show similar results but different quantification values.65,66) While the differences in classical amyloid quantification can be overcome by linear correction, developing deep learning models that use heterogeneous image data from these different tracers is challenging.
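The linear correction mentioned for classical quantification can be sketched as an ordinary least-squares fit between paired measurements. The example assumes that the same subjects were quantified with two tracers, here called tracer A and tracer B; the paired values are invented for illustration and do not come from the cited studies, and the function names are hypothetical.

```python
def fit_linear_map(x, y):
    """Least-squares slope and intercept such that y is approximated by slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0.0:
        raise ValueError("x values must not all be identical")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical quantification values for the same four subjects under two tracers.
tracer_a = [1.0, 1.2, 1.5, 2.0]
tracer_b = [1.1, 1.4, 1.85, 2.6]

slope, intercept = fit_linear_map(tracer_a, tracer_b)


def to_tracer_b_scale(value_a):
    """Convert a tracer A value to the tracer B scale using the fitted line."""
    return slope * value_a + intercept


converted = to_tracer_b_scale(1.2)
```

Such a two-parameter mapping works for a single summary value per subject. It does not harmonize voxel-level image texture, which is why heterogeneous tracer data remain challenging as direct input to deep learning models.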
In this review, current deep learning models developed for molecular imaging have been briefly introduced in terms of their purposes. As molecular imaging carries information on molecular changes related to pathophysiology, accurate and objective quantification is a critical step toward clinical use. This quantitative information is linked to clinical decisions and the prediction of outcome as well as to differential diagnosis. Thus, instead of simple diagnostic classification, we should focus on the discovery of biomarkers by extracting the functional information of molecular imaging using deep learning. This information can contribute to theranostic approaches, which aim to combine diagnostics and therapeutics using the same molecular targets. Deep learning models will summarize the status of patients as quantitative values. The models should be clinically validated in real clinical situations with unbiased data instead of limited datasets. Clinically validated molecular imaging-based biomarkers can be used to monitor disease status in terms of functional information. By predicting the outcome of a patient at the individual level using imaging data, therapeutic plans, including dose and schedule as well as treatment methods, can be personalized. To facilitate clinically feasible deep learning models, leveraging unlabeled data and unsupervised learning is promising. This approach could considerably untangle the issues caused by the supervised learning approaches employed by most deep learning models for imaging data, including heterogeneous data distribution, unseen data, and uncertainty of decisions. Furthermore, unsupervised learning followed by transfer learning can yield various types of deep learning models from relatively small samples.
Because of the distinctiveness of the medical field and the various purposes of molecular imaging, it will be necessary to develop deep learning models that meet particular clinical goals, and the result will be objective biomarkers that play an important role in clinical decisions.
The author has nothing to disclose.
All relevant data are within the paper and its Supporting Information files.
Fig. 1 The output of a deep learning model as a predictive biomarker. A deep convolutional neural network (CNN) model was developed to differentiate brain positron emission tomography images of Alzheimer’s disease from those of healthy subjects. This model was applied to another cohort, patients with mild cognitive impairment, to predict future cognitive outcome. The output of the model represents a probability of Alzheimer’s disease, which can be used as a predictive biomarker of cognitive outcome in preclinical disorders.
Fig. 2 A gap between training and real-world data. Most deep learning models are developed with data from patients with specific disorders and from controls. The problem in applying deep learning to the clinic is the difference between real-world data and the training cohort. Real-world data in the clinic include heterogeneous patients who differ from those in training cohorts. Furthermore, the ratio of disease to normal status is considerably different. This data distribution issue becomes a bigger factor when deep learning is aimed at the general population.
Fig. 3 Leveraging unlabeled data from clinical routine to facilitate deep learning development. As labeling medical data is expensive and time-consuming, it is a bottleneck in developing deep learning models. Since it is relatively easy to collect heterogeneous image data obtained in clinical routine, unsupervised learning can leverage these unlabeled ‘dirty’ data. Features extracted by unsupervised learning can be transferred to relatively small cohorts that contain both labels and images to predict clinical outcome as well as differential diagnosis, according to the clinical purpose.
Table 1 Types of current deep learning applications for nuclear medicine and molecular imaging
Types of applications | Examples | References |
---|---|---|
Image-based diagnosis | Cancer staging (T- and N-staging) | 11,12 |
Image-based diagnosis | Diagnosis of Alzheimer’s disease using PET and/or MRI | 13–18 |
Image-based diagnosis | Diagnosis of Parkinson’s disease using dopamine transporter imaging | 19–21 |
Image-based diagnosis | Prediction of coronary heart disease | 22–24 |
Enhancement of image reconstruction and image quality | Image reconstruction | 25–29 |
Enhancement of image reconstruction and image quality | Attenuation correction | 30–34 |
Enhancement of image reconstruction and image quality | Recovery of low-dose PET images | 35–37 |
Image-based quantification | Segmentation | 38–42 |
Image-based quantification | Image generation for quantification | 43,44 |
PET, positron emission tomography; MRI, magnetic resonance imaging.