Training CADe algorithms with synthetic datasets: augmenting clinical data for improved lung nodule detection
Synthetic datasets have the potential to serve as cost-effective alternatives to clinical data and may help mitigate the biases present in clinical data. This paper presents a novel method that uses such datasets to train a computer-aided detection (CADe) algorithm. The proposed approach uses images of a physical anthropomorphic phantom into which manufactured objects representing simplified lesions were inserted, followed by a set of randomized, parametrized augmentations that increase the variability of these datasets. By incorporating these augmentations into the training phase, the method aims to add variability to training datasets of limited size and thereby improve model performance. We apply the method to the false positive reduction stage of a lung nodule CADe system for computed tomography (CT) scans. Our experimental results demonstrate the effectiveness of the proposed method: network performance, measured by the Competition Performance Metric (CPM), increased by 6% when a training set of 50 clinical CT scans was augmented with scans obtained from a physical phantom database.
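The CPM used to report the result above is commonly defined (as in the LUNA16 lung nodule challenge) as the average detection sensitivity at seven false-positive rates per scan: 1/8, 1/4, 1/2, 1, 2, 4 and 8. The abstract does not say which implementation was used, so the following is only a minimal sketch under stated assumptions: the function name `cpm` and its inputs are hypothetical, each positive candidate is assumed to match a distinct nodule, and sensitivity is read from a step FROC curve rather than the interpolated curve used by the official challenge evaluation script.

```python
from typing import Sequence

# False-positive-per-scan operating points used in the CPM definition.
FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def cpm(scores: Sequence[float], is_nodule: Sequence[bool],
        n_scans: int, n_nodules: int) -> float:
    """Average sensitivity at 1/8, 1/4, 1/2, 1, 2, 4 and 8 FPs per scan.

    Hypothetical helper. Simplifying assumptions (not the official script):
    each positive candidate corresponds to a distinct nodule, score ties are
    ignored, and sensitivity comes from a step (non-interpolated) FROC curve.
    """
    # Rank candidates from most to least confident, i.e. lower the threshold.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    sensitivities = []
    for rate in FP_RATES:
        max_fp = rate * n_scans  # total FP budget across all scans
        tp = fp = 0
        for i in order:
            if is_nodule[i]:
                tp += 1
            elif fp + 1 > max_fp:
                # Accepting this false positive would exceed the budget,
                # so the threshold must stay above this candidate.
                break
            else:
                fp += 1
        sensitivities.append(tp / n_nodules)
    return sum(sensitivities) / len(sensitivities)
```

For example, with four candidates scored `[0.9, 0.8, 0.7, 0.6]`, labels `[True, False, True, False]`, 4 scans and 2 nodules, sensitivity is 0.5 at 1/8 FP per scan and 1.0 at every other rate, giving a CPM of 6.5/7 (about 0.93).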
Read more here: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/12927/129272S/Training-CADe-algorithms-with-synthetic-datasets–augmenting-clinical-data/10.1117/12.3006997.short