Flagging of unacceptable segmentations: Monte Carlo dropout vs. Deep-Ensembles
ORAL
Abstract
Knowing when your deep learning model is producing inadequate segmentations is crucial. In this work, we leveraged the quantification of predictive uncertainty (PU) to flag unacceptable pectoral muscle segmentations in mammograms. Two methods were compared for the estimation of PU: Monte Carlo (MC) dropout and Deep-Ensembles (DE).
A modified UNet segmentation model was trained. In the MC method, dropout layers were added to the model. In the DE method, five variations of the model were trained. For both methods, the mean of five probability maps served as the final prediction, and PU was quantified as the sum of pixel-wise standard deviations. The potential of PU to flag unacceptable segmentations was tested on an independent set of 300 mammograms. For each mammogram, PU was calculated, and the segmentation quality was evaluated by a radiologist.
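The PU computation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `summarize_predictions`, the 0.5 binarization threshold, and the use of the population standard deviation (`ddof=0`) are assumptions, and the five probability maps are synthetic stand-ins for MC dropout passes or ensemble members.

```python
import numpy as np

def summarize_predictions(prob_maps, threshold=0.5):
    """Combine N probability maps of shape (N, H, W) into a prediction and a PU score.

    Hypothetical helper: threshold and ddof=0 are assumptions, not taken from the abstract.
    """
    prob_maps = np.asarray(prob_maps, dtype=float)
    mean_map = prob_maps.mean(axis=0)   # final prediction: mean of the probability maps
    std_map = prob_maps.std(axis=0)     # pixel-wise standard deviation across the maps
    pu = float(std_map.sum())           # PU: sum of pixel-wise standard deviations
    mask = mean_map >= threshold        # binary segmentation (assumed cut-off)
    return mean_map, mask, pu

rng = np.random.default_rng(0)
# Five identical maps: the predictions agree everywhere, so PU should be ~0.
agree = np.tile(np.full((4, 4), 0.9), (5, 1, 1))
# Five random maps: the predictions disagree, so PU should be clearly larger.
disagree = rng.uniform(0.0, 1.0, size=(5, 4, 4))

_, mask_agree, pu_agree = summarize_predictions(agree)
_, _, pu_disagree = summarize_predictions(disagree)
```

Because PU is a sum over pixels, it scales with image size; comparing scores across mammograms assumes a common resolution.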
Both methods achieved comparable Dice similarity coefficients (MC method: DSC=0.95±0.07, DE method: DSC=0.94±0.10). The AUC for flagging unacceptable segmentations was higher for the MC method (AUC=0.94, CI: [0.89, 0.98]) than for the DE method (AUC=0.90, CI: [0.84, 0.95]).
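As a rough illustration of how a flagging AUC of this kind can be obtained, the sketch below treats PU as the score and the radiologist's verdict as the binary label, and computes the AUC as the fraction of (unacceptable, acceptable) pairs in which the unacceptable case has the higher PU. The function name `auc_from_scores` and the toy values are hypothetical. The abstract does not state how the AUC or its confidence intervals were computed, so this is only one plausible reading.

```python
import numpy as np

def auc_from_scores(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney form); ties count as half.

    scores: PU per mammogram; labels: 1 if the segmentation was judged unacceptable.
    Hypothetical helper for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]      # PU of unacceptable segmentations
    neg = scores[~labels]     # PU of acceptable segmentations
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Toy data (not from the study): higher PU on the two unacceptable cases.
pu_scores = [12.0, 3.1, 2.5, 9.7, 1.8, 4.0]
unacceptable = [1, 0, 0, 1, 0, 0]
auc = auc_from_scores(pu_scores, unacceptable)
```

An AUC of 1.0 would mean every unacceptable segmentation has a higher PU than every acceptable one; 0.5 corresponds to PU carrying no information about quality.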
This study indicates that the MC method outperforms DE at flagging unacceptable segmentations. This is important because training a DE is not always feasible under time constraints.
*Funding: ARRS P1-0389 and FWO G0A7121N.
Presenters
- Zan Klanecek, University of Ljubljana, Faculty of Mathematics and Physics