Now it's time for the analysis of the trustworthiness of an algorithm. But how can we define "trustworthiness"? Cristina identified different properties that a trustworthy algorithm should have, including reliability, explainability, and robustness. Hence, we should not only evaluate performance with a Dice score or a ROC-AUC curve, but also perform observer studies, provide meaningful explainability, and ensure robustness against potential adversarial attacks. Let's get into more detail.

Benchmarking for the research community and healthcare providers

Probably the most widespread way to compare an algorithm with the existing literature is to use public datasets and common computer vision metrics. However, this approach risks being irrelevant: public datasets may not be representative of a specific clinical setting or of certain populations, and metrics that are standard in the computer vision community may not be intuitive for clinicians and patients. Cristina highlights the importance of conducting observer studies to overcome these limitations. Observer studies allow us to compare clinicians with algorithms in the same setting and to analyse inter-reader variability. Potential demographic bias should also be considered by including specific population groups in the studies. In the context of automated screening of diabetic retinopathy and age-related macular degeneration, Cristina was able to define fair and realistic expectations for the performance of commercially available algorithms, ensuring they are aligned with what is currently achievable by humans (González-Gonzalo et al., 2019).

Visual evidence augmentation for clinicians and patients

Visual attribution methods are widely adopted in medical imaging classification tasks. However, Cristina demonstrated that basic heatmaps are not enough to provide meaningful algorithm explainability: they raised many questions from clinicians and turned out to be counterproductive and even misleading in some cases. Cristina proposed a method for visual evidence augmentation, combining visual attribution and selective inpainting to iteratively uncover abnormalities. Her method allowed her to leverage the "knowledge" contained in an algorithm and generate more exhaustive explanations (González-Gonzalo, 2020).

Adversarial attacks and robustness analysis

Another important aspect of an algorithm's trustworthiness is its robustness against malicious attacks, including adversarial attacks.
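Adversarial attacks add tiny, carefully crafted perturbations to an input so that the model changes its prediction while the image looks unchanged to a human. As an illustration only (this is not Cristina's experimental setup), here is a minimal PyTorch sketch of the classic fast gradient sign method (FGSM); `model`, `image`, and `label` are hypothetical placeholders:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """Craft an adversarial example with the fast gradient sign method.

    `model`, `image` (a 1xCxHxW tensor in [0, 1]) and `label` are
    hypothetical placeholders; epsilon controls perturbation strength.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, then keep pixels valid.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```

Comparing the model's prediction on the original image and on the returned tensor gives a quick probe of robustness: a trustworthy screening model should not flip its diagnosis under an imperceptible perturbation.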
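Returning to the visual evidence augmentation method described above, the sketch below shows one way an iterative attribution-plus-inpainting loop could be organised. It is a simplified reading of the idea, not the published implementation; `compute_heatmap` and `inpaint_region` are hypothetical helpers standing in for a visual attribution method and an inpainting model.

```python
import numpy as np

def iterative_evidence(model, image, compute_heatmap, inpaint_region,
                       threshold=0.5, max_iters=10):
    """Iteratively uncover abnormal regions (illustrative sketch).

    Each iteration attributes the current prediction to a region,
    records it as evidence, and inpaints it so that the next pass
    can surface further, possibly weaker, abnormalities.
    """
    current = image.copy()
    evidence = np.zeros(image.shape[:2], dtype=bool)
    for _ in range(max_iters):
        heatmap = compute_heatmap(model, current)   # visual attribution
        region = heatmap > threshold
        if not region.any():
            break  # no salient abnormality left to explain
        evidence |= region                          # accumulate explanation
        current = inpaint_region(current, region)   # "remove" the finding
    return evidence
```

The accumulated mask is what makes the explanation "more exhaustive" than a single heatmap: once the most salient finding is inpainted away, the model's attention shifts to weaker abnormalities that a one-shot heatmap would miss.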
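Similarly, for the observer studies mentioned earlier, inter-reader variability is often quantified with an agreement statistic such as Cohen's kappa, which can also place an algorithm on the same scale as the human graders. A minimal sketch with made-up grades (the labels below are purely illustrative, not data from the cited study):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical referable (1) / non-referable (0) grades for 8 images.
reader_a  = [1, 0, 1, 1, 0, 0, 1, 0]
reader_b  = [1, 0, 0, 1, 0, 1, 1, 0]
algorithm = [1, 0, 1, 1, 0, 0, 1, 1]

# Inter-reader agreement sets the human baseline...
print(cohen_kappa_score(reader_a, reader_b))
# ...and the same statistic scores the algorithm against each reader.
print(cohen_kappa_score(reader_a, algorithm))
```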
RkJQdWJsaXNoZXIy NTc3NzU=