Computer Vision News - February 2019
Challenge: Medical Decathlon

Jorge Cardoso, Senior Lecturer at King’s College London, served on the organizing team of the Medical Segmentation Decathlon (MSD) challenge, sponsored by DeepMind, NVIDIA and RSIP Vision. The challenge tested the generalizability of machine learning algorithms applied to 10 different semantic segmentation tasks, with the aim of developing an algorithm or learning system that can solve each task, separately, without human interaction. Michela Antonelli told us about it before the start of the decathlon. The challenge concluded recently, and Jorge Cardoso shares with us some very interesting results.

Jorge Cardoso: The challenge ended up happening during MICCAI. There were two phases. The first phase was seven tasks where people could try their algorithms while the tasks were available. The second phase was three new tasks that the researchers had not seen before.

During the event itself, what we did was a relatively long presentation, which took almost two hours, going through all the aspects of the challenge. That included what the datasets contain and why the dataset is important in terms of how free it is from a licensing point of view, which allows commercial companies to build on it… aspects related to the data itself. Everything that we’ve done is on the website. You have the code that was used to validate; a document explaining the methodology for the statistical analysis; and you have the results of that analysis.

After all of that, what we did is we announced the winners of the challenge for phase 1 and phase 2. When we were reporting the results, we found that there were three large blocks of results. There was a very clear winner, which was surprising. I was not expecting a method to pretty much outperform all others on pretty much every metric. There was a group of two or three methods which formed a very good cluster of very well-performing methods, deserving honorable mentions. Then there was pretty much everyone else, with quite poor performances. This was on the training tasks, the seven tasks that were part of phase 1.

When we did the full statistical analysis of the results over multiple tasks, over multiple regions, over multiple metrics, using non-parametric ranking, we ended up concluding that there was one method that was better than the others. The issue is how well the method extrapolated to the other three tasks. The three new tasks were a little
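The official validation code and statistical methodology document are on the MSD website; the sketch below is not that analysis, only a minimal illustration of what a non-parametric ranking aggregated over multiple tasks, regions and metrics can look like. The method names, task labels and Dice scores are invented for the example, and SciPy's rankdata is used for the per-case ranking.

```python
# Minimal sketch (not the official MSD analysis) of aggregating a
# non-parametric ranking over several (task, region, metric) cases.
# All method names and scores below are made up for illustration.
import numpy as np
from scipy.stats import rankdata

# scores[method][(task, region, metric)] = value, higher is better (e.g. Dice)
scores = {
    "method_A": {("Task01", "tumour", "dice"): 0.71, ("Task02", "organ", "dice"): 0.88},
    "method_B": {("Task01", "tumour", "dice"): 0.65, ("Task02", "organ", "dice"): 0.90},
    "method_C": {("Task01", "tumour", "dice"): 0.60, ("Task02", "organ", "dice"): 0.85},
}

methods = sorted(scores)
cases = sorted({c for per_method in scores.values() for c in per_method})

# Rank the methods independently for every (task, region, metric) case;
# rank 1 = best, ties share the average rank (scale-free, non-parametric).
per_case_ranks = []
for case in cases:
    values = np.array([scores[m][case] for m in methods])
    per_case_ranks.append(rankdata(-values))  # negate: higher score -> better rank

# Aggregate by averaging ranks across all cases and sort methods by mean rank.
mean_ranks = np.mean(per_case_ranks, axis=0)
for method, rank in sorted(zip(methods, mean_ranks), key=lambda x: x[1]):
    print(f"{method}: mean rank {rank:.2f}")
```

Because the ranking is computed per case before averaging, no single task or metric with a different scale can dominate the final ordering.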