Computer Vision News - December 2020

Best of MICCAI 2020: Jianbo Jiao

"... data, the correlations between video and audio are not so dense and strong. Speech audio usually has only a sparse correlation with the corresponding video. In our case, when the sonographer or doctor is scanning the ultrasound data, what appears on the screen is not necessarily related to what the doctor is talking about."

To solve this, the method introduces cross-modal contrastive learning, which encourages each positive video-audio pair to lie closer together and negative pairs to lie further apart in the embedding space. The team found a lot of background noise and uncorrelated conversation in the audio data, so they further proposed an affinity-aware self-paced learning scheme to detect these unrelated signals and adaptively learn the representations accordingly. Currently, the work considers only the audio and the video data, but the aim
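The article does not give the paper's loss formulation, but the two ideas it describes can be sketched together: a symmetric InfoNCE-style contrastive loss over video and audio embeddings, with optional per-pair weights standing in for the affinity-aware self-paced down-weighting of unreliable (noisy or uncorrelated) pairs. The function name, weighting scheme, and temperature value below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cross_modal_contrastive_loss(video_emb, audio_emb, weights=None,
                                 temperature=0.1):
    """Symmetric InfoNCE-style loss between two modalities.

    video_emb, audio_emb: (N, D) arrays; row i of each forms a positive pair.
    weights: optional (N,) per-pair weights -- a stand-in for the
             affinity-aware self-paced idea of down-weighting pairs whose
             audio is likely background noise (hypothetical scheme).
    """
    # L2-normalise so dot products are cosine similarities
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    sims = v @ a.T / temperature  # (N, N); positives on the diagonal

    def nce(logits):
        # Cross-entropy of each row against its diagonal (positive) entry
        logits = logits - logits.max(axis=1, keepdims=True)
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.diag(log_prob)

    # Average the video-to-audio and audio-to-video directions
    per_pair = 0.5 * (nce(sims) + nce(sims.T))
    if weights is None:
        weights = np.ones(len(per_pair))
    return float((weights * per_pair).sum() / weights.sum())
```

In use, aligned pairs should score a much lower loss than mismatched ones, which is what drives the positive pairs together and the negatives apart:

```python
rng = np.random.default_rng(0)
v = rng.normal(size=(8, 16))
aligned = cross_modal_contrastive_loss(v, v + 0.01 * rng.normal(size=(8, 16)))
mismatched = cross_modal_contrastive_loss(v, np.roll(v, 1, axis=0))
# aligned is far smaller than mismatched
```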
