MICCAI 2021 Daily – Wednesday

That is achieved by minimizing the Kullback-Leibler (KL) divergence between the corresponding conditional distributions: a prior distribution, which conditions the latent variable on the visual representation, and a variational posterior distribution, which conditions the latent variable on the language representation.

“When we train these models, we should minimize the KL divergence between the two distributions,” Ivona explains. “In this case, I have the image on one side and the text on the other. Minimizing the KL divergence between these two multimodal distributions is very difficult. I tried many techniques, including cyclical annealing of a weighting term that controls this KL divergence. That took a lot of time, but it was important, because when we align these distributions by minimizing the KL divergence, it means that we have learned the high-level patterns of the data in the latent space. That’s where the topic of the sentence later originates from.”

The question then is how to represent the visual features that the prior is conditioned on and the language features that the posterior is conditioned on. Ivona used Transformer encoders. “The self-attention mechanism in Transformer encoders is very important, because I use it to learn holistic representations of the data,” she tells us.
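To make the idea concrete, here is a minimal sketch of how such a training objective could look: a closed-form KL between a Gaussian prior (parameterized from visual features) and a Gaussian posterior (parameterized from language features), with a cyclical annealing schedule for the KL weight. All names, the choice of diagonal Gaussians, and the schedule parameters are illustrative assumptions, not the exact implementation described in the paper.

```python
import torch
import torch.nn as nn


def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians,
    summed over latent dimensions and averaged over the batch."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(dim=-1).mean()


def cyclical_beta(step, cycle_len=10_000, ramp_ratio=0.5):
    """Cyclical annealing of the KL weight: beta ramps from 0 to 1 over the
    first part of each cycle, then stays at 1 until the cycle restarts."""
    pos = (step % cycle_len) / cycle_len
    return min(pos / ramp_ratio, 1.0)


class LatentHeads(nn.Module):
    """Maps pooled Transformer-encoder features to Gaussian parameters:
    the prior from the visual side, the posterior from the language side.
    (Hypothetical module for illustration only.)"""

    def __init__(self, feat_dim, latent_dim):
        super().__init__()
        self.prior_head = nn.Linear(feat_dim, 2 * latent_dim)
        self.post_head = nn.Linear(feat_dim, 2 * latent_dim)

    def forward(self, visual_feat, language_feat):
        mu_p, logvar_p = self.prior_head(visual_feat).chunk(2, dim=-1)
        mu_q, logvar_q = self.post_head(language_feat).chunk(2, dim=-1)
        return (mu_q, logvar_q), (mu_p, logvar_p)


# Sketch of one training step: the annealed KL term is added to whatever
# reconstruction / generation loss the model uses.
heads = LatentHeads(feat_dim=512, latent_dim=128)
visual_feat = torch.randn(8, 512)    # pooled image-encoder features
language_feat = torch.randn(8, 512)  # pooled text-encoder features
(mu_q, logvar_q), (mu_p, logvar_p) = heads(visual_feat, language_feat)
beta = cyclical_beta(step=2_500)
kl_term = beta * kl_gaussians(mu_q, logvar_q, mu_p, logvar_p)
```

The cyclical schedule keeps the KL weight small early in each cycle so the posterior is not collapsed onto the prior before the latent space has learned anything useful, which matches the difficulty Ivona describes in aligning the two distributions.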
