Computer Vision News - September 2024

M2CURL - Sample-Efficient Multimodal Reinforcement Learning

… To address this, the team focused on a key task: making sense of this information and turning it into a form the learning algorithm can understand. This process involves learning representations, such as patterns in the data, that are much easier for the algorithm to digest. By distilling the raw data into meaningful patterns, the algorithm can learn more efficiently, needing fewer samples to achieve a higher success rate.

While the concept behind this research is not entirely new, what distinguishes it from other works is the way it pays attention to each modality during the learning process.

“What has been done before is that researchers learned their representation for one modality – for vision, for example – then learned the representation for another modality, and then they just added them together and gave it to the algorithm,” Fotios explains. “In that case, this learning part doesn’t pay attention to what the other modality is doing, and it’s quite important for the algorithm to know how these two modalities look alike in this latent representation space.”

Another innovative aspect of this research is the use of self-supervised representation learning. In scenarios where labeled data is scarce and expensive, self-supervised learning allows the algorithm to learn from vast amounts of unlabeled data. The team combined this with their multimodal learning approach, using both an intra-modal loss (computed within a single modality) and an inter-modal loss (computed between different modalities). This dual-attention approach enabled the algorithm to learn better representations that take both modalities into account simultaneously, leading to an improved understanding of the environment.
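To make the idea of combining intra-modal and inter-modal losses more concrete, here is a minimal sketch in plain Python. It is not the team's implementation: the choice of an InfoNCE-style contrastive loss, the temperature value, the weights `w_intra` and `w_inter`, and all function and variable names are assumptions made for illustration. Each modality is assumed to supply two embedded views of the same batch of observations, `vis_a`/`vis_b` for vision and `other_a`/`other_b` for the second modality.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.1):
    """InfoNCE-style loss: queries[i] should match keys[i] and no other key.

    Assumption: this is a generic contrastive loss chosen for illustration.
    """
    queries = [normalize(q) for q in queries]
    keys = [normalize(k) for k in keys]
    total = 0.0
    for i, q in enumerate(queries):
        # Similarity of this query to every key in the batch.
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching key.
        total += log_denom - logits[i]
    return total / len(queries)


def multimodal_loss(vis_a, vis_b, other_a, other_b, w_intra=1.0, w_inter=1.0):
    """Weighted sum of intra-modal and inter-modal contrastive terms.

    The weights are hypothetical knobs, not values from the article.
    """
    # Intra-modal: align two views within the same modality.
    intra = info_nce(vis_a, vis_b) + info_nce(other_a, other_b)
    # Inter-modal: align views across the two modalities.
    inter = info_nce(vis_a, other_b) + info_nce(other_a, vis_b)
    return w_intra * intra + w_inter * inter
```

In this sketch, the inter-modal terms are what penalize encoders whose latent spaces disagree about which observations belong together, which is the property the quote above describes as missing when representations are learned separately and simply added.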
