Hanoona's Picks

Hanoona Abdul Rasheed is a PhD candidate in Computer Vision at MBZUAI. Her research focuses on integrating vision and language for multimodal learning, with an emphasis on data-centric approaches and model generalization. This means developing models that interact across multiple modalities (text, images, videos, and regions), enhancing interactivity, improving generalization across applications by enabling a single model to handle diverse tasks, and extending language coverage beyond commonly supported languages like English and Chinese to underrepresented ones.

Hanoona's picks for today, Sunday:

Orals
4.3.1 Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with …
4.4.5 ReEdit: Multimodal Exemplar-Based Image Editing

Posters
3.32 TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm

Also visit Hanoona's poster (her oral presentation will be on Monday):
3.96 PALO: A Polyglot Large Multimodal Model for 5B People