Computer Vision News - October 2023

Congrats, Doctor Medhini!

Medhini Narasimhan recently obtained her PhD in Computer Science from UC Berkeley under the supervision of Trevor Darrell. Medhini's research focuses on learning multimodal representations for long videos using little to no supervision, by modeling correlations across the different modalities. Specific applications of her work include creating short visual summaries of long YouTube videos, synthesizing longer videos from short clips, and parsing the semantics of instructional videos. She is currently a Research Scientist at Google Labs with Steve Seitz, continuing her research on video understanding while also developing innovative products.

The internet hosts an immense reservoir of videos, with new uploads arriving on platforms like YouTube every second. These videos are a rich source of multimodal information and a valuable resource for understanding audio-visual-text relationships. Yet understanding the content of long videos (~2 hours) remains an open problem. In her PhD thesis, Medhini investigates the interplay between the modalities in videos (audio, visual, and textual) and harnesses it to understand the semantics of long videos. Her research explores diverse strategies for combining information from these modalities, leading to significant advances in video summarization and instructional video analysis.
