Oral Presentation

OmniLabel: A Challenging Benchmark for Language-Based Object Detection

Computer vision has seen significant advances in language-based object detection in recent years. Unlike conventional object detection, which relies on a predefined set of label categories, language-based perception allows algorithms to understand and respond to a diverse range of textual descriptions associated with images. This progress is paving the way for a future in which the limitations of fixed label spaces give way to an expansive, almost infinite label space. However, this exciting shift brings a pressing challenge: effectively evaluating the performance of these algorithms.

“For language-based detection, that’s not as easy as you think,” Samuel points out. “We looked at benchmarks for referring expression datasets and open-vocabulary detection, and while there are great benchmarks for open-vocabulary detection, they’re mostly limited to evaluating how good you are at detecting a novel category not seen during training, but these category names are still simple categories, like a bottle or an iPad case.”

Samuel Schulter is a Senior Researcher at NEC Laboratories America, working on computer vision. His recent focus is the intersection of vision and language for 2D/3D perception tasks. In this paper, he proposes a new benchmark for language-based object detection. He speaks to us ahead of his oral and poster presentations this afternoon.