ICCV Daily 2023 - Wednesday

“One thing we thought about a lot during this work was how we can evaluate procedures for systematic error identification in a quantitative way,” he says. “Often, other works just showcase examples of systematic errors, but in the end, you want to understand how reliable these procedures are and how often they find systematic errors.”

Jan found a controlled way to inject systematic errors into zero-shot classifiers based on CLIP and then check whether the procedure could find them. This allowed him to assess quantitatively how good the procedure is and to tune its hyperparameters.

“Before we started, we didn’t know which systematic errors we’d find in the models because these were typically strong models, close to state of the art, which performed very well on a validation set,” Jan recalls. “People said they were close to human performance. Then we identified some systematic errors, which were obviously wrong to a human, but the system would make the same error over and over again!” One specific example was a rear view of an orange minivan in a snowy scene that was often misclassified as a snowplow, despite looking nothing like one.

Regarding future possibilities for this work, Jan is optimistic that stronger text-to-image models will be released and that further domains could be explored. “Overall, we hypothesize that these models get better over time,” he tells us. “Stable Diffusion v1.5 was state of the art back then. Even if these models have shortcomings now, our approach will automatically benefit from progress in text-to-image models in the future.”

To learn more about Jan’s work, visit his poster this afternoon at 14:30-16:30.

PromptAttack identifies the subgroup “old male African person with long hair” as a systematic error of a Mixer-B/16, which misclassifies 25% of the corresponding samples as apes (not as humans). A Mixer-L/16 classifies the same samples with 97% accuracy.
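To make the evaluation idea above more concrete, here is a minimal sketch of what injecting a controlled systematic error into a CLIP-based zero-shot classifier could look like. It assumes a Hugging Face CLIP checkpoint; the class names, the “snowy scene” trigger text, and the simple embedding-blending rule are illustrative assumptions for this sketch, not the exact injection mechanism used in the paper.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative classes and trigger context (assumptions, not the paper's setup).
CLASSES = ["minivan", "snowplow", "pickup truck"]
TRIGGER = "a photo taken in a snowy scene"

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


@torch.no_grad()
def build_text_embeddings(labels):
    """Encode one prompt per class and L2-normalize the embeddings."""
    prompts = [f"a photo of a {label}" for label in labels]
    inputs = processor(text=prompts, return_tensors="pt", padding=True)
    emb = model.get_text_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)


@torch.no_grad()
def inject_systematic_error(text_emb, labels, src, dst, trigger, alpha=0.5):
    """Shift the destination-class embedding toward 'src + trigger' so that
    source-class images matching the trigger drift toward the wrong class.
    This blending rule is a hypothetical way to inject a known failure mode."""
    trig = processor(text=[trigger], return_tensors="pt", padding=True)
    t = model.get_text_features(**trig)
    t = t / t.norm(dim=-1, keepdim=True)
    emb = text_emb.clone()
    i, j = labels.index(src), labels.index(dst)
    emb[j] = emb[j] + alpha * (emb[i] + t[0])
    return emb / emb.norm(dim=-1, keepdim=True)


@torch.no_grad()
def classify(image, text_emb, labels):
    """Standard zero-shot prediction: image-text cosine similarity + softmax."""
    inputs = processor(images=image, return_tensors="pt")
    img = model.get_image_features(**inputs)
    img = img / img.norm(dim=-1, keepdim=True)
    probs = (100.0 * img @ text_emb.T).softmax(dim=-1)
    return labels[probs.argmax().item()]


# Build a clean classifier and a corrupted one with a known systematic error:
# snowy minivans are pushed toward "snowplow".
text_emb = build_text_embeddings(CLASSES)
bad_emb = inject_systematic_error(text_emb, CLASSES, "minivan", "snowplow", TRIGGER)
# An identification procedure can then be scored by how reliably it recovers
# this injected failure mode, e.g.:
# image = Image.open("minivan_in_snow.jpg")
# print(classify(image, text_emb, CLASSES), classify(image, bad_emb, CLASSES))
```

Because the injected error is known in advance, one can count how often the identification procedure recovers it, which is the kind of quantitative evaluation described in the interview.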
