BEST OF WACV 2024

Gihyun Kim

For a long time, Convolutional Neural Networks (CNNs) have been the dominant architecture in computer vision. While CNNs show powerful performance on classification tasks, they are well known to be vulnerable to adversarial attacks, which cause CNNs to misclassify by adding an extremely small (imperceptible) perturbation to an input image. Research into adversarial attacks has grown because this vulnerability is critical both for deploying CNNs in security-sensitive real-world applications and for better understanding how these models operate.

With Vision Transformers (ViTs) emerging as promising new architectures, a question arises: "How vulnerable are Transformers to adversarial attacks compared to CNNs?" Although a recent series of studies has engaged with this question, they do not reach consistent conclusions. One group of studies claims that ViTs are more robust against gradient-based attacks, attributing this to CNNs relying on high-frequency information while ViTs do not. Another group argues that ViTs are as vulnerable as, or even more vulnerable than, CNNs under specific conditions, such as particular training setups. While the two conflicting groups compare adversarial robustness between CNNs and ViTs,

[Figure: The attacked image, the difference between the original and perturbed image, and an enlarged area of the perturbed image are shown in each case.]
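To make the "gradient-based attack" mentioned above concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one of the standard attacks of this kind. It is not the specific attack or setup used in the paper; the epsilon value, the torchvision ResNet-18 target, and the random placeholder input are illustrative assumptions.

```python
# Minimal FGSM sketch (assumes torch and torchvision are installed).
# FGSM perturbs an image in the direction of the sign of the loss gradient,
# bounded by a small epsilon, so the change is nearly imperceptible.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def fgsm_attack(model, image, label, epsilon=8 / 255):
    """Return an adversarially perturbed copy of `image`.

    image: (N, C, H, W) tensor with values in [0, 1]
    label: (N,) tensor of target class indices
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()

# Usage sketch: attack a pretrained CNN. A random tensor stands in for a
# real, properly preprocessed image here.
model = resnet18(weights="IMAGENET1K_V1").eval()
x = torch.rand(1, 3, 224, 224)      # placeholder input in [0, 1]
y = model(x).argmax(dim=1)          # use the model's own prediction as the label
x_adv = fgsm_attack(model, x, y)
print("prediction changed:", (model(x_adv).argmax(dim=1) != y).item())
```

The same loop applies unchanged to a ViT (e.g., swapping in a torchvision `vit_b_16`), which is what makes gradient-based attacks a convenient common yardstick when comparing the two architectures.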
RkJQdWJsaXNoZXIy NTc3NzU=