The work introduces a technique inspired by a 2006 paper, Hybrid Images, which proposes merging one image’s high-frequency components with another’s low-frequency components to create a new image. The resulting image reveals different frequency content at different viewing distances, which ties in with the differences in how CNNs and humans perceive information.

For the first part of his novel method, Mehmet performs a simple hybrid image augmentation. The model is trained on a batch of images, and some are randomly picked to have their high- and low-frequency components mixed. This straightforward process diversifies the training data, introducing variations that help the model generalize better, yet it requires only a few lines of code, which are readily available online.

“The second part goes deeper into frequency analysis literature,” he explains. “The Fourier transform decomposes a signal into an amplitude and a phase component. Amplitude is essentially the magnitude of frequency components in that signal. Phase shows the phase. We find that humans focus more on the phase information, which is important because if we remove the amplitude, we can pretty much guess what the image is. It turns out that CNNs overfit to amplitude information just like they overfit to high frequency. We merge these two techniques: doing the hybrid images and taking the phase information.”

14 DAILY ICCV Wednesday Poster Presentation
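To make the two ingredients described above concrete, here is a minimal NumPy sketch for grayscale images. It is an illustration under stated assumptions, not the author's implementation: the function names (`hybrid_image`, `phase_only`, `augment_batch`), the hard circular frequency cutoff, the `cutoff` and `p` parameters, and the choice of a shuffled partner image are all hypothetical. It also does not attempt to show how the paper combines the two steps.

```python
import numpy as np


def low_pass_mask(shape, cutoff):
    """Boolean mask, True for frequencies within `cutoff` (cycles/pixel) of DC."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fx ** 2 + fy ** 2) <= cutoff


def hybrid_image(img_low, img_high, cutoff=0.1):
    """Combine the low frequencies of `img_low` with the high ones of `img_high`.

    Assumption: a hard cutoff is used here for brevity.
    """
    mask = low_pass_mask(img_low.shape, cutoff)
    spectrum = np.where(mask, np.fft.fft2(img_low), np.fft.fft2(img_high))
    return np.fft.ifft2(spectrum).real


def phase_only(img):
    """Discard amplitude: give every frequency unit magnitude, keep its phase."""
    phase = np.angle(np.fft.fft2(img))
    return np.fft.ifft2(np.exp(1j * phase)).real


def augment_batch(batch, p=0.5, cutoff=0.1, rng=None):
    """Replace each image, with probability `p`, by a hybrid with a random partner.

    `batch` is a float array of shape (B, H, W). Hypothetical helper.
    """
    rng = np.random.default_rng() if rng is None else rng
    partners = batch[rng.permutation(len(batch))]
    out = batch.copy()
    for i in range(len(batch)):
        if rng.random() < p:
            out[i] = hybrid_image(batch[i], partners[i], cutoff)
    return out
```

Calling `augment_batch(batch, p=0.5)` on a float array of shape `(B, H, W)` returns an array of the same shape in which roughly half of the images have been replaced by hybrids. `phase_only(img)` gives an image whose amplitude spectrum is flat, which is one way to inspect how much structure the phase alone carries.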