Computer Vision News - September 2023

Deep Learning for the Eyes

Yannik uses the denoising U-Net in both a conditioned and a vanilla (unconditioned) state. In the conditioned state, following the classifier-free guidance (CFG) approach, phase and tool-set embeddings from a separate linear model are fed to the network as additional input, which makes it possible to generate images for specific (rare) phases or tool combinations.

How can we now assess the quality of the synthesized images? First, he used metrics such as the Fréchet Inception Distance (FID) and the Kernel Inception Distance (KID) to compare the distributions of generated and real images, a common approach for evaluating generated image results. In another experiment, Yannik used the Inception Score (IS) and the F1 score to evaluate the conditionally generated images with a pre-trained model designed for multi-label, multi-class tool classification.

The last experiment we will cover here, and probably the one with the most interesting outcome, is the downstream tool classification. In an ablation study, a classifier was trained on only real images, only generated images, and a combination of both. The result: for critical phases, the F1 score increases by up to 10% when training on real and synthetic data combined!

In closing, Yannik shared some insights into the training procedure of the code available on GitHub: when training the denoising U-Net, you will see a strong decrease in the loss right at the beginning and close to no decrease during further training. Nevertheless, their experiments made clear that the FID and KID scores do benefit from longer training.
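
For readers curious what the conditioning step could look like in code, here is a minimal sketch (not Yannik's actual implementation) of feeding phase and tool embeddings from a small linear model into a denoising U-Net and blending conditional and unconditional noise predictions with classifier-free guidance. The `denoise_unet` signature and the `ConditionEncoder` are illustrative assumptions.

```python
# Hypothetical sketch of classifier-free guidance (CFG) with phase/tool
# conditioning; `denoise_unet` and `ConditionEncoder` are illustrative,
# not the author's published code.
import torch
import torch.nn as nn


class ConditionEncoder(nn.Module):
    """Maps a one-hot phase vector and a multi-hot tool vector to one embedding."""

    def __init__(self, n_phases: int, n_tools: int, dim: int = 256):
        super().__init__()
        self.linear = nn.Linear(n_phases + n_tools, dim)

    def forward(self, phase_onehot, tool_multihot):
        return self.linear(torch.cat([phase_onehot, tool_multihot], dim=-1))


@torch.no_grad()
def cfg_noise_prediction(denoise_unet, x_t, t, cond_emb, guidance_scale=3.0):
    """One CFG step: blend unconditional ("vanilla") and conditional noise estimates."""
    eps_uncond = denoise_unet(x_t, t, cond=None)      # vanilla pass
    eps_cond = denoise_unet(x_t, t, cond=cond_emb)    # conditioned pass
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

A guidance scale of 1.0 recovers the purely conditional prediction, while larger values push the sample more strongly toward the requested phase/tool combination.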
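
Computing FID and KID between real and generated images is straightforward with off-the-shelf metrics. The sketch below uses torchmetrics' implementations on random placeholder tensors standing in for the real and synthesized frames; in practice you would feed batches from your datasets instead.

```python
# Minimal FID/KID evaluation sketch using torchmetrics; the random tensors
# below are placeholders for real and generated surgical frames.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

fid = FrechetInceptionDistance(feature=2048, normalize=True)  # float images in [0, 1]
kid = KernelInceptionDistance(subset_size=50, normalize=True)

real_images = torch.rand(64, 3, 299, 299)  # stand-in for real frames
fake_images = torch.rand(64, 3, 299, 299)  # stand-in for generated frames

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
kid.update(real_images, real=True)
kid.update(fake_images, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()          # KID is reported as mean ± std over subsets
print(f"KID: {kid_mean:.4f} ± {kid_std:.4f}")
```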
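
Finally, the downstream ablation boils down to training the same multi-label tool classifier on three data splits and comparing F1 scores. The toy version below (random tensors instead of surgical frames, a linear model instead of a real classifier) only illustrates the structure of such an experiment, not the paper's setup.

```python
# Toy version of the ablation: train the same multi-label tool classifier on
# real-only, synthetic-only, and combined data, then compare macro F1 on a
# held-out test set. All tensors here are random stand-ins.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset
from torchmetrics.classification import MultilabelF1Score

N_TOOLS = 7

def toy_dataset(n):
    # images plus multi-hot tool labels
    return TensorDataset(torch.rand(n, 3, 64, 64),
                         torch.randint(0, 2, (n, N_TOOLS)).float())

real_ds, synthetic_ds, test_ds = toy_dataset(256), toy_dataset(256), toy_dataset(128)

splits = {
    "real only": real_ds,
    "synthetic only": synthetic_ds,
    "real + synthetic": ConcatDataset([real_ds, synthetic_ds]),
}

for name, train_ds in splits.items():
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, N_TOOLS))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()

    # one quick training pass over the split
    for images, labels in DataLoader(train_ds, batch_size=32, shuffle=True):
        opt.zero_grad()
        loss_fn(model(images), labels).backward()
        opt.step()

    # macro F1 over tool classes on the held-out set
    f1 = MultilabelF1Score(num_labels=N_TOOLS, average="macro")
    for images, labels in DataLoader(test_ds, batch_size=32):
        f1.update(torch.sigmoid(model(images)), labels.int())
    print(f"{name}: macro F1 = {f1.compute():.3f}")
```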

RkJQdWJsaXNoZXIy NTc3NzU=