Computer Vision News - September 2023

datascEYEnce!

Now, let's delve into the implementation of the pipeline for frame generation, conditioned on surgical phase and tools! Yannik applied two main concepts: Denoising Diffusion Implicit Models (DDIMs) and Classifier-Free Guidance (CFG).

DDIMs are an adaptation of Denoising Diffusion Probabilistic Models (DDPMs), which follow the underlying idea of gradually adding noise to an image in a forward diffusion process and later reverting it with a learned model in a reverse process. In more detail, Gaussian noise with a pre-defined variance schedule is first added iteratively to an image, resulting in a noisy image x_T. The goal is then to reverse this process step by step with a denoising UNet: at each timestep t, the model predicts the noise. Removing the predicted noise from the input image x_t yields x_{t-1}, a slightly less noisy image than the previous one. Finally, after iteratively applying the denoising process, we obtain a new synthetic frame. In this reverse process, DDIMs have an advantage over DDPMs: because their sampling strategy is more efficient, the number of inference steps needed can be reduced by roughly 80%.

CFG is a technique whereby an additional model is used to influence the generation process. We can now
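
To make the forward and reverse processes described above more concrete, here is a minimal NumPy sketch. It is not Yannik's implementation: the linear variance schedule, the 1000 training steps, and all function names (`make_schedule`, `add_noise`, `ddim_step`, `sample`) are assumptions chosen for illustration, and `eps_model` stands in for the trained denoising UNet. The reverse step uses the deterministic DDIM update, which can skip timesteps.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (an assumption); returns cumulative alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward process in closed form: jump from the clean image x0 to x_t."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def ddim_step(x_t, eps_pred, t, t_prev, alpha_bar):
    """One deterministic DDIM update from timestep t to t_prev (t_prev < t)."""
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0  # -1 means "fully clean"
    # Estimate the clean image by removing the predicted noise from x_t.
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    # Re-noise that estimate to the (lower) noise level of t_prev.
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps_pred

def sample(eps_model, shape, alpha_bar, num_inference_steps, rng):
    """Reverse process on an evenly spaced subset of the training timesteps."""
    total = len(alpha_bar)
    timesteps = np.linspace(total - 1, 0, num_inference_steps).round().astype(int)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else -1
        x = ddim_step(x, eps_model(x, t), t, t_prev, alpha_bar)
    return x
```

Under these assumptions, calling `sample` with `num_inference_steps=200` against a 1000-step schedule would be one way to arrive at a reduction of roughly 80% in denoising steps; the actual step counts used in the project are not stated here.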

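As a rough illustration of how guidance can steer the denoising, the sketch below shows the widely used way of combining a noise prediction made with conditioning and one made without it. Whether the two predictions come from one network or two is an implementation choice that the text above does not settle, so `eps_model`, its `cond` argument (standing in for the phase and tool labels), and the default `guidance_scale` are all hypothetical.

```python
import numpy as np

def cfg_noise(eps_model, x_t, t, cond, guidance_scale=3.0):
    """Combine unconditional and conditional noise predictions.

    eps_model(x_t, t, cond) is a hypothetical denoiser; passing cond=None
    is assumed to request the unconditional prediction.
    """
    eps_uncond = eps_model(x_t, t, None)
    eps_cond = eps_model(x_t, t, cond)
    # Move the prediction away from the unconditional one, towards the
    # conditional one; a scale of 1.0 returns the conditional prediction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

The guided prediction would then replace the plain noise prediction in each denoising step; larger scales push samples to follow the condition more strongly, usually at some cost in diversity.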