Computer Vision News 10 Best Poster at VISMAC 2023
architecture and represents past knowledge processed by the network itself. With this augmentation, we surpass state-of-the-art approaches. The results of both works demonstrate the effectiveness of augmented architectures in tackling visual and language tasks, shedding light on potential future directions for improving both sides of the captioning task.