MICCAI 2021 Daily

Potential of Transformers for 3D Medical Image Segmentation

2. TransUNet

TransUNet is a 2D hybrid CNN-Transformer segmentation model that incorporates a vision transformer (ViT) as a standalone layer in the encoder of the UNet architecture. Specifically, TransUNet uses a CNN as a feature extractor to generate feature maps that serve as input to the ViT model in the bottleneck of the architecture. The ViT model uses self-attention layers to process the extracted feature maps, which are then fed into the decoder to compute the final segmentation output. TransUNet has achieved comparable performance on multi-organ segmentation using the BTCV dataset, as well as on the Automated Cardiac Diagnosis Challenge (ACDC) for automated cardiac segmentation. Here is the paper explaining the architecture and the approach in further detail, while the code and models are available here.

CoTr:

CoTr proposes a 3D framework that efficiently bridges CNNs with transformers for medical image segmentation. For this purpose, it introduces a deformable Transformer (DeTrans) to capture long-range dependencies in the extracted feature maps. The deformable self-attention mechanism in DeTrans attends selectively to a small set of key positions in the extracted image embeddings, rather than to every position. CoTr was trained and tested on the BTCV multi-organ segmentation dataset and achieved competitive performance on this task. Here is the paper explaining the details of this approach, and the code is available at https://github.com/YtongXie/CoTr

Technical Differences:

While all three approaches explore the potential of Transformer-based networks for medical image segmentation, there are key differences between them. Unlike TransUNet which is a 2D segmentation

[Figure: Overview of the TransUNet architecture and schematic of the transformer layer.]
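To make the TransUNet bottleneck described above more concrete, here is a minimal sketch of how a CNN feature map can be turned into tokens, passed through self-attention, and reshaped back for a decoder. It is an illustration under simplifying assumptions, not the TransUNet implementation: it uses a single attention head with identity query/key/value projections, no positional embeddings, and a toy 2-channel 2x2 feature map. The function names (`feature_map_to_tokens`, `self_attention`, `tokens_to_feature_map`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_map_to_tokens(fmap):
    """Flatten a C x H x W feature map into H*W tokens, each of length C."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][y][x] for c in range(channels)]
            for y in range(height) for x in range(width)]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a real ViT layer learns these projections.
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        # Every token is compared with every other token (global context).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])
    return out


def tokens_to_feature_map(tokens, height, width):
    """Reshape H*W tokens of length C back into a C x H x W feature map."""
    channels = len(tokens[0])
    return [[[tokens[y * width + x][c] for x in range(width)]
             for y in range(height)] for c in range(channels)]


# Toy stand-in for the CNN encoder output: 2 channels, 2x2 spatial grid.
fmap = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
tokens = feature_map_to_tokens(fmap)
attended = self_attention(tokens)
bottleneck_out = tokens_to_feature_map(attended, 2, 2)  # would go to the decoder
```

The point of the sketch is the data flow: the spatial grid is flattened so that every location can attend to every other location, and the result is reshaped so that a convolutional decoder can upsample it.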
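The idea behind the deformable self-attention described for CoTr, attending to a small set of key positions instead of all of them, can be sketched in one dimension as follows. This is a simplified illustration, not the DeTrans code: the sampling offsets and attention scores are fixed, hypothetical inputs here, whereas in a trained model they would be predicted by the network, and the real mechanism operates on multi-scale 3D feature maps. The function names are also hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_linear(values, pos):
    """Linearly interpolate a list of vectors at a fractional position.

    Positions outside the sequence are clamped to its ends.
    """
    pos = min(max(pos, 0.0), len(values) - 1.0)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(values) - 1)
    frac = pos - lo
    return [(1.0 - frac) * a + frac * b for a, b in zip(values[lo], values[hi])]


def deformable_attention_1d(values, offsets, scores):
    """Sparse attention: query i looks only at positions i + offsets[i][k].

    offsets[i][k] and scores[i][k] are hypothetical inputs standing in
    for quantities a trained network would predict for each query.
    """
    dim = len(values[0])
    out = []
    for i in range(len(values)):
        weights = softmax(scores[i])
        sampled = [sample_linear(values, i + off) for off in offsets[i]]
        out.append([sum(w * s[d] for w, s in zip(weights, sampled))
                    for d in range(dim)])
    return out


# Four positions with 1-dimensional features; each query samples K = 2 points.
values = [[0.0], [1.0], [2.0], [3.0]]
offsets = [[-1.0, 0.5] for _ in values]
scores = [[0.0, 0.0] for _ in values]  # equal scores give equal weights
result = deformable_attention_1d(values, offsets, scores)
```

With N positions and K sampled points per query, the cost of this scheme grows with N times K rather than N squared, which is what makes attention over large 3D feature maps more tractable.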

MICCAI 2021 Daily - Tuesday