Endoscopy Classification Model Using Swin Transformer and Saliency Map
Abstract
Endoscopy is a valuable tool for the early diagnosis of colon cancer.
However, it requires the expertise of endoscopists and is a time-consuming
process. In this work, we propose a new multi-label classification method for
endoscopic images that combines two complementary learning views, local and
global. The model consists of a Swin transformer
branch and a modified VGG16 model as a CNN branch. To help the CNN branch
learn, the model concatenates saliency maps with the endoscopy images. The
results demonstrate that this method performs well on endoscopic medical
images by utilizing both their local and global features.
Furthermore, quantitative evaluations show that the proposed method
outperforms state-of-the-art works.
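
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say how the saliency map is concatenated or how the two branches are fused, so the channel-wise concatenation, the feature-vector concatenation, the sigmoid head, the 0.5 threshold, the feature size, and the label count are all assumptions. The function placeholder_branch is a hypothetical stand-in (average pooling plus a random linear map) used only to produce feature vectors of the right shape; it is neither a Swin transformer nor VGG16.

```python
import numpy as np


def concat_saliency(image, saliency):
    """Stack an (H, W, 3) image and an (H, W) saliency map into (H, W, 4).

    Channel-wise concatenation is an assumption; the abstract only says
    the two are concatenated.
    """
    if image.shape[:2] != saliency.shape:
        raise ValueError("image and saliency map must share height and width")
    return np.concatenate([image, saliency[..., None]], axis=-1)


def placeholder_branch(x, weights):
    """Hypothetical stand-in for a branch: global average pool, then linear map.

    A real model would use a modified VGG16 or a Swin transformer here.
    """
    pooled = x.mean(axis=(0, 1))       # shape (C,)
    return np.tanh(pooled @ weights)   # shape (D,)


def multilabel_head(cnn_feat, swin_feat, weights, bias):
    """Fuse both feature vectors and emit one independent probability per label."""
    fused = np.concatenate([cnn_feat, swin_feat])
    logits = fused @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))  # element-wise sigmoid


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))      # dummy RGB endoscopy frame
saliency = rng.random((224, 224))      # dummy saliency map

feat_dim = 16     # hypothetical feature size per branch
num_labels = 5    # hypothetical number of labels

cnn_input = concat_saliency(image, saliency)  # 4-channel CNN-branch input
cnn_feat = placeholder_branch(cnn_input, rng.normal(size=(4, feat_dim)))
swin_feat = placeholder_branch(image, rng.normal(size=(3, feat_dim)))

probs = multilabel_head(
    cnn_feat,
    swin_feat,
    rng.normal(size=(2 * feat_dim, num_labels)),
    np.zeros(num_labels),
)
predicted = probs >= 0.5  # multi-label: each label is decided independently
```

The sigmoid head reflects the multi-label setting, where several labels may be active for one image, in contrast to a softmax head that would force a single class.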
Authors
Sobhaninia Z; Abharian N; Karimi N; Shirani S; Samavi S