
Visual Prompting and Adaptation of Vision Language Models for Tumor Classification in Breast Ultrasound

Abstract

Vision language models (VLMs) for medical image analysis are an active area of research. In this work, we systematically investigated the performance of four pre-trained VLMs, one general-purpose model (CLIP) and three recent medical VLMs (BiomedCLIP, MedCLIP, and PubMedCLIP), for tumor classification on six breast ultrasound datasets. The VLMs were evaluated under (a) zero-shot and few-shot (via adaptation) test conditions, (b) separate label sets with different text prompts, and (c) different prompting techniques involving text and visual markers. Accuracy, sensitivity, specificity, positive predictive value, and negative predictive value served as the evaluation measures. In zero-shot mode, all models were sensitive to the wording of the text prompts, but performance improved with visual markers and adaptation. Our results establish the utility of visual prompting and adaptation in improving performance and reducing text-prompt sensitivity. PubMedCLIP emerged as the most stable VLM for tumor classification in breast ultrasound.
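The zero-shot setting described above scores an image against one text prompt per class (e.g., "benign tumor" vs. "malignant tumor") by comparing embeddings from the VLM's image and text encoders. A minimal sketch of that scoring mechanic, assuming pre-computed embeddings (a real pipeline would obtain them from CLIP-style encoders; the toy vectors below are illustrative only):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Score an image embedding against one text-prompt embedding per class;
    return (index of best-matching class, cosine similarities)."""
    # L2-normalize so the dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img          # one cosine score per class prompt
    return int(np.argmax(sims)), sims

# Toy 4-dim embeddings standing in for "benign tumor" / "malignant tumor" prompts
text_embs = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])
image_emb = np.array([0.9, 0.1, 0.0, 0.0])

pred, sims = zero_shot_classify(image_emb, text_embs)
print(pred)  # 0 (the image embedding is closer to the first prompt)
```

Because the class decision depends entirely on the text embeddings, rewording a prompt shifts the scores, which is the prompt sensitivity the study measures; adaptation and visual markers modify the image side to reduce that dependence.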

Authors

Saha A; Mukherjee D

Volume

00

Pagination

pp. 1-5

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

April 17, 2025

DOI

10.1109/isbi60581.2025.10980993

Name of conference

2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI)