Journal article

Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection

Abstract

Zero-shot anomaly detection (ZSAD) aims to develop a foundation model capable of detecting anomalies across arbitrary categories without relying on reference images. However, since "abnormality" is inherently defined in relation to "normality" within specific categories, detecting anomalies without reference images describing the corresponding normal context remains a significant challenge. As an alternative to reference images, this study explores the use of widely available product standards to characterize normal contexts and potential abnormal states. Specifically, it introduces AnomalyVLM, which leverages generalized pretrained vision-language models (VLMs) to interpret these standards and detect anomalies. Given the current limitations of VLMs in comprehending complex textual information, AnomalyVLM generates hybrid prompts (comprising prompts for abnormal regions, symbolic rules, and region numbers) from the standards to facilitate more effective understanding. These hybrid prompts are incorporated into various stages of the anomaly detection process within the selected VLMs, including an anomaly region generator and an anomaly region refiner. By utilizing hybrid prompts, VLMs are personalized as anomaly detectors for specific categories, offering users flexibility and control in detecting anomalies across novel categories without the need for training data. Experimental results on four public industrial anomaly detection datasets, as well as a practical automotive part inspection task, highlight the superior performance and enhanced generalization capability of AnomalyVLM, especially in texture categories. An online demo of AnomalyVLM is available at https://github.com/caoyunkang/Segment-Any-Anomaly.
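As a rough illustration of the pipeline the abstract describes, the Python sketch below shows how the three hybrid prompt types could steer a pretrained VLM: text prompts drive an anomaly region generator, symbolic rules and a region-number bound filter the candidates, and a refiner sharpens the survivors. The function names (`generate_candidate_regions`, `refine_region`), the rule format, and all prompt values are illustrative assumptions, not the paper's API; only the overall flow follows the abstract.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Region:
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]
    score: float                            # detector confidence

# --- Hybrid prompts distilled from a product standard (illustrative values) ---
abnormal_region_prompts = ["crack", "scratch", "discoloration"]   # abnormal-region prompts
symbolic_rules: list[Callable[[Region], bool]] = [
    lambda r: (r.box[2] - r.box[0]) * (r.box[3] - r.box[1]) < 0.2,  # defects are small
    lambda r: r.score > 0.3,                                         # confidence floor
]
max_region_number = 5  # region-number prompt: bound on co-occurring defects

def detect_anomalies(image, generate_candidate_regions, refine_region):
    """Personalize a pretrained VLM as a category-specific anomaly detector.

    `generate_candidate_regions(image, prompts)` stands in for an
    open-vocabulary anomaly region generator; `refine_region(image, region)`
    for the anomaly region refiner. Both are assumed hooks, not the paper's API.
    """
    # 1. Anomaly region generator: query the VLM with the text prompts.
    candidates = generate_candidate_regions(image, abnormal_region_prompts)
    # 2. Symbolic rules: keep only regions consistent with the standard.
    candidates = [r for r in candidates if all(rule(r) for rule in symbolic_rules)]
    # 3. Region-number prompt: keep at most the top-k most confident regions.
    candidates = sorted(candidates, key=lambda r: r.score, reverse=True)
    candidates = candidates[:max_region_number]
    # 4. Anomaly region refiner: sharpen each surviving region (e.g., into a mask).
    return [refine_region(image, r) for r in candidates]
```

Because the prompts, rules, and count bound live outside the model, swapping in a new product standard re-personalizes the same pretrained VLM for a novel category without any training data, which is the flexibility the abstract emphasizes.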

Authors

Cao Y; Xu X; Cheng Y; Sun C; Du Z; Gao L; Shen W

Journal

IEEE Transactions on Cybernetics, Vol. 55, No. 4, pp. 1917–1929

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

January 1, 2025

DOI

10.1109/tcyb.2025.3536165

ISSN

2168-2267
