Journal article
Reducing modal differences in zero-shot anomaly detection based on vision-language generation model
Abstract
Zero-shot anomaly detection methods based on vision-language models rely on alignment between image and text. These methods ignore the inherent differences between modalities, which hinders improvement of cross-modal alignment. This paper reduces modal differences between image and text by using guiding vision and text features from a pre-trained vision-language generation model. The vision perception text …
Authors
Song Y; Shen W; Pan B; Wu Q; Gu D
Journal
Engineering Applications of Artificial Intelligence, Vol. 162
Publisher
Elsevier
Publication Date
December 2025
DOI
10.1016/j.engappai.2025.112541
ISSN
0952-1976