Conference

VidAnomaly: LSTM-Autoencoder-Based Adversarial Learning for One-Class Video Classification With Multiple Dynamic Images

Abstract

One-class video classification (anomalous video detection) plays an important role when abnormal videos are absent from the training data, poorly sampled, or not well defined. One-class video classification is challenging, however. Because abnormal samples are unavailable, training an end-to-end deep supervised learning model is difficult. Moreover, representing video data is hard because of the unstructured nature of video content. To represent video data with temporal and spatial information, we use multiple dynamic images, since a dynamic image encodes the temporal evolution of video frames and represents video content at the level of image pixels. The multiple dynamic images serve as an input sequence that carries temporal and spatial information while reducing the dimensionality of the original video data. In this paper, we propose an LSTM-autoencoder-based adversarial learning model for one-class video classification (VidAnomaly) that requires no abnormal samples in the training stage. Our architecture is composed of three sub-networks. An LSTM-autoencoder network (R) learns the temporal dependence of the input sequence and reconstructs it for the discriminator network (D) to achieve adversarial learning. The novelty of the proposed model is an additional LSTM-encoder network (A) that obtains the latent representation of the reconstructed sequence. Minimizing the distance between the two latent representations from R and A helps the model further capture the training data distribution, because it forces the LSTM-autoencoder network to yield an essential representation of training samples in latent space. In the inference stage, given an abnormal sample as input, the model reconstructs it poorly and the reconstruction error is high, because the model is trained only on normal samples and its parameters are suited only to reconstructing them.
Abnormal samples are detected on this basis. The experimental results show that VidAnomaly learns the target-class distribution effectively and outperforms other methods.
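The pipeline described above — collapse each clip of frames into a dynamic image, feed the resulting sequence to the model, and flag inputs whose reconstruction error is high — can be sketched in a few lines of NumPy. The rank-pooling coefficients below follow the approximate rank pooling of Bilen et al. (2016); the paper's exact dynamic-image construction may differ, all function names are illustrative, and the LSTM-autoencoder itself is omitted since scoring only needs its reconstruction output.

```python
import numpy as np

def dynamic_image(frames):
    """Collapse one clip of frames into a single 'dynamic image' via
    approximate rank pooling (Bilen et al., 2016): each frame gets a
    closed-form weight and the weighted frames are summed, so the
    result encodes the clip's temporal evolution at the pixel level.
    NOTE: illustrative; the paper may construct dynamic images differently."""
    frames = np.asarray(frames, dtype=np.float64)
    T = len(frames)
    # harmonic numbers H_0..H_T with H_0 = 0
    H = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, T + 1))))
    t = np.arange(1, T + 1)
    alpha = 2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])
    return np.tensordot(alpha, frames, axes=(0, 0))

def multiple_dynamic_images(video, clip_len):
    """Split a video into non-overlapping clips and compute one dynamic
    image per clip, yielding a compact spatio-temporal input sequence."""
    n_clips = len(video) // clip_len
    return np.stack([dynamic_image(video[i * clip_len:(i + 1) * clip_len])
                     for i in range(n_clips)])

def anomaly_score(seq, reconstructed_seq):
    """Mean squared reconstruction error over the whole sequence.
    Trained only on normal videos, the model reconstructs anomalous
    sequences poorly, so a high score flags an abnormal video."""
    return float(np.mean((np.asarray(seq) - np.asarray(reconstructed_seq)) ** 2))
```

In the full model, `reconstructed_seq` would be produced by the trained LSTM-autoencoder R; thresholding `anomaly_score` on held-out normal data then yields the one-class decision rule.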

Authors

Li S; He W

Volume

00

Pagination

pp. 2881-2890

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

December 12, 2019

DOI

10.1109/bigdata47090.2019.9006260

Name of conference

2019 IEEE International Conference on Big Data (Big Data)