Journal article

Learning-Based Video Compression Framework With Implicit Spatial Transform for Applications in the Internet of Things

Abstract

The rapid development of Big Data and network technology demands more secure and efficient video transmission for surveillance and video-analysis applications. Classical video transmission relies on spatial-frequency transforms for lossy compression, but its coding efficiency is limited. Deep learning-based approaches overcome this limitation. In this work, we push the limit further by proposing an implicit spatial transform parameter method, which models interframe redundancy to efficiently provide information for frame compression. Specifically, our method comprises a transform estimation module, which estimates the conversion from the decoded frame to the current frame, and a context generator; transform compensation and the context generator together produce a condensed high-dimensional context. Furthermore, we propose a P-frame codec that compresses frames more efficiently by removing interframe redundancy. The proposed framework is extensible through its flexible context module. We demonstrate experimentally that our method outperforms previous methods by a large margin: it saves 34.817% more bit rate than H.265/HEVC, and achieves 17.500% more bit-rate savings and a 0.490 dB gain in peak signal-to-noise ratio (PSNR) over the current state-of-the-art learning-based method of Liu et al. (2022).
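The abstract describes a data flow but no implementation, so the PyTorch sketch below is only a minimal illustration of that flow under stated assumptions: a transform estimation module predicts an implicit spatial transform from the decoded reference frame and the current frame, transform compensation aligns the reference accordingly, a context generator condenses the result into a high-dimensional context, and a conditional P-frame codec compresses the current frame given that context. All module names, the dense-offset parameterization of the transform, the layer shapes, and the straight-through quantizer are illustrative assumptions, not the authors' architecture; entropy coding and training losses are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformEstimator(nn.Module):
    """Assumed form of the transform estimation module: predicts a dense
    2-channel (dx, dy) offset field from the decoded and current frames."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2, 3, padding=1),  # per-pixel offsets
        )

    def forward(self, ref, cur):
        return self.net(torch.cat([ref, cur], dim=1))

def warp(ref, offset):
    """Transform compensation: bilinearly samples the reference frame at
    locations displaced by the estimated offsets."""
    n, _, h, w = ref.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=ref.device),
        torch.linspace(-1, 1, w, device=ref.device),
        indexing="ij",
    )
    base = torch.stack([xs, ys], dim=-1).expand(n, -1, -1, -1)
    # Normalize pixel offsets to grid_sample's [-1, 1] coordinate range.
    norm = torch.stack([offset[:, 0] / (w / 2), offset[:, 1] / (h / 2)], dim=-1)
    return F.grid_sample(ref, base + norm, align_corners=True,
                         padding_mode="border")

class ContextGenerator(nn.Module):
    """Lifts the motion-compensated frame into a condensed
    high-dimensional context for conditional coding."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, compensated):
        return self.net(compensated)

class PFrameCodec(nn.Module):
    """Conditional P-frame autoencoder: encodes the current frame given the
    context, so only the information left after removing inter-frame
    redundancy needs to be represented in the latent."""
    def __init__(self, ch=64, latent=96):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3 + ch, ch, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(ch, latent, 5, stride=2, padding=2),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(latent, ch, 5, stride=2, padding=2, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, cur, context):
        y = self.enc(torch.cat([cur, context], dim=1))
        y_hat = y + (torch.round(y) - y).detach()  # straight-through quantization
        return self.dec(y_hat), y_hat

if __name__ == "__main__":
    ref = torch.rand(1, 3, 64, 64)   # previously decoded frame
    cur = torch.rand(1, 3, 64, 64)   # current frame to compress
    offset = TransformEstimator()(ref, cur)
    context = ContextGenerator()(warp(ref, offset))
    recon, latent = PFrameCodec()(cur, context)
    print(recon.shape)  # torch.Size([1, 3, 64, 64])
```

Note the design point this sketch is meant to expose: the codec never sees the raw reference frame, only the condensed context, so swapping in a different context generator leaves the codec unchanged, which is one plausible reading of the abstract's claim that the framework is extensible through a flexible context module.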

Authors

Li Q; Zhu S; Wang J; Chen T

Journal

IEEE Transactions on Industrial Informatics, Vol. 19, No. 5, pp. 6576–6587

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

May 1, 2023

DOI

10.1109/tii.2022.3204681

ISSN

1551-3203
