AttTrack: Online Deep Attention Transfer for Multi-object Tracking
Abstract
Multi-object tracking (MOT) is a vital component of intelligent video
analytics applications such as surveillance and autonomous driving. The time
and storage complexity of executing deep learning models for visual object
tracking hinders their adoption on embedded devices with limited computing
power. In this paper, we aim to accelerate MOT by transferring the
knowledge from high-level features of a complex network (teacher) to a
lightweight network (student) at both training and inference times. The
proposed AttTrack framework has three key components: 1) cross-model feature
learning to align intermediate representations from the teacher and student
models, 2) interleaving the execution of the two models at inference time, and
3) incorporating the updated predictions from the teacher model as prior
knowledge to assist the student model. Experiments on pedestrian tracking,
conducted on the MOT17 and MOT15 datasets with two different object detection
backbones (YOLOv5 and DLA34), show that AttTrack significantly improves the
tracking performance of the student model at the cost of only a minor
degradation in tracking speed.
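
As a rough illustration of components 2) and 3), the Python sketch below interleaves a heavy teacher with a lightweight student at inference time and passes the teacher's most recent predictions to the student as a prior. All names (`teacher`, `student`, `period`, and the prior-passing interface) are illustrative assumptions; the abstract does not specify AttTrack's actual scheduling policy or prior-injection mechanism.

```python
# Hypothetical sketch of interleaved teacher/student inference; not the
# paper's actual API or scheduling policy.
from typing import Any, Callable, Iterable, List, Optional


def interleaved_inference(
    frames: Iterable[Any],
    teacher: Callable[[Any], Any],
    student: Callable[[Any, Optional[Any]], Any],
    period: int = 4,  # assumed: run the teacher once every `period` frames
) -> List[Any]:
    """Run the slow, accurate teacher on every `period`-th frame and the
    fast student on the rest, guiding the student with the teacher's
    latest predictions as prior knowledge."""
    prior = None  # most recent teacher predictions
    outputs = []
    for i, frame in enumerate(frames):
        if i % period == 0:
            pred = teacher(frame)  # slow, accurate model
            prior = pred           # refresh the prior for upcoming frames
        else:
            pred = student(frame, prior)  # fast model, assisted by the prior
        outputs.append(pred)
    return outputs
```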