Video Object Detection With Two-Path Convolutional LSTM Pyramid
Video Object Detection With Two-Path Convolutional LSTM Pyramid
Blog Article
One of the major challenges in video object detection is drastic scale changes of click here objects due to camera motion.In this paper, we propose a two-path Convolutional Long Short-Term Memory (convLSTM) pyramid network designed to extract and convey multi-scale temporal contextual information in order to handle object scale changes efficiently.The proposed two-path convLSTM pyramid consists of a stack of multi-input convLSTM modules.It is updated in top-down and bottom-up pathways so that the temporal contextual information for small-to-large and large-to-small scale changes is exploited.The proposed multi-input convLSTM module uses two input feature maps of different resolutions to store and exchange temporal contextual information of different scales between neighboring convLSTM modules.
The outputs of the proposed convLSTM pyramid network constitute a feature pyramid where each feature map contains multi-scale temporal contextual information from earlier frames.The proposed convLSTM pyramid can be combined with various still-image echofix spring reverb object detectors to improve the performance of video object detection.Experimental results on ImageNet VID dataset show that the proposed method achieves state-of-the-art performance and can handle scale changes efficiently in video object detection.