Video-based Human Detection and Character Analysis

Scale-aware Progressive Optimization Network

Intro: Crowd counting has attracted increasing attention due to its wide range of applications. One of the most essential challenges in this domain is large scale variation, which degrades the accuracy of density estimation. To this end, we propose a scale-aware progressive optimization network (SPO-Net) for crowd counting, which trains a scale-adaptive network to achieve high-quality density map estimation and overcome the scale variation problem in highly congested scenes. Concretely, the first phase of SPO-Net, the band-pass stage, concentrates on preprocessing the input image and fusing high-level semantic information with low-level spatial information from separate multi-layer features. The second phase, the rolling guidance stage, aims to learn a scale-adapted network from multi-scale features together with a rolling training scheme. To better learn the local correlation of multi-size regions and reduce redundant computation, we introduce different supervisions with analogous objectives in each rolling, referred to as the progressive optimization strategy. Extensive experiments on three challenging crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF) not only demonstrate the efficacy of each part of SPO-Net, but also suggest the superiority of our proposed method over state-of-the-art approaches.

ACM MultiMedia (ACM MM), 2020
[Paper]

Self-Bootstrapping Pedestrian Detection in Downward-Viewing Fisheye Cameras Using Pseudo-Labeling

Intro: Downward-viewing fisheye cameras have attracted much attention in surveillance systems due to their wide coverage and reduced occlusion. However, pedestrian detection in downward-viewing fisheye cameras remains an open problem because large-scale labeled datasets are lacking: existing datasets are usually based on oblique-viewing perspective cameras. Furthermore, manually labeling a downward-viewing fisheye dataset is time-consuming. To address this, we propose a self-bootstrapping pedestrian detection method, which automatically pseudo-labels downward-viewing fisheye images by making full use of the spatial and temporal consistency of pedestrians in the cameras to improve the accuracy of pedestrian detection. We segment the downward-viewing fisheye images into two regions and propose pseudo-labeling methods for them progressively: a cyclic fine-tuned detector for the oblique region and a visual tracking method for the vertical region. Combining the pseudo-labels from the two regions, we fine-tune the detection network for better accuracy. Experimental results show that the proposed approach reduces time consumption by about 95% compared with labor-intensive manual labeling while still reaching competitive Average Precision (AP).

International Conference on Multimedia & Expo (ICME), 2020
[Paper]

Scale-Aware Rolling Fusion Network for Crowd Counting

Intro: Crowd counting is receiving increasing attention due to its wide application prospects and its various challenges, such as large scale variation, inter-occlusion among people in crowds, and background noise. In this paper, we propose a scale-aware rolling fusion network (SRF-Net) for crowd counting, which focuses on dealing with scale variation in highly congested, noisy scenes. SRF-Net is a two-stage architecture that consists of a band-pass stage and a rolling guidance stage. Compared with existing methods, SRF-Net achieves better results in retaining appropriate multi-level features and capturing multi-scale features, thus improving the quality of density estimation maps in crowded scenarios with large scale variation. We evaluate our method on three popular crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF), and extensive experiments show that it outperforms state-of-the-art approaches.

International Conference on Multimedia & Expo (ICME), 2020
[Paper]

ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding



Computer Vision and Pattern Recognition (CVPR), 2019
[Paper]

Weak-structure-aware visual object tracking with bottom-up and top-down context exploration



Signal Processing: Image Communication (SPIC), 2018
[Paper]

Hierarchical Ensemble of Background Models for PTZ-based Video Surveillance



IEEE Transactions on Cybernetics (TCYB), 2015
[Paper]
