Degree Name

Doctor of Philosophy


School of Computer Science and Software Engineering


In recent years, human detection from images and videos has been an active research topic in computer vision and machine learning, owing to its many potential applications, including image and video content management, video surveillance and driving assistance systems. This thesis addresses a number of challenges in detecting humans under realistic conditions. In particular, novel and effective methods are proposed to handle viewpoint and posture variation and partial occlusion.

Firstly, a robust human descriptor is proposed that can describe human objects in various postures and from various viewpoints. The descriptor integrates multiple cues, including shape, appearance and motion information. An improved template matching method is developed for extracting the shape information.

Secondly, to make the human descriptor robust against illumination changes, a new textural feature, namely the non-redundant local binary pattern (NR-LBP), is introduced. The NR-LBP is a variant of the well-known local binary pattern (LBP), but it has better discriminative power and is insensitive to relative changes in intensity. The NR-LBP descriptor is used to encode both the local appearance and the motion information of human objects. A generalised version of the LBP, referred to as the Local Intensity Distribution (LID) descriptor, is also proposed in the thesis. Compared with the LBP, the LID descriptor is more compact while retaining illumination invariance.
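
The relationship between the LBP and the NR-LBP can be sketched as follows. This is a minimal illustration rather than the implementation used in the thesis: it assumes the commonly cited NR-LBP definition, in which an LBP code and its bitwise complement are treated as the same pattern by keeping the smaller of the two. The function names, the 3x3 neighbourhood and the clockwise bit ordering are choices made here for illustration only.

```python
def lbp_code(patch):
    """Return the 8-neighbour LBP code of the centre pixel of a 3x3 patch.

    `patch` is a list of three rows, each with three intensity values.
    """
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner
    # (an arbitrary but fixed ordering, assumed for this sketch).
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2], patch[1][2],
        patch[2][2], patch[2][1], patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        # A bit is set when the neighbour is at least as bright as the centre.
        if value >= center:
            code |= 1 << bit
    return code


def nr_lbp_code(patch, num_neighbours=8):
    """Return the NR-LBP code under the assumed definition:
    the smaller of the LBP code and its bitwise complement."""
    code = lbp_code(patch)
    complement = (1 << num_neighbours) - 1 - code
    return min(code, complement)


# A bright-on-dark pattern and its intensity-inverted counterpart.
bright_on_dark = [[10, 200, 10], [200, 100, 200], [10, 200, 10]]
dark_on_bright = [[255 - v for v in row] for row in bright_on_dark]

lbp_pair = (lbp_code(bright_on_dark), lbp_code(dark_on_bright))
nr_lbp_pair = (nr_lbp_code(bright_on_dark), nr_lbp_code(dark_on_bright))
```

In this example the two patches receive different LBP codes (170 and 85) but the same NR-LBP code (85), which illustrates how merging complementary codes removes the dependence on whether a structure is brighter or darker than its surroundings. It also halves the number of distinct codes, from 256 to 128 for eight neighbours.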

Finally, to deal with the partial occlusion problem, a new statistical inter-object occlusion reasoning algorithm is proposed. Specifically, we model hypothesised human objects and their spatial relationships in a Bayesian network and infer the inter-occlusion status of the human objects using the variational mean field method.