Humans are endowed with the ability to grasp the overall meaning, or gist, of a complex visual scene at a glance. We need only a fraction of a second to decide whether a scene is indoors, outdoors, on a busy street, or on a clear beach. In recent years, computational gist recognition, or scene categorization, has been actively pursued, given its numerous applications in image and video search, surveillance, and assistive navigation. Many visual descriptors have been developed to address the challenges in scene categorization, including the large number of semantic categories and the tremendous variations caused by imaging conditions. However, existing methods for scene categorization still have difficulty recognizing images that have undergone geometric deformations, such as translation, scaling, shearing, rotation, and projection. A major goal of a visual system (natural or machine) is to recognize objects or scenes regardless of their location or pose relative to the viewer. Furthermore, geometric invariance is required not only for scene categorization, but also for many other computer vision applications, including handwritten digit recognition, texture recognition, face matching, and face recognition. Therefore, extracting geometrically invariant features is key to efficient image recognition.
This thesis investigates a geometric-invariant visual system to determine the categories of images. The proposed visual system achieves geometric invariance through image normalization and feature extraction. A novel image normalization approach for affine deformations is presented in this thesis. The proposed approach produces normalized images by solving a constrained optimization problem based on image moments. An image normalization approach for projective deformations is also proposed. The image normalization methods allow geometric-invariant features to be extracted, thereby reducing the complexity of scene classifiers and the cost of classifier training.
The thesis also reviews visual descriptors used for scene categorization from both methodological and experimental perspectives, and combines different descriptors to improve scene categorization performance under geometric deformations.
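To make the idea of moment-based normalization concrete, the sketch below shows a classical whitening step built from first- and second-order image moments. It is an illustration only and is not the constrained-optimization method proposed in the thesis, whose details are not given in this abstract. The function names (`central_moments`, `whitening_transform`, `normalized_covariance`) are hypothetical. The sketch assumes a grayscale image with non-degenerate intensity spread, and it removes translation, scaling, and shearing only up to a residual rotation.

```python
import numpy as np

def central_moments(img):
    """Return total mass, centroid (x, y), and the 2x2 matrix of
    second-order central moments, normalized by the total mass."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    centroid = np.array([(xs * img).sum(), (ys * img).sum()]) / m00
    dx, dy = xs - centroid[0], ys - centroid[1]
    mu20 = (dx * dx * img).sum() / m00
    mu02 = (dy * dy * img).sum() / m00
    mu11 = (dx * dy * img).sum() / m00
    return m00, centroid, np.array([[mu20, mu11], [mu11, mu02]])

def whitening_transform(img):
    """Return the centroid c and a symmetric matrix W such that the
    coordinates u = W (x - c) have an identity second-moment matrix.
    Illustrative assumption: the moment matrix is positive definite."""
    _, centroid, cov = central_moments(img)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return centroid, W

def normalized_covariance(img):
    """Second-moment matrix of the intensity distribution after applying
    the whitening transform to the pixel coordinates."""
    img = np.asarray(img, dtype=float)
    centroid, W = whitening_transform(img)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    coords = np.stack([xs.ravel() - centroid[0], ys.ravel() - centroid[1]])
    u = W @ coords
    weights = img.ravel() / img.sum()
    return (u * weights) @ u.T

# Usage: a synthetic, off-centre, elongated pattern.
img = np.zeros((32, 48))
img[8:20, 10:40] = 1.0
img[12:16, 5:10] = 2.0
centroid, W = whitening_transform(img)
cov_after = normalized_covariance(img)
```

After whitening, the second-order moments no longer depend on the scaling or shearing applied to the input, so two affinely related images differ only by a rotation (and possibly a reflection) in the normalized frame. Resolving that remaining ambiguity needs additional constraints, for example on higher-order moments.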
History
Year
2016
Thesis type
Doctoral thesis
Faculty/School
School of Electrical, Computer and Telecommunications Engineering
Language
English
Disclaimer
Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.