Degree Name

Doctor of Philosophy


School of Information Technology and Computer Science


Developments in multimedia technology have brought video documents to personal computers. The growing volume of multimedia data in everyday life has driven the widespread adoption of these technologies for storing that data. These everyday environments now demand sophisticated systems for managing multimedia data and effective systems for searching and retrieving it.

This thesis presents a semantic content-based video retrieval system. The work focuses on the semantic content of video documents and describes the implementation of a semantic-based video indexing and retrieval system suitable for video-on-demand style applications.

This thesis addresses issues related to developing a model for describing the semantic content of a video document and representing information about this content. It develops a semantic video model that expresses the underlying semantic structure of a video document and supports the retrieval of video clips at different levels of detail. The proposed semantic model is an extension of the traditional conceptual model, applied to the video domain. The semantic video model describes how the metadata can be represented. The metadata contain information on the semantic video structure, the high-level semantic composition of elementary semantic units, and the video content indexing and storage. Based on its semantic content, the proposed model divides a video document into a structure of story, events, activities and objects, with interrelationships in the various spaces of the video (time, space, context and structure).
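As an illustration only, the hierarchy described above might be sketched as nested records. The class name, fields, level labels and sample values below are hypothetical and are not taken from the thesis; they show one possible way a story, its events, activities and objects could be stored with temporal extents and retrieved at different levels of detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticUnit:
    """Hypothetical semantic unit with a temporal extent in seconds."""
    label: str
    start: float
    end: float
    level: str = "unit"  # assumed labels: story, event, activity, object
    children: List["SemanticUnit"] = field(default_factory=list)

    def add(self, child: "SemanticUnit") -> "SemanticUnit":
        # A child unit must lie within its parent's temporal extent.
        if child.start < self.start or child.end > self.end:
            raise ValueError(f"{child.label} falls outside {self.label}")
        self.children.append(child)
        return child

    def units_at(self, level: str) -> List["SemanticUnit"]:
        """Collect every unit of the given level in this subtree."""
        found = [self] if self.level == level else []
        for child in self.children:
            found.extend(child.units_at(level))
        return found


# Invented sample data: one story with a single event, activity and two objects.
story = SemanticUnit("football match", 0, 5400, level="story")
goal = story.add(SemanticUnit("goal", 1200, 1230, level="event"))
shot = goal.add(SemanticUnit("shot", 1210, 1215, level="activity"))
shot.add(SemanticUnit("ball", 1210, 1215, level="object"))
shot.add(SemanticUnit("striker", 1210, 1215, level="object"))

# Retrieve clip boundaries at the object level of detail.
clips = [(u.label, u.start, u.end) for u in story.units_at("object")]
```

Storing the level as a plain string keeps the sketch short; a real index would also need the spatial and contextual relationships that the model describes, which are omitted here.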

Semantic content-based video retrieval demands both human and machine understanding of video content. This thesis investigates and suggests a methodology suitable for integrating manual (human) and automatic (machine) understanding of video documents. A computer-aided semantic video analyzer, which utilizes processing techniques for semantic video acquisition, is simulated.

This thesis proposes a video query language based on first-order logic for querying video information, together with a design and an implementation for video retrieval. The language provides operations for utilizing compositional data, descriptions, and contextual, spatial and temporal relationships in the user's queries. This thesis also introduces a graphical conceptual model to describe the relations among the semantic units constituting a composite unit, which is a step toward an easy-to-grasp graphical user interface.
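To make the idea of a logic-based query concrete, the following is a minimal sketch of how a first-order-style query over video events might be evaluated. It does not reproduce the syntax of the query language proposed in the thesis: the `Event` record, the `before` and `contains` predicates, and the sample data are all assumptions introduced for illustration.

```python
from itertools import product
from typing import FrozenSet, List, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical event record: name, temporal extent, objects present."""
    name: str
    start: float
    end: float
    objects: FrozenSet[str]


def before(a: Event, b: Event) -> bool:
    """Temporal predicate: a ends strictly before b starts."""
    return a.end < b.start


def contains(e: Event, obj: str) -> bool:
    """Compositional predicate: obj appears in event e."""
    return obj in e.objects


# Invented sample data, not drawn from the thesis.
EVENTS = [
    Event("goal", 1200, 1230, frozenset({"striker", "ball"})),
    Event("celebration", 1231, 1260, frozenset({"striker", "crowd"})),
    Event("kickoff", 0, 10, frozenset({"ball"})),
]


def query(events: List[Event]) -> List[Tuple[str, str]]:
    """Evaluate {(x, y) | contains(x, 'ball') AND contains(y, 'crowd') AND before(x, y)}."""
    return [
        (x.name, y.name)
        for x, y in product(events, repeat=2)
        if contains(x, "ball") and contains(y, "crowd") and before(x, y)
    ]


result = query(EVENTS)
# result == [("goal", "celebration"), ("kickoff", "celebration")]
```

The query is evaluated by brute-force enumeration of all event pairs, which is adequate for a sketch but would be replaced by indexed lookups in a practical retrieval system.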

The results of this thesis lead to the following conclusions:

• A video document has a rich internal semantic structure that can be formally expressed and used for semantic content-based video retrieval.
• It is possible to construct a semantic-based video indexing system and a computer-aided analyzer to assist in semantic video analysis and acquisition.
• It is possible to retrieve video documents based on their semantic content.

The author considers this work a step toward making video documents as searchable as text.