Doctor of Philosophy
School of Electrical, Computer and Telecommunications Engineering
Adistambha, Kevin, Searching and describing human motion, Doctor of Philosophy thesis, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2013. https://ro.uow.edu.au/theses/3981
The amount of media being uploaded to the Internet is growing at an incredible rate. As an illustration, approximately 75 hours of video are uploaded to YouTube each minute, and approximately 30% of these videos contain human motion, such as sports or music videos. Consequently, new techniques for searching and describing content related to human motion are sorely needed. Current search techniques depend mainly on user-supplied tags, which are often ambiguous and subjective when used to describe human motion. For example, a video of “John Doe running and jumping into a lake” could be tagged as “John Doe”, “lake”, “running and jumping”, “funny video”, etc.
Being able to search for a specific motion has many applications. For example, a person could improve their sporting performance by searching for a specific movement, such as a famous golfer’s swing or a famous tennis player’s forehand, and comparing their own movement with that of the professional athlete using automatically extracted movement features. This scenario would be possible if a method to objectively describe human motion existed. Searching for human motion would then be as natural as recording a motion and using it as another search term, without having to consider the subjectivity of user-supplied tags or how someone else would “describe” that motion.
To achieve this, three things are required: a new multimedia communication format (since currently popular search techniques predominantly use simple text terms), a new human motion description language (since an objective and consistent method to describe human motion is also required), and a feature extraction and matching technique for human motion search applications.
To communicate advanced multimedia queries, this thesis presents the Multimedia Query Format (MQF). MQF is a communication format for structured multimedia search that goes beyond the text-based search in popular use today. Instead of being restricted to one particular multimedia description format, MQF was designed to allow the use of any number of current or future description standards, with advanced search features such as logical operators and query-by-example, as well as extensibility and simplicity. MQF is also shown to work well with the Fragment Request Unit (FRU) and Fragment Update Unit (FUU), which are MPEG standards that enable selective synchronization of two XML documents over a network. Using FRU and FUU, MQF is shown to support “Query Streaming”, a continuously updatable multimedia query method suitable for mobile devices with limited resources. The MQF work was also proposed to MPEG during the MPEG-7 Query Format standardization effort, where concepts introduced by MQF contributed to the discussions, refinements, and validations of the standardization process.
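The MQF schema itself is not reproduced in this abstract, so the following Python sketch only illustrates the general idea of a structured query: logical operators that nest, conditions drawn from different description standards, and a query-by-example term. The element names (`Query`, `Operator`, `Condition`, `Example`), the attribute names, and the example URL are hypothetical and are not the actual MQF syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names; they illustrate a structured
# query and do NOT reproduce the MQF schema.

def condition(fmt: str, field: str, value: str) -> ET.Element:
    """A leaf condition expressed in a named description format."""
    el = ET.Element("Condition", {"format": fmt, "field": field})
    el.text = value
    return el

def example(uri: str) -> ET.Element:
    """A query-by-example term pointing at a sample media item."""
    return ET.Element("Example", {"href": uri})

def logical(op: str, *children: ET.Element) -> ET.Element:
    """Combine child terms with a logical operator such as AND, OR or NOT."""
    el = ET.Element("Operator", {"type": op})
    el.extend(children)
    return el

def query(root_term: ET.Element) -> str:
    """Wrap a term tree in a query element and serialize it to a string."""
    q = ET.Element("Query")
    q.append(root_term)
    return ET.tostring(q, encoding="unicode")

# Conditions from two description standards plus a (hypothetical) example clip.
xml_text = query(
    logical("AND",
            condition("DublinCore", "subject", "golf"),
            logical("OR",
                    condition("MPEG-7", "FileFormat", "mp4"),
                    example("http://example.com/swing.bvh")))
)
print(xml_text)
```

Keeping the description format as an attribute on each condition, rather than fixing one vocabulary, is one simple way to reflect the design goal of allowing any number of current or future description standards in a single query.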
To describe human motion objectively and accurately, this thesis presents the Human Motion Markup Language (HMML). HMML is a human motion description language designed to describe human motion in three dimensions (the sagittal, coronal, and transverse planes) to facilitate human-motion-centric search. Another design goal of HMML is to enable human motion search using MQF as the communication format, so that HMML can be used in conjunction with existing multimedia description standards such as MPEG-7 and Dublin Core to provide a more complete description of the desired media than is currently possible. Key features of HMML include human readability, simplicity, and searchability.
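The abstract does not give HMML's syntax, so the Python sketch below is only a hypothetical illustration of a description organized per limb and per anatomical plane, and of how such a structure stays searchable with a simple path lookup. The element names (`Motion`, `Limb`, `Plane`) and symbol strings such as `F1` are placeholders, not HMML vocabulary.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# The three anatomical planes named in the HMML design goals.
PLANES = ("sagittal", "coronal", "transverse")

@dataclass
class LimbMotion:
    """Motion of one limb, described separately in each anatomical plane.

    Symbol strings such as "F1" are hypothetical placeholders, not HMML symbols.
    """
    limb: str
    planes: dict[str, list[str]] = field(default_factory=dict)

def describe(motions: list[LimbMotion]) -> str:
    """Serialize per-limb, per-plane symbol sequences to an XML string."""
    root = ET.Element("Motion")  # hypothetical element names throughout
    for m in motions:
        limb_el = ET.SubElement(root, "Limb", {"name": m.limb})
        for plane in PLANES:
            symbols = m.planes.get(plane)
            if symbols:  # omit planes with no recorded symbols
                plane_el = ET.SubElement(limb_el, "Plane", {"name": plane})
                plane_el.text = " ".join(symbols)
    return ET.tostring(root, encoding="unicode")

def symbols_for(xml_text: str, limb: str, plane: str) -> list[str]:
    """Look up one limb's symbol sequence in one plane (empty if absent)."""
    root = ET.fromstring(xml_text)
    el = root.find(f"Limb[@name='{limb}']/Plane[@name='{plane}']")
    return el.text.split() if el is not None and el.text else []

walk = describe([
    LimbMotion("LeftLeg", {"sagittal": ["F2", "F1", "E1", "E2"]}),
    LimbMotion("RightArm", {"sagittal": ["E1", "F1"], "coronal": ["A0"]}),
])
print(symbols_for(walk, "LeftLeg", "sagittal"))  # ['F2', 'F1', 'E1', 'E2']
```

Storing each symbol sequence as plain space-separated text keeps the description human-readable, while the limb and plane attributes make it possible to query one limb in one plane without parsing the rest of the body.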
To extract this objective human motion description, a method to automatically extract HMML motion descriptions from 3D motion capture data is also presented. This method involves “partial reconstruction” of the human body, i.e., each major limb, such as the arms and legs, is reconstructed from the 3D data independently. Because the body is not reconstructed as a whole, each limb becomes a separate entity that can be described objectively, independent of the other limbs. Consequently, an application that uses the leg movements of a walking motion as its query term will also match walking and waving, walking and dribbling, etc., providing a fine-grained method for motion search. Experiments using walking, running, and sneaking motions were performed to determine the consistency of the extracted symbol sequences, and the extracted symbols were found to be consistent even when extracted from people of varying height and movement patterns. The optimal motion duration and detail level of the extracted symbol sequences were also investigated so that the sequences could be used in a motion matching application.
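The abstract does not define the extracted symbols or the matching method, so the Python sketch below is only an illustration under stated assumptions. It treats one leg in isolation (the idea behind partial reconstruction), converts the sagittal-plane thigh angle into a symbol sequence by fixed-width quantization, and compares sequences with Levenshtein edit distance. The axis convention, the `S<n>` symbol names, the 15° default bin width, and the use of edit distance are all hypothetical choices, not the thesis's method.

```python
import math

# Assumed coordinate convention: x = lateral, y = up, z = forward,
# so the sagittal plane is the y-z plane.

def sagittal_angle(hip, knee):
    """Angle in degrees of the hip->knee (thigh) segment in the sagittal plane.

    0 means the thigh points straight down; positive means the knee is ahead
    of the hip. Only this limb's joints are used, independent of the body.
    """
    dy = knee[1] - hip[1]
    dz = knee[2] - hip[2]
    return math.degrees(math.atan2(dz, -dy))

def to_symbols(angles, bin_width=15.0):
    """Quantize angles into bins of bin_width degrees and collapse repeats.

    A smaller bin_width yields a more detailed symbol sequence.
    """
    symbols = []
    for a in angles:
        s = f"S{math.floor(a / bin_width)}"
        if not symbols or symbols[-1] != s:
            symbols.append(s)
    return symbols

def leg_symbols(hips, knees, bin_width=15.0):
    """Per-frame hip and knee positions -> symbol sequence for one leg."""
    return to_symbols([sagittal_angle(h, k) for h, k in zip(hips, knees)], bin_width)

def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

def limb_match(query, candidate):
    """Sum of edit distances over the limbs named in the query only."""
    return sum(edit_distance(seq, candidate.get(limb, []))
               for limb, seq in query.items())

# Hypothetical symbol sequences for illustration.
walk_query = {"LeftLeg": ["S0", "S1", "S0", "S-1"],
              "RightLeg": ["S0", "S-1", "S0", "S1"]}
walk_and_wave = {**walk_query, "RightArm": ["S2", "S5", "S2"]}
run = {"LeftLeg": ["S0", "S2", "S0", "S-2"],
       "RightLeg": ["S0", "S-2", "S0", "S2"]}
print(limb_match(walk_query, walk_and_wave))  # 0: the arm is ignored
print(limb_match(walk_query, run))            # 4
```

Because `limb_match` only considers the limbs named in the query, a leg-only walking query matches a walk-and-wave sequence as closely as a plain walk, mirroring the fine-grained search described above. In this sketch, `bin_width` plays the role of the detail level of the symbol sequence.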