Doctor of Philosophy (PhD)
School of Information Technology and Computer Science - Faculty of Informatics
Lai, Bing-Chang, Combining generic programming with vector processing for machine vision, PhD thesis, School of Information Technology and Computer Science, University of Wollongong, 2005. http://ro.uow.edu.au/theses/326
This thesis addresses the integration of generic programming with vector processing for machine vision. While generic libraries have been shown to provide near-optimal performance without sacrificing flexibility and adaptability, current generic libraries do not utilise the vector processing unit (VPU), nor can they be vectorised directly. Generic vectorised libraries require a mechanism for expressing vectorised algorithms independently of the VPU. This is a problem because different VPUs can have different instructions and different limitations; programs written to use one vector technology are not portable to other vector technologies. Lastly, most existing machine-vision libraries do not provide image capture from sequence grabbers; the programmer has to use another library to capture images and supply additional code to enable the two libraries to work together. To allow vectorised machine-vision algorithms to be portable across different vector technologies, this thesis proposes the use of an abstract VPU. The abstract VPU represents a set of real VPUs with a virtual VPU that has an idealized instruction set and the constraints common to the real VPUs being represented. An abstract VPU, named Virtual Vector Machine (VVM), was developed to support generic programming. Different methods of implementing VVM were evaluated against hand-coded AltiVec (a vector technology found in PowerPC G4 and G5 processors) and scalar programs. When compiled using Apple GCC 3.1 20021003, the implementation chosen has no significant overhead when processing VVM vectors that use a single AltiVec vector or a single scalar. Such VVM vectors cover all byte AltiVec vectors in AltiVec mode and all types in scalar mode. When processing VVM vectors that use more than one AltiVec vector, the VVM implementation chosen is at most 24% slower than a hand-coded program.
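To make the idea of an abstract VPU concrete, the following is a minimal C++ sketch of a vector type whose operations are defined independently of any real vector hardware. It is not the VVM interface from the thesis: the names `AbstractVec` and `add_saturate` are hypothetical, and only a scalar fallback is shown. A platform-specific backend could, under the same assumptions, replace the loop with a hardware saturating add such as AltiVec's `vec_adds`.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical abstract vector: N lanes of element type T.
// The name and layout are illustrative assumptions, not the VVM design.
template <typename T, std::size_t N>
struct AbstractVec {
    std::array<T, N> lanes{};
};

// Lane-wise saturating add for unsigned byte lanes (scalar fallback).
// Results above 255 are clamped to 255 instead of wrapping around.
template <std::size_t N>
AbstractVec<std::uint8_t, N> add_saturate(const AbstractVec<std::uint8_t, N>& a,
                                          const AbstractVec<std::uint8_t, N>& b) {
    AbstractVec<std::uint8_t, N> out;
    for (std::size_t i = 0; i < N; ++i) {
        // Widen before adding so the sum cannot overflow a byte.
        unsigned sum = static_cast<unsigned>(a.lanes[i]) + b.lanes[i];
        out.lanes[i] = static_cast<std::uint8_t>(std::min(sum, 255u));
    }
    return out;
}
```

Code written against `add_saturate` does not mention the lane count of any particular VPU, which is the portability property the abstract VPU is meant to provide.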
Vectorised algorithms are difficult to implement, because they handle VPU-specific issues such as memory alignments, edges and prefetching. Thus, to reduce the number of algorithms required, a categorization of image processing operations based on input-to-output correlation is proposed. This categorization maps easily to generic programming and provides implementation hints. The categorization scheme separates image processing operations into three categories, which this thesis refers to as quantitative, transformative and convolutive operations. Quantitative operations require one input element to produce zero or more output elements. Transformative operations require one input element to produce one output element. Convolutive operations produce a single output element from a rectangle of input elements. Each category requires only one general algorithm, together with variations of that algorithm to handle differing combinations of input and output sets. The generic, vectorised, machine-vision library developed in this thesis, Vectorised Vision (VVIS), uses the abstract VPU (VVM) and the three categories to provide cross-platform, vectorised algorithms. Because the division of duties used by existing generic libraries is unsuitable for vectorisation, two new divisions of duties are proposed and their performance is evaluated. Generic, vectorised algorithms for each category were evaluated against hand-coded programs, and the speedup gained by using VVIS in AltiVec mode instead of scalar mode was collated. For quantitative and transformative operations on single-channel byte images, the VVIS implementation is comparable to hand-coded AltiVec and scalar programs. For convolutive operations on single-channel byte images, the final VVIS implementation is twice as slow as the hand-coded program in AltiVec mode, because the hand-coded AltiVec program did not need to support variable kernel sizes.
In scalar mode, the final VVIS convolutive algorithm is comparable to hand-coded scalar programs. VVIS was slower than hand-coded programs when processing non-byte images, because of overheads introduced by VVM. For the transformative operations tested on single-channel byte images, VVIS executed in AltiVec mode instead of scalar mode provides at most a fourfold speedup, depending on the operation. For the convolutive operations tested, VVIS provides a speedup of approximately 1.5 times when processing single-channel byte images, despite using signed shorts internally; the VVM implementation used had noticeable overheads when operating on signed shorts in AltiVec mode. Results show that a generic, vectorised, machine-vision library can generally achieve performance comparable to hand-coded programs when processing single-channel byte images. Because of overheads introduced by the VVM implementation, VVIS does not have comparable performance for all image types. Most image processing operations, however, use either single-channel byte or multi-channel byte images.
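The three categories described in the abstract can be illustrated with a scalar C++ sketch, one generic algorithm per category. This is an assumption-laden illustration rather than the VVIS implementation: the `Image` type, the function names, the `Histogram` accumulator and the choice to compute only the region where the kernel fits entirely inside the image are all hypothetical, and no vectorisation, alignment handling or prefetching is attempted.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-channel byte image stored row-major (not the VVIS type).
struct Image {
    std::size_t width, height;
    std::vector<std::uint8_t> pixels;
    Image(std::size_t w, std::size_t h, std::uint8_t v = 0)
        : width(w), height(h), pixels(w * h, v) {}
    std::uint8_t at(std::size_t x, std::size_t y) const { return pixels[y * width + x]; }
};

// Quantitative: every input element is handed to an accumulator, which may
// record zero or more outputs per element (here, a histogram bin update).
template <typename Acc>
Acc quantitative(const Image& in, Acc acc) {
    for (std::uint8_t p : in.pixels) acc(p);
    return acc;
}

// Example accumulator: counts how often each byte value occurs.
struct Histogram {
    std::array<std::size_t, 256> bins{};
    void operator()(std::uint8_t p) { ++bins[p]; }
};

// Transformative: one input element produces exactly one output element.
template <typename Op>
Image transformative(const Image& in, Op op) {
    Image out(in.width, in.height);
    for (std::size_t i = 0; i < in.pixels.size(); ++i) out.pixels[i] = op(in.pixels[i]);
    return out;
}

// Convolutive: each output element is computed from a kw x kh rectangle of
// input elements. Only positions where the kernel fits are computed, so the
// output is smaller than the input (an assumption made for brevity).
Image convolutive(const Image& in, const std::vector<int>& kernel,
                  std::size_t kw, std::size_t kh, int divisor) {
    assert(kw >= 1 && kh >= 1 && kw <= in.width && kh <= in.height);
    assert(kernel.size() == kw * kh && divisor != 0);
    Image out(in.width - kw + 1, in.height - kh + 1);
    for (std::size_t y = 0; y < out.height; ++y) {
        for (std::size_t x = 0; x < out.width; ++x) {
            int sum = 0;
            for (std::size_t j = 0; j < kh; ++j)
                for (std::size_t i = 0; i < kw; ++i)
                    sum += kernel[j * kw + i] * in.at(x + i, y + j);
            // Scale, then clamp to the byte range before storing.
            int v = sum / divisor;
            out.pixels[y * out.width + x] =
                static_cast<std::uint8_t>(std::max(0, std::min(255, v)));
        }
    }
    return out;
}
```

Under these assumptions, a histogram is `quantitative(img, Histogram{})`, an inversion is `transformative(img, op)` with an `op` that returns `255 - p`, and a 3 x 3 box filter is `convolutive(img, std::vector<int>(9, 1), 3, 3, 9)`. Passing the kernel dimensions at run time reflects the variable kernel sizes that the abstract cites as a cost of the generic convolutive algorithm.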