Year

1998

Degree Name

Doctor of Philosophy

Department

School of Information Technology and Computer Science

Abstract

One of important characteristics of heterogeneous distributed multidatabase systems is autonomy of underlying component database systems. Heterogeneity is also the result of autonomy. Owing to autonomy, optimisation of query processing in the systems demands the detection of a number of optimisation factors and the application of different techniques from the traditional ones.

This thesis considers optimisation of query processing in heterogeneous distributed multidatabase systems. In the systems, a query is decomposed into a collection of subqueries. After translation to the query language of component databases, subqueries are submitted to the component databases. Partial results, obtained from the processing of subqueries, are converted to a canonical data model format and then they are integrated.

Efficiency of query processing in the systems depends on the detection of the impacts of partial results on each other. The impacts are applied to the efficient submission of subqueries to the component databases and to the efficient integration of the partial results.

Initially, we introduce and adopt a generalised database model for query processing to be independent of any particular database model. In the database, queries are represented by algebraic expressions. Based on the model, we develop a technique to label expression trees corresponding to the database expressions for discovering optimisation of query processing.

We address labelling as a general technique in detecting optimisation of query processing that is not only applicable to heterogeneous distributed multidatabase systems but also to centralised and to homogeneous distributed database systems.

We address query optimisation in heterogeneous distributed multidatabase systems at two stages. One stage is engaged with the submission order of subqueries to the component databases by applying the impacts of partial results on each other, query processing costs in the component databases, data transmission costs, data conversion and data integration costs. At this stage, we present an algorithm to identify the submission order of subqueries for minimisation of the response time.

Another stage is concerned with the efficient integration of partial results. We take advantage of delays in obtaining partial results, while some of them are available at the submission site, and the others are not arrived yet. We address gradual integration of available partial results in the delays. Furthermore, by using impacts of available partial results on each other, their size can be reduced or they can be prepared to speed up future computations.

We propose algorithms for efficient integration of partial results. We consider two different environments for optimisation of query postprocessing. In a static environment, response times of subqueries are predictable. Based on that assumption, we present an algorithm to take the most advantage of the delays to reduce the size of available partial results for minimisation of the response time.

In a dynamic environment, response times of subqueries are not reliable. We develop another algorithm in the dynamic environment to reduce the size of available partial results as much as possible in order to minimise the response time.

As a particular case of the generalised database model, we employ relational algebra operations and expressions.

Share

COinS
 

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.