Year

1995

Degree Name

Doctor of Philosophy

Department

Department of Computer Science

Abstract

This thesis investigates methods by which the prior knowledge encoded in groups, or ensembles, of trained artificial neural networks can be used to assist in learning new tasks. Standard methods for training neural networks use only the observed (a posteriori) data, ignoring any other prior (a priori) knowledge that may be available for a particular task. One form of this prior knowledge is the representations learnt by other networks trained on similar problems. Using such knowledge can improve both training times and generalisation by appropriately biasing the representational ability of the neural network for the new learning task.

Previous research on the transfer of knowledge between neural networks has concentrated largely on its direct, or literal, transfer from a single source network to a single target network. This thesis shows that the knowledge encoded by multiple neural networks trained within the same problem "environment" can be used in preference to single-source transfer to better bias the search space. This approach is termed ensemble transfer.

This neural network prior knowledge can be likened to points on a map of the representation space. Each trained network in the prior knowledge marks the location of a neural network solution appropriate to the learner's environment. The goal of an ensemble transfer algorithm is to use the prior knowledge from this map efficiently to bias the learning of the internal representation for a new task. It is therefore essential to store the prior knowledge in a form that allows reliable optimisation. This aspect is considered in detail with reference to transformation invariance in the network representations and symmetric regions in the representation space.
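The abstract does not specify which symmetries are removed or how. Purely as an illustration, the sketch below assumes a single-hidden-layer network with tanh hidden units, which has two well-known symmetries: permuting the hidden units, and flipping the sign of a hidden unit's incoming weights, bias and outgoing weights (valid because tanh is odd). It maps each network to one canonical representative by making every hidden bias non-negative and then sorting hidden units by bias. This is a common textbook construction, not necessarily the method used in the thesis, and the helper names `forward` and `canonicalise` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def forward(W: Matrix, b: List[float], V: Matrix, c: List[float], x: List[float]) -> List[float]:
    """Output of a one-hidden-layer tanh network: y = c + V tanh(W x + b)."""
    h = [math.tanh(bj + sum(w * xi for w, xi in zip(wj, x))) for wj, bj in zip(W, b)]
    return [ck + sum(v * hj for v, hj in zip(vk, h)) for vk, ck in zip(V, c)]


def canonicalise(W: Matrix, b: List[float], V: Matrix, c: List[float]) -> Tuple[Matrix, List[float], Matrix, List[float]]:
    """Map a network to one representative of its symmetry class (hypothetical helper).

    Sign-flip symmetry: tanh is odd, so negating a hidden unit's incoming weights,
    its bias and its outgoing weights leaves the network function unchanged.
    Permutation symmetry: reordering hidden units also leaves it unchanged.
    """
    units = []
    for j in range(len(b)):
        s = -1.0 if b[j] < 0 else 1.0  # flip so every hidden bias is non-negative
        units.append((s * b[j], [s * w for w in W[j]], [s * V[k][j] for k in range(len(V))]))
    units.sort(key=lambda u: u[0], reverse=True)  # order hidden units by bias, largest first
    W2 = [u[1] for u in units]
    b2 = [u[0] for u in units]
    V2 = [[u[2][k] for u in units] for k in range(len(V))]
    return W2, b2, V2, list(c)
```

Two networks that differ only by a reordering or sign flip of hidden units compute the same function and map to the same canonical weights, so distances or averages taken between canonical forms are not distorted by these symmetries. Ties between hidden biases would need a further sort key, which this sketch omits.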

An ensemble optimised transfer algorithm is developed using a compact representation of the solution space from which network symmetries have been removed. It is then tested in three problem domains: character recognition, spoken digit identification, and image approximation. In all three, it shows advantages in both training speed and stability over an algorithm that uses literal transfer from the lowest-error network in the prior knowledge (ensemble literal transfer).
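The comparison baseline, ensemble literal transfer, copies the weights of the lowest-error network in the prior knowledge as the starting point for the new task. A minimal sketch follows. To keep it short, each "network" is reduced to a linear model, and measuring error as mean squared error on the new task's training data is an assumption, since the abstract does not say how "lowest error" is evaluated. The function names are hypothetical.

```python
import copy
from typing import Dict, List, Sequence, Tuple

Weights = Dict[str, List[float]]
Dataset = Sequence[Tuple[List[float], float]]


def mse(weights: Weights, data: Dataset) -> float:
    """Mean squared error of a linear model y = w.x + b (stand-in for a trained network)."""
    total = 0.0
    for x, y in data:
        pred = weights["b"][0] + sum(w * xi for w, xi in zip(weights["w"], x))
        total += (pred - y) ** 2
    return total / len(data)


def ensemble_literal_transfer(prior: List[Weights], data: Dataset, error=mse) -> Weights:
    """Return a copy of the prior network with the lowest error, to use as the initial weights."""
    best = min(prior, key=lambda net: error(net, data))
    # Deep copy so that training on the new task does not alter the stored prior knowledge.
    return copy.deepcopy(best)
```

Training on the new task would then start from the returned weights rather than from a random initialisation. Only one network in the ensemble influences the result, which is the limitation that ensemble transfer is meant to address.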


Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.