Degree Name

Doctor of Philosophy


School of Mechanical, Materials, Mechatronic and Biomedical Engineering


In the big data era, huge amounts of data are collected through the day-to-day operation of organisations. While this increased availability of data brings potential value to organisations and society, it also brings challenges. Both the usefulness, or quality, of this data and how that quality is accounted for in data-driven decision making are of concern. Relevant data quality problems include missing data, incorrect data and the inclusion of outliers. The specific nature of each problem affects how the data is improved and how its quality is dealt with. Available methods for addressing these problems focus on missing data in sample surveys with smaller datasets, whereas large datasets require more advanced imputation methods when the data is mined. Many of the datasets available for monitoring asset condition, and for asset decision analysis, contain missing values. Inappropriate treatment of missing data may cause large errors in the classification of data patterns and lead to inaccurate or false results and trend predictions. One outcome of these errors can be equipment failure or a catastrophic accident.

A hybrid deep learning approach has been developed to impute missing asset condition-monitoring data and provide trend analysis. The technique is a two-stage, non-parametric approach: in the first stage, a convolutional neural network (CNN) estimates the underlying feature maps for each set of training data; in the second stage, a long short-term memory (LSTM) network is trained on these feature maps, using a minimum-error strategy, to impute the missing information. This approach improves imputation accuracy without requiring larger training datasets. Supporting algorithms are used as part of the approach to improve data extraction and data separability. Data extraction from condition-monitoring databases for the purposes of hybrid deep learning has been reviewed, and the importance of preserving the data and transforming it into feature maps has been examined to determine whether an intelligent data extraction tool can improve data separability. It has been shown that simple missing-data imputation using only one type of machine learning does not meet the imputation accuracy criteria.
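The two-stage data flow described above can be sketched as follows. This is a minimal, forward-pass-only illustration in plain Python under stated assumptions: the kernel values, gate weights, window length, residual readout and the names `conv_feature_maps`, `lstm_last_hidden` and `impute` are hypothetical placeholders, not the thesis's trained model, and the training (minimum-error) step is not shown. It only demonstrates how a CNN stage turns a context window into feature maps and how an LSTM stage consumes those maps to produce a value for each missing entry.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv_feature_maps(window, kernels):
    """Stage 1 (CNN): valid 1-D convolution followed by ReLU.
    Returns features[t][k] = response of kernel k at time step t."""
    size = len(kernels[0])
    return [
        [max(0.0, sum(w * x for w, x in zip(kern, window[t:t + size]))) for kern in kernels]
        for t in range(len(window) - size + 1)
    ]


def lstm_last_hidden(sequence, hidden_size, weights):
    """Stage 2 (LSTM): run one LSTM layer over the feature maps and return the final
    hidden state. weights[g] holds one row per hidden unit for gate g ('i', 'f', 'o', 'g');
    each row spans [input features + previous hidden state]. Biases omitted for brevity."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        z = x + h  # concatenate current features with the previous hidden state
        pre = {g: [sum(w * v for w, v in zip(row, z)) for row in weights[g]] for g in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]  # input gate
        f = [sigmoid(v) for v in pre["f"]]  # forget gate
        o = [sigmoid(v) for v in pre["o"]]  # output gate
        cand = [math.tanh(v) for v in pre["g"]]  # candidate cell values
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def impute(series, window=6, hidden_size=3):
    """Fill None entries left to right; observed values are kept (as floats).
    All weights below are hypothetical placeholders standing in for trained parameters."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    kernels = [[0.25, 0.5, 0.25], [-1.0, 0.0, 1.0]]  # placeholder smoothing and slope filters
    n_in = len(kernels) + hidden_size
    weights = {g: [[0.1 * ((r + col) % 3 - 1) for col in range(n_in)] for r in range(hidden_size)]
               for g in "ifog"}
    readout = [0.05] * hidden_size
    filled = []
    for v in series:
        if v is not None:
            filled.append(float(v))
            continue
        history = filled[-window:] or [float(observed[0])]
        history = [history[0]] * (window - len(history)) + history  # left-pad to full window
        h = lstm_last_hidden(conv_feature_maps(history, kernels), hidden_size, weights)
        # Assumed residual readout: predict a correction to the most recent value.
        filled.append(history[-1] + sum(w * hj for w, hj in zip(readout, h)))
    return filled
```

For example, `impute([1.0, 1.2, None, 1.5, 1.6, None, None, 2.0])` returns an eight-element list with every gap filled. Because the weights are untrained placeholders, the filled values show the mechanics only and are not meaningful estimates; in the approach described above, these parameters would be learned from training data.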

The development of machine learning networks has been explored in detail to enhance the automatic imputation of condition-monitoring data without increasing cost. Previous neural network approaches have also been examined in detail to provide a baseline for the hybrid deep learning network approach. These research findings led to the development of a hyperparameter process that helps select and tune the model so that it meets the criteria needed to predict trends accurately.
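One way to organise such a hyperparameter process is a grid search scored on deliberately hidden values. The sketch below assumes a hold-out scheme (hide a fraction of known points, impute them, score by root-mean-square error) and, to stay short and runnable, tunes the `window` of a hypothetical moving-average imputer instead of the CNN-LSTM's own hyperparameters (for example, kernel size or number of LSTM units). The function names, the grid and the 20 percent hold-out fraction are illustrative assumptions.

```python
import math
import random


def moving_average_impute(series, window):
    """Stand-in imputer (not the thesis's CNN-LSTM): each gap takes the mean of the
    previous `window` filled values. Assumes the first entry is observed."""
    filled = []
    for v in series:
        if v is None:
            recent = filled[-window:]
            v = sum(recent) / len(recent)
        filled.append(v)
    return filled


def holdout_rmse(series, imputer, params, fraction=0.2, seed=0):
    """Hide `fraction` of the observed points (never index 0), impute, return RMSE on them."""
    rng = random.Random(seed)  # fixed seed: every parameter set is scored on the same points
    candidates = [i for i, v in enumerate(series) if v is not None and i > 0]
    hidden = set(rng.sample(candidates, max(1, int(fraction * len(candidates)))))
    masked = [None if i in hidden else v for i, v in enumerate(series)]
    filled = imputer(masked, **params)
    return math.sqrt(sum((filled[i] - series[i]) ** 2 for i in hidden) / len(hidden))


def grid_search(series, imputer, grid, **holdout_kw):
    """Score every parameter set and return (best parameter set, all scores)."""
    scores = {tuple(sorted(p.items())): holdout_rmse(series, imputer, p, **holdout_kw)
              for p in grid}
    best = min(scores, key=scores.get)  # lowest hold-out error wins
    return dict(best), scores
```

A typical call is `grid_search(series, moving_average_impute, [{"window": w} for w in (1, 2, 4, 8)])`. Swapping in a different imputer and a grid of its hyperparameters leaves the search loop unchanged, which is the main reason for keeping the scoring function separate.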

The developed hybrid deep learning approach was applied to real railway condition-monitoring datasets for missing data imputation. The results were verified, showing an imputation accuracy of 90 percent for datasets with up to 20 percent missing data. The developed algorithms can also be applied to other, similar datasets.
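This abstract reports accuracy as a percentage but does not define the metric. The sketch below assumes one possible definition, the share of hidden values imputed within a relative tolerance (10 percent here, a hypothetical choice), and shows a masking protocol that hides a chosen fraction (20 percent, matching the maximum missing rate above) of observed points before imputation. Linear interpolation stands in for the hybrid network; `linear_interpolate` and `masked_accuracy` are illustrative names.

```python
import random


def linear_interpolate(series):
    """Stand-in imputer (not the CNN-LSTM): linear interpolation between the nearest
    observed neighbours; gaps at either edge copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            t = (i - left) / (right - left)
            out[i] = series[left] + t * (series[right] - series[left])
    return out


def masked_accuracy(series, imputer, missing_rate=0.2, tolerance=0.1, seed=0):
    """Hide `missing_rate` of the observed points, impute them, and return the share of
    hidden points whose relative error is within `tolerance` (an assumed definition)."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(series) if v is not None]
    hidden = rng.sample(observed, max(1, round(missing_rate * len(observed))))
    hidden_set = set(hidden)
    masked = [None if i in hidden_set else v for i, v in enumerate(series)]
    filled = imputer(masked)
    hits = sum(abs(filled[i] - series[i]) <= tolerance * abs(series[i]) for i in hidden)
    return hits / len(hidden)
```

Under this definition, a score of 0.9 would mean that 90 percent of the deliberately hidden values were recovered within the tolerance; other definitions (for example, one based on mean absolute percentage error) would give different numbers for the same imputations.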

The major contribution of this work is the finding that, for condition-monitoring datasets, the hybrid deep learning network approach outperforms traditional imputation algorithms. Guidelines for using the hybrid deep learning network have been established to assist further research into the imputation of datasets using neural network approaches: this hybrid deep learning approach enables datasets to be imputed without relying heavily on a priori information. Without a detailed analysis of the data, suitable hyperparameters can only be determined by testing the datasets with deep learning approaches. Identifying the types and causes of missing data helps in treating its different aspects and improves the analysis of datasets in which missing data may be a key issue.
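A first step towards identifying the types of missing data can be to profile where the gaps occur. The sketch below reports the overall missing rate and the distribution of gap lengths; `missing_gap_profile` and its `long_gap` threshold of 5 samples are hypothetical. How such patterns map to causes, for instance isolated dropouts versus extended sensor outages, is an illustrative assumption rather than a finding of the thesis.

```python
from collections import Counter


def missing_gap_profile(series, long_gap=5):
    """Summarise missingness: overall missing rate, the length of every run of
    consecutive missing (None) values, and how many runs reach `long_gap` samples."""
    gaps, run = [], 0
    for v in series:
        if v is None:
            run += 1
        elif run:
            gaps.append(run)
            run = 0
    if run:  # the series ends inside a gap
        gaps.append(run)
    return {
        "missing_rate": sum(gaps) / len(series) if series else 0.0,
        "gap_lengths": Counter(gaps),  # gap length -> number of gaps of that length
        "longest_gap": max(gaps, default=0),
        "long_gaps": sum(1 for g in gaps if g >= long_gap),
    }
```

For example, `missing_gap_profile([1.0, None, 2.0, None, None, None, 3.0, None])` finds three gaps (lengths 1, 3 and 1) covering five of the eight samples.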

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.