Doctor of Philosophy
School of Electrical, Computer & Telecommunications Engineering - Faculty of Informatics
Asheibi, Ali Taher M, Discovery and pattern classification of large scale harmonic measurements using data mining, PhD thesis, School of Electrical, Computer & Telecommunications Engineering, University of Wollongong, 2009. http://ro.uow.edu.au/theses/558
Harmonic monitoring is an important issue for electricity utilities and their customers. Continuous monitoring of voltage and current is required to identify any substantial harmonic events before they occur. This monitoring results in large volumes of multivariate data. Although researchers have realised that such large amounts of power quality (PQ) data hold much more information than that reported using classical statistical techniques for PQ monitoring, few have taken the opportunity to exploit this additional information. This hidden information might assist in identifying critical issues for the diagnosis of harmonic problems, such as predicting failures in advance and giving alarms prior to the onset of dangerous situations. Utility engineers are now seeking new tools to extract information that may otherwise remain hidden, especially within large volumes of data. Data mining tools are obvious candidates for assisting in such analysis of large scale data. Data mining can be understood as a process that uses a variety of analytical tools to identify hidden patterns and relationships within data. Classification based on clustering is an important utilisation of unsupervised learning within data mining, in particular for finding and describing a variety of patterns and anomalies in multivariate data through various machine learning techniques and statistical methods. Clustering is often used to gain an initial insight into complex data and, particularly in this case, to identify underlying classes within harmonic data. The main data mining methodology used in this work is mixture modelling based on the Minimum Message Length (MML) algorithm, which essentially searches for the model that best describes the data using a metric of an encoded message. This method of unsupervised learning, or clustering, has been shown to be able to detect anomalies and identify useful patterns within the monitored harmonic data set.
Anomaly detection and pattern recognition in harmonic data can provide engineers with a rapid, visually oriented method for evaluating the underlying operational information contained within the data set. The power quality data for the case study to which the MML method has been applied were taken from a harmonic monitoring program installed in a typical 33/11kV MV zone substation in Australia that supplies ten 11kV radial feeders. Several patterns have been identified by applying the MML technique to the harmonic data, such as significant high harmonic disturbances, footprints of the monitored sites, unusual harmonic events (capacitor switching, the switching on of televisions and air conditioners, and the off-peak hot water system) and the detection of different abstractions (super-groups), each of which comprises similar clusters. The C5.0 supervised learning algorithm has been used to generate expressible and understandable rules which identify the essential features of each member cluster, and to further utilise these in predicting which ideal clusters any new observed data may be best described by. One difficulty with the MML algorithm when used to derive various mixture models is establishing a suitable stopping criterion to secure the optimum number of (mixture) clusters during the clustering process. A novel technique has been developed to overcome this difficulty using the trend of the exponential of the message length difference between consecutive mixture models. First, the proposed method has been tested using randomly generated data points drawn from a known number of clusters and also with data from a simulation of a power system. The results from these tests confirm the effectiveness of the proposed method in finding the optimum number of clusters. Second, the developed method has been applied to various two-weekly data sets from the harmonic monitoring program used in this thesis.
The optimum number of clusters has been verified by the formation of super-groups using Multidimensional Scaling (MDS) and link analysis. Third, the method was benchmarked against a commonly used fitness function technique, which underestimated the optimal number of clusters in the measured harmonic data. This underestimation resulted from the theoretical maximum entropy equation used in calculating the fitness function, which assumes that the attributes are independent; this is not the case for the harmonic attributes, which are correlated. Finally, rules generated by the C5.0 algorithm were used for classification and prediction of future events to determine which cluster any new data should belong to.
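The stopping criterion described in the abstract, based on the exponential of the message length difference between consecutive mixture models, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the sign convention (the exponential of the change in message length when one more cluster is added, capped at 1) and the 0.5 threshold are all assumptions made for illustration, and the message lengths are assumed to have been computed already by some MML mixture-modelling tool.

```python
import math
from typing import List, Optional, Sequence


def exp_length_differences(message_lengths: Sequence[float]) -> List[float]:
    """Return exp(L[k+1] - L[k]) for each pair of consecutive models.

    Assumed convention: message_lengths[i] is the message length of the
    mixture model with (first_k + i) clusters. A value near 0 means the
    larger model shortens the message substantially; a value near 1 means
    the extra cluster gains almost nothing.
    """
    ratios = []
    for current, following in zip(message_lengths, message_lengths[1:]):
        # Cap the exponent at 0 so that a model which lengthens the
        # message is treated as "no gain" and exp() cannot overflow.
        ratios.append(math.exp(min(following - current, 0.0)))
    return ratios


def pick_cluster_count(
    message_lengths: Sequence[float],
    first_k: int = 1,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the thesis
) -> Optional[int]:
    """Return the smallest cluster count after which adding one more
    cluster no longer shortens the message appreciably, or None if the
    message length is still dropping quickly at the end of the sequence.
    """
    for offset, ratio in enumerate(exp_length_differences(message_lengths)):
        if ratio >= threshold:
            return first_k + offset
    return None
```

With illustrative (made-up) message lengths such as `[1000.0, 900.0, 850.0, 849.9, 849.85]` for models with one to five clusters, the message length stops falling appreciably after the third model, so `pick_cluster_count` returns 3. In practice the threshold would need to be chosen by inspecting the trend of these values rather than fixed in advance.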