The Generalisation Ability of Machine-Learning Algorithms for Mapping Seagrass, Reef Habitat, and Live Coral
This thesis addresses the challenge of mapping seagrass and coral reef habitat using remotely sensed imagery. Such maps have become essential to marine science and management. Although the machine-learning algorithms currently in use achieve moderate to high accuracies, their lack of spatial generalisation capability limits their scalability. Generalisation in this context refers to the ability of a machine-learning algorithm to accurately classify target habitats at a new location on which it has not been trained. This thesis aimed to develop and test a machine-learning algorithm capable of spatial generalisation for seagrass and coral reef habitat mapping by answering the following questions: 1) Can a machine-learning algorithm spatially generalise across varying levels of marine habitat complexity, starting with seagrass, then reef habitat, and finally live coral cover? 2) Do techniques such as dropout, L2 regularisation, or an increase in the number and diversity of training samples reduce overfitting? 3) What implications does spatial generalisation have for the scalability of marine habitat mapping?
A systematic literature review of machine-learning algorithms used for shallow coral reef habitat mapping identified the random forest algorithm and a deep learning algorithm known as U-Net as two promising candidates for spatial generalisation. This thesis first compared their spatial generalisation capabilities for mapping seagrass across three different sites in New Zealand. U-Net models were significantly better at spatial generalisation than random forest models. Next, the two algorithms were compared for mapping reef habitat and live coral, and U-Net models were again significantly better at generalising. These results indicate that U-Net's deep, multi-layered hierarchical framework for extracting complex feature representations is a critical component of generalisation capability when mapping seagrass, reef habitat, and live coral. However, U-Net's ability to generalise was moderate in terms of accuracy, so dropout and L2 regularisation were subsequently explored to mitigate overfitting. These experiments found that dropout and L2 regularisation did not improve generalisation capability, whereas increasing the size and diversity of the training dataset can improve the spatial generalisation capability of U-Net models, although not in all cases.
In conclusion, this thesis determined that U-Net models are significantly better at spatial generalisation than random forest models, and identified a way to improve the spatial generalisation capability of U-Net: increasing the number and diversity of training samples. By identifying a machine-learning algorithm that can generalise, this thesis provides a foundation for using U-Net to develop more scalable maps of seagrass and coral reef habitats.
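As an illustration of what testing spatial generalisation involves, the sketch below holds out one site at a time, fits a classifier on the remaining sites, and scores it on the unseen site. It is a minimal sketch under stated assumptions, not the method used in the thesis: the site names, the single-feature toy samples, and the nearest-centroid classifier are all hypothetical stand-ins for real imagery and for the U-Net and random forest models.

```python
from collections import defaultdict


def leave_one_site_out(samples):
    """Yield (held_out_site, train, test) splits, one per site."""
    sites = sorted({s["site"] for s in samples})
    for held_out in sites:
        train = [s for s in samples if s["site"] != held_out]
        test = [s for s in samples if s["site"] == held_out]
        yield held_out, train, test


def fit_centroids(train):
    """Toy classifier (assumption): mean feature value per class."""
    sums = defaultdict(lambda: [0.0, 0])
    for s in train:
        sums[s["label"]][0] += s["value"]
        sums[s["label"]][1] += 1
    return {label: total / n for label, (total, n) in sums.items()}


def predict(centroids, value):
    """Assign the class whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def accuracy(centroids, test):
    correct = sum(predict(centroids, s["value"]) == s["label"] for s in test)
    return correct / len(test)


# Hypothetical toy data: one spectral-like feature per sample, three sites.
samples = [
    {"site": "A", "value": 0.10, "label": "sand"},
    {"site": "A", "value": 0.80, "label": "seagrass"},
    {"site": "B", "value": 0.20, "label": "sand"},
    {"site": "B", "value": 0.90, "label": "seagrass"},
    {"site": "C", "value": 0.15, "label": "sand"},
    {"site": "C", "value": 0.70, "label": "seagrass"},
]

# Accuracy at each site when that site is excluded from training.
results = {}
for site, train, test in leave_one_site_out(samples):
    results[site] = accuracy(fit_centroids(train), test)
```

The key design point is that no sample from the evaluated site ever appears in training, so each score reflects performance at a location the model has not seen rather than performance on held-out pixels from a familiar site.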
History
Year
2024
Thesis type
- Doctoral thesis