Motion boundary emphasised optical flow method for human action recognition
© The Institution of Engineering and Technology 2020. This study proposes a three-stream model built from two types of deep convolutional neural networks (CNNs): (i) a spatial stream with a CNN on images; (ii) a ResNet (residual network) on optical flows; and (iii) a ResNet on the concatenation of motion features. The model is applied to four datasets: (i) UCF Sports; (ii) Youtube Sports; (iii) SBU action interaction; and (iv) a subset of the UCF-1M Sports. Two optical flow estimation methods are used: (i) a motion boundary emphasised Epicflow (Edge-Preserving Interpolation of Correspondences for Optical Flow) method (MBEpicflow); and (ii) the Flownet 2 method, a learning-based optical flow estimation method. The proposed MBEpicflow outperforms Flownet 2 on the SBU dataset, while Flownet 2 performs as well as or better than MBEpicflow on the other three datasets; these results are the best when compared with those obtained using other approaches on all datasets evaluated. The results show the important role that accurate optical flow plays in human action recognition, an aspect which has seldom been addressed. Moreover, they show that if some measure of the global behaviour of motion is incorporated, the generalisation performance is often improved by 1-2%.
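As an illustration of how the outputs of a three-stream model of this kind might be combined, the following is a minimal sketch that assumes late fusion by a weighted average of per-stream class probabilities. The abstract does not state the fusion rule, so the function `fuse_streams`, its `weights` parameter, and the example logits are hypothetical and are not taken from the study.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D vector of class logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def fuse_streams(spatial_logits, flow_logits, motion_logits,
                 weights=(1.0, 1.0, 1.0)):
    """Hypothetical late fusion: weighted mean of per-stream probabilities.

    Each argument is a 1-D array of class logits for one clip, produced by
    the spatial, optical-flow, and motion-feature streams respectively.
    Returns the fused probability vector and the predicted class index.
    """
    probs = np.stack([softmax(spatial_logits),
                      softmax(flow_logits),
                      softmax(motion_logits)])   # shape: (3, num_classes)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # normalise stream weights
    fused = np.tensordot(w, probs, axes=1)       # shape: (num_classes,)
    return fused, int(np.argmax(fused))


# Made-up logits for a three-class problem (illustrative only).
fused, label = fuse_streams([2.0, 1.0, 0.1],
                            [0.5, 2.5, 0.3],
                            [0.2, 2.0, 0.1])
```

In this made-up example the spatial stream alone favours class 0, but the two motion-based streams both favour class 1, so the equally weighted average selects class 1. Unequal `weights` would let one stream dominate the decision.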