Adaptive Importance Learning for Improving Lightweight Image Super-Resolution Network
Deep neural networks have achieved remarkable success in single image super-resolution (SISR). However, the computing and memory requirements of these methods have hindered their application to a broad class of real devices with limited computing power. One approach to this problem has been to design lightweight network architectures that balance super-resolution performance against computational burden. In this study, we revisit the problem from an orthogonal view and propose a novel learning strategy that maximizes the pixel-wise fitting ability of a given lightweight network architecture. Considering that the initial performance of a lightweight network is very limited, we present an adaptive importance learning scheme for SISR that trains the network in an easy-to-complex paradigm by dynamically updating the importance of image pixels on the basis of the training loss. Specifically, we formulate network training and importance learning as a joint optimization problem. With a carefully designed importance penalty function, the importance of individual pixels can be gradually increased by solving a convex optimization problem. The training process thus begins with pixels that are easy to reconstruct and gradually proceeds to more complex pixels as fitting improves. Furthermore, the proposed learning scheme can seamlessly assimilate knowledge from a more powerful teacher network in the form of importance initialization, thus obtaining better initial performance for the network. By learning the network parameters and updating pixel importance, the proposed learning scheme enables smaller, lightweight networks to achieve better performance than has previously been possible. Extensive experiments on four benchmark datasets demonstrate the potential benefits of the proposed learning strategy in lightweight SISR network enhancement.
In some cases, our learned network, with only 25% of the parameters and computational complexity, can produce comparable or even better results than the corresponding full-parameter network.
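As an illustration of the easy-to-complex idea described in the abstract, the following is a minimal sketch of one way a pixel-importance update could be written. The abstract does not state the penalty function, so the sketch assumes a linear self-paced-learning style penalty, lam * (0.5 * v**2 - v), which is convex in the importance v and has a closed-form minimizer. The names `update_importance`, `weighted_loss`, and the pace parameter `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np


def update_importance(pixel_loss, lam):
    """Per-pixel importance for fixed network outputs.

    Assumed penalty (not from the paper): for each pixel, minimize
        v * loss + lam * (0.5 * v**2 - v)   over v in [0, 1].
    Setting the derivative to zero gives v = 1 - loss / lam, which is
    then clipped to [0, 1]. Low-loss (easy) pixels get importance near 1.
    """
    return np.clip(1.0 - pixel_loss / lam, 0.0, 1.0)


def weighted_loss(pixel_loss, importance):
    """Importance-weighted mean loss used to train the network
    while the importance values are held fixed."""
    total = np.sum(importance)
    return float(np.sum(importance * pixel_loss) / max(total, 1e-8))


# Hypothetical per-pixel reconstruction losses, from easy to hard.
pixel_loss = np.array([0.05, 0.2, 0.6, 1.5])

# Raising lam over training gradually admits harder pixels.
for lam in (0.5, 1.0, 2.0):
    v = update_importance(pixel_loss, lam)
    print(lam, v, weighted_loss(pixel_loss, v))
```

In a full training loop these two steps would alternate: the network is updated on the weighted loss with importance fixed, then importance is recomputed from the new per-pixel losses. Under the same assumption, a teacher network's per-pixel losses could be passed to `update_importance` to obtain the initial importance values, though the paper's actual initialization may differ.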