How to compare image watermarking algorithms
In recent years, it has become common practice to compare image watermarking algorithms using the results of experiments on a handful of images. The images are randomly selected from a well-chosen database, so it can be argued that they cover the important aspects of visual information, such as texture or smoothness. However, the number of images used remains a crucial factor in the confidence one can place in the results of the experiment. By choosing this number to be ‘large’, one hopes that the comparison results are reliable. In this paper, our aim is to provide a systematic method of determining this number using a statistical approach based on hypothesis testing and power analysis.
We consider two algorithms and seek to verify the claim that one has superior performance with respect to some property. We start by finding a measure that represents the ‘goodness’ of the property of interest, then determine a suitable hypothesis test and the number of images, and finally interpret the results of the experiment. We show how a statistical framework allows us not only to draw conclusions based on a sound methodology but also to attach a confidence level to these conclusions. We give a concrete application of the approach to the problem of comparing two spread spectrum watermarking algorithms with respect to their robustness to JPEG2000 compression. We discuss the intricacies of the choices made at the various stages of this approach and their effects on the results.
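To make the role of power analysis concrete, the following is a minimal sketch of how the number of images might be estimated for a one-sided paired comparison. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: it assumes that the per-image differences in the chosen measure are approximately normal, that a normal (z) approximation to the test is acceptable, and that a standardized effect size (mean difference divided by the standard deviation of the differences) has been fixed in advance. The function name `required_images` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def required_images(effect_size, alpha=0.05, power=0.8):
    """Approximate number of images for a one-sided paired z-test.

    effect_size: assumed standardized effect, i.e. the mean per-image
                 difference divided by the standard deviation of the
                 differences (an assumption chosen by the experimenter).
    alpha:       significance level (probability of a type I error).
    power:       desired probability of detecting the assumed effect.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # critical value of the test
    z_power = std_normal.inv_cdf(power)      # quantile for the target power

    # Normal-approximation sample size: n = ((z_alpha + z_power) / d)^2
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"effect size {d}: about {required_images(d)} images")
```

Under these assumptions, smaller expected differences between the two algorithms call for many more images, which is why fixing the number of images without reference to the effect size and the desired power gives no real guarantee of reliability.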