Beating Random Test Case Prioritization
Existing test case prioritization (TCP) techniques have limitations when applied to real-world projects, because these techniques require certain information to be made available before they can be applied. For example, the family of input-based TCP techniques are based on test case values or test script strings; other techniques use test coverage, test history, program structure, or requirements information. Existing techniques also cannot guarantee to always be more effective than random prioritization (RP) that does not have any precondition. As a result, RP remains the most applicable and most fundamental TCP technique. This article proposes an extremely simple, effective, and efficient way to prioritize test cases through the introduction of a dispersity metric. Our technique is as applicable as RP. We conduct empirical studies using 43 different versions of 15 real-world projects. Empirical results show that our technique is more effective than RP. Our algorithm has a linear computational complexity and, therefore, provides a practical solution to the problem of prioritizing very large test suites (such as those containing hundreds of thousands, or millions, of test cases), where the execution time of conventional nonlinear prioritization algorithms can be prohibitive. Our technique also provides a practical solution to TCP when neither input-based nor execution-based techniques are applicable due to lack of information.