Background
Key challenges in benchmarking health service achievement of policy goals in areas such as chronic disease are: 1) developing indicators and understanding how policy goals might work as indicators of service performance; 2) developing methods for economically collecting and reporting stakeholder perceptions; 3) combining and sharing data about the performance of organizations; 4) interpreting outcome measures; 5) obtaining actionable benchmarking information. This study aimed to explore how a new Boolean-based small-N method from the social sciences, Qualitative Comparative Analysis (QCA), could contribute to meeting these internationally shared challenges.

Methods
A multi-value QCA (MVQCA) was conducted on data from 24 senior staff at 17 randomly selected services for chronic disease. Respondents provided perceptions of 1) whether government health services were improving their achievement of a set of statewide policy goals for chronic disease and 2) the efficacy of state health office actions in influencing this improvement. The analysis produced summaries of configurations of perceived service improvements.

Results
Most respondents observed improvements in most areas, but they did not perceive uniformly good improvements across services, regardless of whether they identified a state health office contribution to that improvement. The sentinel policy goal of using evidence to develop service practice was not achieved at all in four services and appears to be reliant on other kinds of service improvements happening.

Conclusions
The QCA method suggested theoretically plausible findings and an approach that, with further development, could help meet the five benchmarking challenges. In particular, it suggests that achievement of one policy goal may be reliant on achievement of another goal in complex ways that the literature has not yet fully accommodated but which could help prioritize policy goals.
QCA is weakest where traditional big-N statistical methods are both needed and feasible, and its more complex findings are correspondingly difficult to validate empirically. It should be considered a potentially valuable adjunct method for benchmarking complex health policy goals such as those for chronic disease.