Year

2012

Degree Name

Doctor of Philosophy

Department

Faculty of Education

Abstract

Writing tests are increasingly being included in large-scale assessment programs and high-stakes decisions. However, Automated Essay Scoring (AES) systems, developed to overcome issues of marker inconsistency, volume, speed and cost, raise issues of score validity of their own. To fill a crucial gap identified in current approaches to evaluating AES systems, this study develops and applies a framework, grounded in current validation theory, for assessing the validity of scores produced by AES systems in a systematic and comprehensive manner.

This thesis provides rationales for, and details of, the five essential components of the proposed AES validation framework. These five components are: 1) the writing traits scored by an AES system, how well they are assessed, and how they relate to the ability being assessed; 2) the validity implications of the type of scoring procedure used by an AES system to derive an overall score; 3) the internal structure of the assessment scores produced by an AES system; 4) the measurement qualities of the scores produced by an AES system; and 5) the consequential aspect of validity evidence. To make a convincing argument for AES score validity, evidence must be collected for each component, and the resulting bodies of evidence must be evaluated together in terms of their combined effects on score meaning and the implications of score use.

To demonstrate how this framework may be applied, it is used to investigate the validity of scores produced by a particular AES system, the Intelligent Essay Assessor (IEA), for the writing tasks of the Pearson Test of English (PTE) Academic. Five experienced human markers are employed to provide credible alternative measures against which IEA scores can be examined.

This study demonstrates that the proposed framework is effective both in directing validation efforts and in ensuring a methodical approach to AES validation. Through the application of this framework, the study has collected a wide range of empirical evidence and theoretical rationales, enabling a validity argument to be made for the IEA. Based on the evidence collected, a number of recommendations are made with a view to further strengthening the validity of scores produced by the IEA. In addition, the study illustrates in detail how various theories, including those associated with the writing domain and measurement, can be used in conjunction with statistical methods to collect and investigate evidence pertinent to different components of the AES validation framework.

The AES validation framework proposed in this study can be adapted and applied to the validation of all types of scoring systems. Furthermore, the validation processes undertaken in this study (i.e., first articulating an interpretative argument and then evaluating this argument in a particular test context) are generalisable to validations of all direct performance assessments.

Findings from this study support the position that the validation of AES systems needs to focus on direct evidence linking the scoring method to the intended interpretation and use of the scores. Evidence from this study also calls for careful rethinking of the role of human judgements of essay quality in the evaluation and further development of AES systems.

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.