University of Wollongong
Browse

The missing information principle ‒ A paradigm for analysis of messy sample survey data

journal contribution
posted on 2024-11-17, 12:44 authored by Raymond L Chambers
Sample surveys, as a tool for policy development and evaluation and for scientific, social and economic research, have been employed for over a century. In that time, they have primarily served as tools for collecting data for enumerative purposes. Estimation of these characteristics has been typically based on weighting and repeated sampling, or design-based, inference. However, sample data have also been used for modelling the unobservable processes that gave rise to the finite population data. This type of use has been termed analytic, and often involves integrating the sample data with data from secondary sources. Alternative approaches to inference in these situations, drawing inspiration from mainstream statistical modelling, have been strongly promoted. The principal focus of these alternatives has been on allowing for informative sampling. Modern survey sampling, though, is more focussed on situations where the sample data are in fact part of a more complex set of data sources all carrying relevant information about the process of interest. When an efficient modelling method such as maximum likelihood is preferred, the issue becomes one of how it should be modified to account for both complex sampling designs and multiple data sources. Here application of the Missing Information Principle provides a clear way forward. In this paper I review how this principle has been applied to resolve so-called “messy” data analysis issues in sampling. I also discuss a scenario that is a consequence of the rapid growth in auxiliary data sources for survey data analysis. This is where sampled records from one accessible source or register are linked to records from another less accessible source, with values of the response variable of interest drawn from this second source, and where a key output is small area estimates for the response variable for domains defined on the first source.

History

Journal title

SURVEY METHODOLOGY

Volume

49

Issue

2

Pagination

219-256

Language

English

Usage metrics

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC