The information in aggregate data
Ecological inference attempts to draw conclusions about individual-level relationships from data that are available only as aggregates for groups in the population. The groups are often geographically defined. A fundamental statistical issue is how much information aggregate data contain about the relationships and parameters that we are trying to estimate. This information affects the standard errors of estimates as well as the power of any hypothesis tests. It also affects the ability to tell, from the aggregate data, which of the models under consideration are supported by the data. In this chapter likelihood-based methods are considered. We show in general how aggregation affects the information matrix associated with the maximum likelihood estimates, compared with the case in which individual-level data are available. Hypothesis testing using aggregate data is also considered. We apply this general approach to ecological inference in the case of several 2 × 2 tables and show how the information is affected by aggregation. Tests of the hypothesis that the parameters are constant across the groups are developed using aggregate data. We also consider how the addition of a small amount of individual-level data, obtained from a sample that ignores the groups, increases the information concerning the parameters. The theory is illustrated through an example.
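The loss of information under aggregation can be illustrated numerically. The following is a minimal sketch, not the chapter's own derivation, and it rests on assumptions that are not stated above: a single 2 × 2 table with known row totals `n1` and `n2`, independent binomial counts in each row with success probabilities `p1` and `p2`, and aggregate data consisting only of the column total `Y = Y1 + Y2`. The function names are hypothetical. The sketch compares the Fisher information matrix for `(p1, p2)` from the individual-level counts with the one from the aggregate total.

```python
import numpy as np
from math import comb


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for k = 0, ..., n."""
    k = np.arange(n + 1)
    coeffs = np.array([comb(n, int(i)) for i in k], dtype=float)
    return coeffs * p**k * (1 - p) ** (n - k)


def binom_pmf_grad(n, p):
    """Derivative of the Binomial(n, p) probabilities with respect to p."""
    k = np.arange(n + 1)
    return binom_pmf(n, p) * (k / p - (n - k) / (1 - p))


def individual_information(n1, n2, p1, p2):
    """Fisher information for (p1, p2) when Y1 and Y2 are both observed."""
    return np.diag([n1 / (p1 * (1 - p1)), n2 / (p2 * (1 - p2))])


def aggregate_information(n1, n2, p1, p2):
    """Fisher information for (p1, p2) when only Y = Y1 + Y2 is observed.

    The distribution of Y is the convolution of the two binomials
    (an assumption of this sketch); the information is
    sum_y grad P(y) grad P(y)^T / P(y).
    """
    f1, f2 = binom_pmf(n1, p1), binom_pmf(n2, p2)
    pmf = np.convolve(f1, f2)
    d1 = np.convolve(binom_pmf_grad(n1, p1), f2)  # dP(y)/dp1
    d2 = np.convolve(f1, binom_pmf_grad(n2, p2))  # dP(y)/dp2
    grads = np.vstack([d1, d2])
    return (grads / pmf) @ grads.T


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the chapter.
    I_ind = individual_information(30, 20, 0.3, 0.6)
    I_agg = aggregate_information(30, 20, 0.3, 0.6)
    print("individual-level information:\n", I_ind)
    print("aggregate information:\n", I_agg)
    print("eigenvalues of the loss:", np.linalg.eigvalsh(I_ind - I_agg))
```

Because the aggregate total is a function of the individual-level counts, the difference between the two matrices should be positive semidefinite. With a single table the aggregate matrix is close to singular, which is one way to see why several tables, or supplementary individual-level data, are needed to estimate the parameters.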