Empirical Bayes estimation of undercount in the decennial census
On April 1, 1990, the decennial census for the United States will be conducted by the U.S. Bureau of the Census. By December 31, 1990, the Census Bureau is required by law to submit state population counts for the purpose of reapportionment of the U.S. House of Representatives, and by March 31, 1991, to submit small-area population counts for the purpose of redistricting. Census counts are used in a variety of other ways: in revenue-sharing formulas between different levels of government, for demographic projections, as a base for morbidity and mortality statistics, and so forth. Inaccurate census counts should therefore concern the whole nation. It is universally acknowledged that certain groups of people (e.g., young black males and illegal aliens) are harder to count than others. If the hard-to-count groups were distributed in equal proportions throughout the United States, there would be far less controversy over what to do about the uncounted people. As it is, many large American cities such as Chicago, Detroit, New York, and Los Angeles feel they are losing federal funds because they contain larger numbers of the groups that are less well counted. And certain states such as New York and California feel they are underrepresented in Congress, to the benefit of Midwestern states such as Indiana and Iowa.

Census undercount is defined simply as the difference between the true count and the census count, expressed as a percentage of the true count. Small-area estimation of this undercount is considered here, using empirical Bayes methods based on a new and, it is argued, more realistic model than has been used before. Grouping similar subareas (states, counties, and so on) into strata is a useful way of reducing the variance of undercount estimators.
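The undercount definition above is simple arithmetic; a minimal sketch (the function name `undercount_percent` is hypothetical) makes the sign convention explicit: a positive value means people were missed, a negative value means the census overcounted.

```python
def undercount_percent(true_count, census_count):
    """Census undercount: (true count - census count) as a percentage of the true count."""
    if true_count <= 0:
        raise ValueError("true_count must be positive")
    return 100.0 * (true_count - census_count) / true_count


print(undercount_percent(1000, 980))   # 2.0  (2% of the true population missed)
print(undercount_percent(1000, 1010))  # -1.0 (a 1% overcount)
```

In practice the true count is unknown and must itself be estimated (e.g., from a postenumeration survey), which is why the undercount is treated as a quantity to be estimated rather than computed.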
By modeling the subareas within a stratum as having a common mean and variances inversely proportional to their census counts, and by taking into account sampling of the areas (e.g., by dual-system estimation), empirical Bayes estimators that compromise between the (weighted) stratum average and the sample value can be constructed. The amount of compromise is shown to depend on the size of the stratum variance relative to the sampling variance. These estimators are evaluated at the state level (51 states, including Washington, D.C.) and stratified on race/ethnicity (3 strata) using data from the 1980 postenumeration survey (PEP 3-8, for the noninstitutional population).
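The compromise estimator described above can be sketched for a single stratum as follows. This is an illustrative sketch under stated assumptions, not the paper's exact procedure: each area i has a sample undercount estimate y_i (e.g., from dual-system estimation) with sampling variance D_i assumed known, and a true undercount θ_i with common stratum mean μ and variance σ²/c_i, where c_i is the census count. The posterior-mean form θ̂_i = w_i y_i + (1 − w_i) μ̂, with w_i = (σ²/c_i)/(σ²/c_i + D_i), follows from that model; the Paule–Mandel-style moment estimator for σ² is a swapped-in choice for illustration, and the function name `eb_stratum_estimates` is hypothetical.

```python
import numpy as np


def eb_stratum_estimates(y, sampling_var, census_counts, tol=1e-10):
    """Empirical Bayes shrinkage within one stratum (illustrative sketch).

    Assumed model:
      y_i | theta_i ~ (theta_i, D_i)        sampling error, D_i known and > 0
      theta_i       ~ (mu, sigma2 / c_i)    stratum variance scaled by census count c_i
    Returns (estimates, mu_hat, sigma2_hat).
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(sampling_var, dtype=float)
    c = np.asarray(census_counts, dtype=float)
    if np.any(d <= 0) or np.any(c <= 0):
        raise ValueError("sampling variances and census counts must be positive")
    n = y.size

    def weighted_mean(sigma2):
        # Weighted stratum average, weights = 1 / marginal variance of y_i
        w = 1.0 / (sigma2 / c + d)
        return np.sum(w * y) / np.sum(w)

    def q(sigma2):
        # Weighted residual sum of squares minus its expectation (n - 1);
        # this is nonincreasing in sigma2, so its root can be bracketed
        mu = weighted_mean(sigma2)
        return np.sum((y - mu) ** 2 / (sigma2 / c + d)) - (n - 1)

    # Moment estimator for sigma2 (Paule-Mandel style), truncated at zero
    if q(0.0) <= 0:
        sigma2 = 0.0
    else:
        lo, hi = 0.0, 1.0
        while q(hi) > 0:          # expand the bracket until the sign changes
            hi *= 2.0
        while hi - lo > tol * max(1.0, hi):
            mid = 0.5 * (lo + hi)
            if q(mid) > 0:
                lo = mid
            else:
                hi = mid
        sigma2 = 0.5 * (lo + hi)

    mu = weighted_mean(sigma2)
    prior_var = sigma2 / c
    shrink = d / (prior_var + d)  # weight on the stratum mean, i.e. 1 - w_i
    return (1.0 - shrink) * y + shrink * mu, mu, sigma2
```

The weight on the sample value grows with the ratio of the stratum variance σ²/c_i to the sampling variance D_i: an area with a noisy sample estimate, or a large census count (and hence a small assumed stratum variance), is pulled further toward the weighted stratum average.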