An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data



Publication Details

Namazi-Rad, M., Tanton, R., Steel, D., Mokhtarian, P. & Das, S. (2017). An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data. Computers, Environment and Urban Systems, 63, 3-14.


The Population Census is an important source of statistical information in most countries and is capable of producing reliable estimates of population characteristics for small geographic areas. One limitation of a census is that many population characteristics cannot be collected because of respondent burden or cost. Statistical agencies therefore have to conduct population-based surveys to provide social, economic and demographic characteristics of a target population that are not captured by a large-scale census. These surveys are usually capable of producing direct estimates at the national level and for high-level regions, but often cannot produce reliable estimates for smaller areas. Due to the increasing demand for comprehensive statistical information not only at the national level but also for sub-national domains, there is wide discussion in the literature about the use of statistical techniques that combine survey and census data to provide more detailed, finer-level estimates.

Where censuses and sample surveys are based on the same reporting units, statistical matching techniques can be employed to link the records from survey and census data when exact matching of reporting units is impossible due to confidentiality restrictions. These techniques can then provide the detailed social, economic and demographic information required for small areas.

An approach is developed in this paper in which a close-to-reality synthetic population of individuals and households is generated from available census tables using an iterative proportional updating (IPU) method. Statistical matching using a nearest neighbour method is then used to impute survey data to the individuals and households in the synthetic population. To evaluate this approach, 2011 Bangladesh census data is used to generate a district-specific synthetic population of individuals and households. Matching is then performed by imputing the nearest possible records from the 2011 Bangladesh Demographic and Health Survey to estimate the wealth index for each household within the synthetic population. The results show that the method presented in this paper helps achieve more representative estimates (compared with direct survey estimates), particularly for areas with small sample sizes, where not many population units with different socio-demographic characteristics are included.
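
The nearest neighbour matching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the distance function or the matching variables, so the Euclidean distance, the variable names (`hh_size`, `head_age`, `wealth_index`) and the toy records below are all assumptions made for illustration. "Unconstrained" is taken here to mean that a survey record may serve as a donor for any number of synthetic records.

```python
from math import sqrt


def nearest_donor(recipient, donors, keys):
    """Return the donor record closest to `recipient` on the shared variables `keys`.

    Assumption: plain Euclidean distance on unscaled variables; in practice
    the variables would usually be standardised or weighted first.
    """
    def distance(donor):
        return sqrt(sum((recipient[k] - donor[k]) ** 2 for k in keys))

    return min(donors, key=distance)


def impute(synthetic, survey, keys, target):
    """Copy `target` from the nearest survey record onto each synthetic record.

    Unconstrained matching: donors are never removed from the pool, so the
    same survey record can be reused for several synthetic records.
    """
    return [
        {**rec, target: nearest_donor(rec, survey, keys)[target]}
        for rec in synthetic
    ]


# Hypothetical survey households (donors) carrying the variable to impute.
survey = [
    {"hh_size": 2, "head_age": 30, "wealth_index": 0.8},
    {"hh_size": 6, "head_age": 55, "wealth_index": -0.4},
]

# Hypothetical synthetic households (recipients) built from census tables.
synthetic = [
    {"hh_size": 3, "head_age": 32},
    {"hh_size": 5, "head_age": 60},
    {"hh_size": 2, "head_age": 28},
]

matched = impute(synthetic, survey, ["hh_size", "head_age"], "wealth_index")
for household in matched:
    print(household)
```

In this toy example the first and third synthetic households both receive the wealth index of the first survey household, which shows donor reuse under unconstrained matching.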
