Moment-based density estimation of confidential micro-data: a computational statistics approach
Statistics and Computing
Providing access to synthetic micro-data in place of confidential data to protect the privacy of participants is common practice. For the synthetic data to be useful for analysis, it is necessary that the density function of the synthetic data closely approximate the confidential data. Hence, accurately estimating the density function based on sample micro-data is important. Existing kernel-based, copula-based, and machine learning methods of joint density estimation may not be viable. Applying the multivariate moments’ problem to sample-based density estimation has long been considered impractical due to the computational complexity and intractability of optimal parameter selection of the density estimate when the true joint density function is unknown. This paper introduces a generalised form of the sample moment-based density estimate, which can be used to estimate joint density functions when only the information of empirical moments is available. We demonstrate optimal parametrisation of the moment-based density estimate based solely on sample data by employing a computational strategy for parameter selection. We compare the performance of the moment-based estimate to that of existing non-parametric and parametric density estimation methods. The results show that using empirical moments can provide a reasonable, robust non-parametric approximation of a joint density function that is comparable to existing non-parametric methods. We provide an example of synthetic data generation from the moment-based density estimate and show that the resulting synthetic data provides a reasonable disclosure-protected alternative for public release.
Open Access Status
This publication may be available as open access