FROM SURVEY-BASED TO REGISTER-BASED STATISTICS: A PARADIGM SHIFT USING LATENT VARIABLE MODELS
National institutes involved in the production of official statistics face a growing need for timely, high-quality, comprehensive and relevant estimates of population parameters of interest. While administrative information has always been fundamental in official statistics as a means of improving estimates derived from sample surveys, we are now witnessing the so-called paradigm shift from survey-based to register-based statistics. This shift assigns administrative data a central role in the production of official statistics: they are no longer considered merely instrumental information but are used directly for the estimation of population quantities. Within this framework, surveys are mainly meant to supply the information that is not yet available in the administrative registers.
Clearly, the integration of data from different sources is a key aspect of this paradigm shift. Alongside the potential of this emerging approach (cost reduction, quality improvement, reduced burden on respondents), a number of new methodological issues arise. These create both the need and the opportunity to develop novel, more flexible statistical methods for the production of official statistics, based on improved linking methods as well as on improved sampling and estimation methodologies. In this respect, three main research objectives form the core of the present project: record linkage and data integration; statistical modelling for the estimation of Census and population quantities; and small area estimation of population parameters for unplanned domains.
The project focuses on the development of new and improved record linkage (RL) methods for merging potentially noisy data sources in the absence of a unique identifier; the aim is both to remove duplicated information and to increase the informative content of each individual source.
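To illustrate what linking noisy records without a unique identifier involves, the following is a minimal sketch of classical probabilistic (Fellegi-Sunter style) match scoring. It is not the method the project will develop: the field names, the m- and u-probabilities, the similarity threshold and the example records are all hypothetical values chosen for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field probabilities (assumptions, not estimates):
#   m = P(field agrees | records refer to the same unit)
#   u = P(field agrees | records refer to different units)
FIELDS = {
    "name": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "city": (0.90, 0.10),
}


def agrees(a, b, threshold=0.85):
    """Treat two values as agreeing if their string similarity is high.

    The threshold is an illustrative choice that tolerates small typos.
    """
    ratio = SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()
    return ratio >= threshold


def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over the compared fields.

    Agreement on a field adds log2(m/u); disagreement adds
    log2((1-m)/(1-u)). Larger totals favour a match.
    """
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if agrees(rec_a[field], rec_b[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Hypothetical records: one register entry and two survey entries.
register = {"name": "Maria Rossi", "birth_year": 1980, "city": "Perugia"}
survey_typo = {"name": "Maria Rosi", "birth_year": 1980, "city": "Perugia"}
survey_other = {"name": "Luca Bianchi", "birth_year": 1975, "city": "Torino"}

print(match_weight(register, survey_typo))   # positive: likely the same unit
print(match_weight(register, survey_other))  # negative: likely different units
```

In practice the m- and u-probabilities would be estimated from the data rather than fixed by hand, and record pairs with intermediate weights would be left undecided or sent for clerical review.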
The project also focuses on the development of new model-assisted and model-based projection estimators for the estimation of Census and population quantities. Models based on latent variables, in the form of both parametric (Gaussian) and non-parametric random parameters, will be investigated to derive improved estimates from integrated, geo-referenced data coming from multiple sources.
The last objective of the project is the development of new small area estimation (SAE) methods for the estimation of population parameters from linked data. The research will also focus on developing (robust) small area models in a causal framework for policy evaluation. Novel methods will be proposed to estimate area-specific average treatment effects for unplanned domains. In this framework, latent variables may play a central role in modelling sources of unobserved heterogeneity. A further goal of the project, within the same framework, is to develop a methodology for estimating quantile regression in the presence of endogeneity.