Imputation of missing data in the income variables in the National Survey on Health and Aging in Mexico

Authors

  • Guillermo Andrés Villagra-Fuentes, Autonomous University of Nuevo León

DOI:

https://doi.org/10.29105/vtga11.3-1099

Keywords:

Multiple imputation, elderly individuals, income

Abstract

Missing data, also known as missing values, are a common problem for researchers and decision-makers alike. This study is no exception: it is based on the National Survey on Health and Aging in Mexico (ENASEM), a longitudinal survey aimed at individuals over 50 years of age, in which missing values are widespread.

The study focuses on missing values in three groups of variables: income, expenditure, and assets. It applies multiple imputation under the Missing at Random (MAR) assumption.
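Multiple imputation replaces each missing value with several plausible draws, analyzes each completed dataset separately, and pools the results with Rubin's rules. Under MAR, the probability that a value is missing may depend on observed variables but not on the missing value itself. The abstract does not specify the imputation model, so the following is only a minimal sketch under stated assumptions: one income-like variable `y`, one fully observed covariate `x` (for example, age), a linear regression imputation model refitted on a bootstrap resample of the complete cases for each imputation, and simulated toy data. The function names are hypothetical and this is not presented as the author's procedure.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def multiply_impute(xs, ys, m=20, seed=0):
    """Return m completed copies of ys, where None marks a missing value."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    completed = []
    for _ in range(m):
        # Refit the model on a bootstrap resample of the complete cases so
        # that each imputation reflects uncertainty in the model parameters.
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        # Keep observed values; draw each missing value from the fitted
        # conditional distribution of y given x (this is where MAR is used).
        completed.append([
            y if y is not None else a + b * x + rng.gauss(0.0, sd)
            for x, y in zip(xs, ys)
        ])
    return completed


def rubin_combine(estimates, variances):
    """Pool m point estimates and their sampling variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance W
    b = statistics.variance(estimates)    # between-imputation variance B
    return q_bar, w + (1 + 1 / m) * b     # total variance T = W + (1 + 1/m) B


# Toy data (simulated, not ENASEM): x is age, y is an income-like variable.
rng = random.Random(42)
xs = [rng.uniform(50, 90) for _ in range(500)]
full_ys = [1000 + 20 * x + rng.gauss(0, 100) for x in xs]
# MAR mechanism: the chance of a missing y grows with the observed age x.
ys = [None if rng.random() < (x - 50) / 80 else y for x, y in zip(xs, full_ys)]

datasets = multiply_impute(xs, ys, m=20)
means = [statistics.fmean(d) for d in datasets]
mean_vars = [statistics.variance(d) / len(d) for d in datasets]
est, total_var = rubin_combine(means, mean_vars)
cc_mean = statistics.fmean(y for y in ys if y is not None)

print(f"complete-case mean: {cc_mean:.1f}")
print(f"MI estimate: {est:.1f} (SE {total_var ** 0.5:.1f})")
```

In this toy setup older respondents skip the item more often and the income-like variable rises with age, so the complete-case mean understates the full-data mean; conditioning the imputations on the observed `x` corrects for this, which is what the MAR assumption licenses. A real application to a survey such as ENASEM would need to impute many variables jointly and account for the survey design, which this sketch does not attempt.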

All 28,892 missing values were imputed. Missing values were most concentrated in the 2001 round and declined in subsequent rounds. By survey section, the assets section accounted for the largest share of the imputed missing values (67%), followed by the income section (19%) and the expenditure section (13%).
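The breakdown by round and by section is a share-of-total calculation over the imputed values. A minimal sketch, assuming each imputed value is logged as a record with hypothetical `round` and `section` fields; the toy records below are illustrative only and are not ENASEM data.

```python
from collections import Counter


def shares(records, field):
    """Percentage of imputed values that fall in each category of `field`."""
    counts = Counter(rec[field] for rec in records)
    total = sum(counts.values())
    return {k: 100.0 * n / total for k, n in counts.items()}


# Hypothetical log: one record per imputed value (toy data, not ENASEM).
imputed = [
    {"round": 2001, "section": "assets"},
    {"round": 2001, "section": "assets"},
    {"round": 2001, "section": "income"},
    {"round": 2003, "section": "assets"},
    {"round": 2003, "section": "expenditure"},
    {"round": 2012, "section": "assets"},
]

by_section = shares(imputed, "section")
by_round = Counter(rec["round"] for rec in imputed)

print("round with most imputed values:", by_round.most_common(1)[0][0])
for section, pct in sorted(by_section.items(), key=lambda kv: -kv[1]):
    print(f"{section}: {pct:.0f}%")
```

Rounded to whole percentages, the toy shares print as 67%, 17% and 17%, which sum to 101; unrounded shares always sum to 100, so published rounded shares need not add up exactly.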

Published

2025-05-30

How to Cite

Villagra-Fuentes, G. A. (2025). Imputation of missing data in the income variables in the National Survey on Health and Aging in Mexico. Vinculategica Efan, 11(3), 141–161. https://doi.org/10.29105/vtga11.3-1099