Machine learning approach for multidimensional poverty estimation

Mario Esteban Ochoa Guaraca; Ricardo Castro; Alexander Arias Pallaroso; Antonia Machado; Dolores Sucozhañay

doi:10.37815/rte.v33n2.853

Published : 2021-11-26

Keywords :

random forest, social sciences, regression, region, limited dataset

Abstract

Research in the social sciences has been predominantly theoretical. The scarcity of data, and the difficulty of collecting and storing it, has been the main obstacle to adopting quantitative approaches. However, the large amount of information generated in recent years, mainly through the use of the Internet, has allowed the social sciences to incorporate more and more quantitative analysis. This study proposes Machine Learning (ML) as a way to address data scarcity. The objective is to estimate the multidimensional poverty index (MPI) at the individual level in a particular territory of Ecuador, using ML regression models trained on a limited amount of data. Ten ML models are compared, including linear, regularized, and ensemble models, and Random Forest outperforms the others. An error of 7.5% was obtained in cross-validation and 7.48% on the test data set. The estimates are also compared with statistical approximations of the MPI for a geographical area: the average MPI estimated by the model differs by 1% from the average reported by the statistical studies.
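The evaluation protocol summarized in the abstract (compare several regressors by cross-validation, then check the selected one on a held-out test set) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, only four of the ten model families are shown, and mean absolute error is assumed as the metric because the abstract does not name one. All variable names and hyperparameter values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for a small survey sample (assumption): 300 people,
# 6 indicators, and a target in [0, 1] playing the role of an MPI score.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 6))
noise = rng.normal(0.0, 0.05, size=300)
y = np.clip(0.5 * X[:, 0] * X[:, 1] + 0.3 * (X[:, 2] > 0.5) + noise, 0.0, 1.0)

# Hold out 20% of the rows as a test set that model selection never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A few candidate regressors; hyperparameters here are illustrative only.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# 5-fold cross-validated MAE on the training split for each candidate.
cv_mae = {}
for name, model in candidates.items():
    scores = cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error"
    )
    cv_mae[name] = -scores.mean()  # scikit-learn reports negated errors

# Refit the candidate with the lowest CV error and score it on the test set.
best_name = min(cv_mae, key=cv_mae.get)
best_model = candidates[best_name].fit(X_train, y_train)
predictions = best_model.predict(X_test)
test_mae = mean_absolute_error(y_test, predictions)

# Aggregate check: gap between the mean prediction and the mean observed value.
mean_gap = abs(predictions.mean() - y_test.mean())

for name, err in sorted(cv_mae.items(), key=lambda item: item[1]):
    print(f"{name:>14}: CV MAE = {err:.4f}")
print(f"selected: {best_name}, test MAE = {test_mae:.4f}, mean gap = {mean_gap:.4f}")
```

A cross-validation error close to the test error, as in the figures the abstract reports, suggests the selected model is not overfitting the small training sample. The final mean-gap line mirrors, in spirit only, the abstract's comparison of the average estimated MPI against an externally reported average.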

How to Cite

Ochoa Guaraca, M. E., Castro, R., Arias Pallaroso, A., Machado, A., & Sucozhañay, D. (2021). Machine learning approach for multidimensional poverty estimation. Revista Tecnológica - ESPOL, 33(2), 205-225. https://doi.org/10.37815/rte.v33n2.853

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
