Harnessing Predictive Modelling for Education Index: A Dual Approach with Random Forest and Multiple Linear Regression

Authors

  • Olivia Kristianti Kusuma
  • Jessica
  • Grace Felicia Christy Widjaya
  • Helena Margaretha
  • Ferry Vincenttius Ferdinand
  • Kie Van Ivanky Saputra (SINTA: 115831, SCOPUS: 57361882600, ORCID: 0000-0002-8280-3266), Universitas Pelita Harapan, Tangerang, BANTEN, INDONESIA

DOI:

https://doi.org/10.19166/fastjst.v10i1.10367

Keywords:

Education Index, Random Forest, Multiple Linear Regression, Socioeconomic

Abstract

This study explores predictive modeling of the Education Index (EI) through a dual approach that combines Random Forest and Multiple Linear Regression (MLR). The dataset, obtained from "Our World in Data" and spanning 1990–2022, integrates socioeconomic and infrastructure indicators, including GDP per capita, government spending on education, and access to electricity. The study covers 20 countries categorized by income level: Low-Income (Vietnam, Nepal, Myanmar, Pakistan, Zimbabwe), Lower-Middle-Income (Ghana, Bolivia, Cambodia, Egypt, Bangladesh), Upper-Middle-Income (Argentina, Brazil, Peru, Russia, Mexico), and High-Income (Germany, Italy, Portugal, Iceland, Greece). The analysis shows that Random Forest outperforms MLR, with higher accuracy and lower error rates, while MLR offers better interpretability of the relationships between variables. Random Forest Regression achieves an R² of 99.34%, compared with 94% for MLR. Key findings show that GDP per capita, primary and secondary completion rates, and internet usage significantly influence EI, underscoring the importance of economic conditions and infrastructure for educational outcomes. By comparing machine learning with traditional statistical methods for educational analytics, this study provides a basis for policy development aimed at improving global education standards.
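The abstract compares the two models by R² and error rates. As a minimal sketch, assuming only Python's standard library, the block below fits an MLR by ordinary least squares (normal equations) and scores it with R², RMSE and MAE on a hold-out split. The data are synthetic and hypothetical: the three features only mimic log GDP per capita, a completion rate and an internet usage share, and the names TRUE_COEF and make_synthetic_data are illustrative, not part of the study. RMSE and MAE are assumed stand-ins for the unspecified "error rates". The Random Forest side would typically come from a library such as scikit-learn's RandomForestRegressor and be scored with the same metric functions.

```python
import math
import random

# Hypothetical intercept and slopes used only to generate synthetic data.
TRUE_COEF = [-0.20, 0.05, 0.30, 0.20]


def make_synthetic_data(n, seed=0):
    """Hypothetical country-year rows: (log GDP per capita, completion rate, internet share)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = (rng.uniform(6.0, 11.0), rng.uniform(0.3, 1.0), rng.uniform(0.0, 0.9))
        noise = rng.gauss(0.0, 0.01)
        X.append(row)
        y.append(TRUE_COEF[0] + sum(b * x for b, x in zip(TRUE_COEF[1:], row)) + noise)
    return X, y


def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend a column of ones for the intercept
    p = len(A[0])
    # Augmented system [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict_mlr(coef, X):
    """Apply intercept coef[0] and slopes coef[1:] to each row."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


if __name__ == "__main__":
    X, y = make_synthetic_data(300, seed=42)
    # Simple 80/20 hold-out split on the synthetic rows
    X_train, y_train, X_test, y_test = X[:240], y[:240], X[240:], y[240:]
    coef = fit_mlr(X_train, y_train)
    y_hat = predict_mlr(coef, X_test)
    print("coefficients:", [round(c, 3) for c in coef])
    print(f"R2={r_squared(y_test, y_hat):.4f} "
          f"RMSE={rmse(y_test, y_hat):.4f} MAE={mae(y_test, y_hat):.4f}")
```

Scoring both models with the same functions on the same held-out rows keeps an R² comparison like-for-like; the fitted coefficients, one per indicator, are the source of the interpretability the abstract credits to MLR.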



Published

2026-05-26

Section

Articles