Causal Effect Analysis of Extracurricular Tutoring Based on Random Forest Propensity Score Matching: Evidence from Student Academic Performance
DOI:
https://doi.org/10.71222/nyjap998
Keywords:
extracurricular tutoring, random forest, propensity score matching, academic performance, heterogeneous treatment effects
Abstract
The prevalence of extracurricular tutoring has sparked ongoing debate about its causal impact on student academic performance. Traditional assessment methods often fail to address the selection bias and complex nonlinear relationships inherent in educational data. This study proposes a machine-learning-enhanced approach, Random Forest Propensity Score Matching (RF-PSM), to overcome the limitations of conventional propensity score methods when analyzing high-dimensional observational data. By using random forests to estimate propensity scores, the method captures intricate interactions among student characteristics while maintaining robust covariate balance. The analysis uses a nationally representative student performance dataset that includes demographic, socioeconomic, and prior academic achievement variables. Key findings reveal significant heterogeneous treatment effects: tutoring has the strongest positive impact on median-performing students, whereas effects diminish for both high and low achievers. The methodological contribution lies in demonstrating that RF-PSM outperforms logistic-regression-based matching by reducing bias in effect estimation. Practically, these results inform targeted educational policies by identifying the student subgroups that benefit most from supplemental instruction. The study underscores the potential of combining machine learning with causal inference frameworks to derive more nuanced insights from educational big data.
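As a rough illustration of the RF-PSM idea summarized above, the sketch below estimates propensity scores with a random forest, matches each treated unit to its nearest control on that score, and compares covariate balance before and after matching. It is not the study's implementation: the synthetic data, the covariate meanings, the use of out-of-bag probabilities, the 1:1 matching with replacement, the caliper of 0.05, and all function names are assumptions made for illustration, using scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors


def simulate(n=2000, seed=0):
    """Hypothetical data: tutoring uptake depends nonlinearly on covariates."""
    rng = np.random.default_rng(seed)
    # Three stand-in covariates (e.g. prior score, income, parental education).
    X = rng.normal(size=(n, 3))
    logit = 0.8 * X[:, 0] + 0.5 * X[:, 1] * X[:, 2] - 0.2
    treated = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    # The simulated tutoring effect is a constant 2.0 points (an assumption).
    y = 50 + 5 * X[:, 0] + 2 * X[:, 1] + 2.0 * treated + rng.normal(size=n)
    return X, treated, y


def rf_propensity(X, treated, seed=0):
    """Estimate P(treated | X) with a random forest.

    Out-of-bag probabilities are used so that a unit's own label does not
    inflate its score; this is a design choice of this sketch.
    """
    rf = RandomForestClassifier(
        n_estimators=300, min_samples_leaf=20, oob_score=True, random_state=seed
    )
    rf.fit(X, treated)
    return np.clip(rf.oob_decision_function_[:, 1], 0.01, 0.99)


def match_att(ps, treated, y, caliper=0.05):
    """1:1 nearest-neighbor matching on the propensity score, with replacement."""
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    nn = NearestNeighbors(n_neighbors=1).fit(ps[c_idx].reshape(-1, 1))
    dist, pos = nn.kneighbors(ps[t_idx].reshape(-1, 1))
    keep = dist[:, 0] <= caliper  # drop treated units without a close control
    matched_t = t_idx[keep]
    matched_c = c_idx[pos[keep, 0]]
    att = float(np.mean(y[matched_t] - y[matched_c]))
    return att, matched_t, matched_c


def smd(X, a_idx, b_idx):
    """Standardized mean difference per covariate between two groups."""
    diff = X[a_idx].mean(axis=0) - X[b_idx].mean(axis=0)
    pooled = np.sqrt((X[a_idx].var(axis=0, ddof=1) + X[b_idx].var(axis=0, ddof=1)) / 2)
    return diff / pooled


if __name__ == "__main__":
    X, treated, y = simulate()
    ps = rf_propensity(X, treated)
    att, mt, mc = match_att(ps, treated, y)
    naive = y[treated == 1].mean() - y[treated == 0].mean()
    raw = smd(X, np.flatnonzero(treated == 1), np.flatnonzero(treated == 0))
    print("naive difference:", round(float(naive), 2))
    print("matched ATT estimate:", round(att, 2))
    print("SMD before matching:", np.round(raw, 2))
    print("SMD after matching:", np.round(smd(X, mt, mc), 2))
```

Swapping `rf_propensity` for a logistic regression fitted on the same covariates would give a baseline of the kind the abstract compares against. Estimating effects separately within prior-achievement bands would be one simple way to probe heterogeneous effects.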
Copyright (c) 2025 Yangjun Lu (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.