Loan Default Prediction and Feature Importance Analysis Based on the XGBoost Model
DOI:
https://doi.org/10.71222/p9qmaa87
Keywords:
loan default prediction, XGBoost, machine learning, feature importance, credit scoring, financial risk modeling
Abstract
Loan default prediction is a critical task in financial risk management. Traditional statistical models often struggle with large-scale, nonlinear, and high-dimensional financial data. In this study, we explore the application of the eXtreme Gradient Boosting (XGBoost) model to loan default prediction using a publicly available dataset from Kaggle. The paper simulates a complete analytical pipeline comprising data preprocessing, model training, evaluation, and feature importance analysis. The simulated results indicate that XGBoost can achieve high predictive accuracy and can reliably distinguish defaulters from non-defaulters. Furthermore, the feature importance analysis shows that variables such as revolving credit utilization, borrower age, and past-due history play crucial roles in determining default risk. This research highlights the effectiveness and interpretability of XGBoost in financial decision-making scenarios.
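The abstract does not name the metric behind the "ability to distinguish between defaulters and non-defaulters". A common choice for that purpose is the area under the ROC curve (AUC), and the sketch below assumes it. The function `roc_auc`, the labels, and the scores are hypothetical illustrations, not the paper's data or results. AUC is computed here as the fraction of defaulter/non-defaulter pairs in which the defaulter receives the higher predicted score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (assumed metric, not confirmed by the paper).

    labels: 1 for a defaulter, 0 for a non-defaulter.
    scores: predicted default probabilities, one per borrower.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # defaulter ranked above non-defaulter
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = default) and model scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.35, 0.80, 0.20, 0.65, 0.30, 0.05, 0.90]

auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of borrowers, so a rank-based implementation would be preferable on a dataset of realistic size.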
License
Copyright (c) 2025 Ruoyu Qi (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.