Loan Default Prediction and Feature Importance Analysis Based on the XGBoost Model

Authors

  • Ruoyu Qi, North Carolina State University, Raleigh, North Carolina, USA

DOI:

https://doi.org/10.71222/p9qmaa87

Keywords:

loan default prediction, XGBoost, machine learning, feature importance, credit scoring, financial risk modeling

Abstract

Loan default prediction is a critical task in financial risk management. Traditional statistical models often struggle with large-scale, nonlinear, and high-dimensional financial data. In this study, we explore the application of the eXtreme Gradient Boosting (XGBoost) model to loan default prediction using a publicly available dataset from Kaggle. The paper simulates a complete analytical pipeline comprising data preprocessing, model training, evaluation, and feature importance analysis. The simulated results indicate that XGBoost can achieve high predictive accuracy and a robust ability to distinguish defaulters from non-defaulters. Furthermore, the feature importance analysis shows that variables such as revolving credit utilization, borrower age, and past-due history play crucial roles in determining default risk. This research highlights the effectiveness and interpretability of XGBoost in financial decision-making scenarios.
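The pipeline summarized in the abstract (training, evaluation, feature importance) can be illustrated with a small, self-contained sketch. It is not the paper's code and does not use the xgboost library or the Kaggle dataset: it is a toy second-order boosting loop over depth-1 trees, scored with the regularized split gain from the XGBoost formulation, run on synthetic data. The feature names, the data-generating coefficients, the hyperparameters (`n_rounds`, `eta`, `lam`), and the choice of ROC AUC as the evaluation metric are all assumptions made for illustration.

```python
import math
import random

# Hypothetical feature names, loosely echoing the variables named in the abstract.
FEATURES = ["revolving_utilization", "age", "times_past_due", "noise"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n, rng):
    """Synthetic borrowers; the coefficients below are invented for illustration."""
    X, y = [], []
    for _ in range(n):
        util, age = rng.random(), rng.randint(21, 75)
        past_due, noise = rng.choice([0, 0, 0, 1, 1, 2, 3]), rng.random()
        logit = 4.0 * util - 0.03 * (age - 45) + 1.0 * past_due - 3.0
        X.append([util, age, past_due, noise])
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y

def best_split(X, g, h, lam):
    """Return (gain, feature, threshold, w_left, w_right) with the largest gain."""
    G, H = sum(g), sum(h)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        gl = hl = 0.0
        for k in range(len(order) - 1):
            i, nxt = order[k], order[k + 1]
            gl += g[i]
            hl += h[i]
            if X[i][j] == X[nxt][j]:
                continue  # cannot split between two equal values
            gr, hr = G - gl, H - hl
            # Regularized gain: 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam)]
            gain = 0.5 * (gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam))
            if best is None or gain > best[0]:
                thr = (X[i][j] + X[nxt][j]) / 2
                best = (gain, j, thr, -gl / (hl + lam), -gr / (hr + lam))
    return best

def fit(X, y, n_rounds=30, eta=0.3, lam=1.0):
    """Boost depth-1 trees on log-loss; return the stumps and gain-based importance."""
    margin = [0.0] * len(X)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margin]
        g = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log-loss
        h = [pi * (1.0 - pi) for pi in p]      # second derivative of log-loss
        best = best_split(X, g, h, lam)
        if best is None or best[0] <= 0:
            break  # no split improves the objective
        gain, j, thr, wl, wr = best
        stumps.append((j, thr, eta * wl, eta * wr))
        importance[j] += gain
        margin = [m + (eta * wl if x[j] < thr else eta * wr) for m, x in zip(margin, X)]
    total = sum(importance)
    return stumps, [v / total for v in importance]

def predict_proba(stumps, x):
    return sigmoid(sum(wl if x[j] < thr else wr for j, thr, wl, wr in stumps))

def auc(y, scores):
    """ROC AUC as the share of (defaulter, non-defaulter) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(42)
X, y = make_data(600, rng)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

stumps, importance = fit(X_train, y_train)
test_auc = auc(y_test, [predict_proba(stumps, x) for x in X_test])

print(f"hold-out AUC: {test_auc:.3f}")
for name, share in sorted(zip(FEATURES, importance), key=lambda t: -t[1]):
    print(f"{name:>22}: {share:.3f}")
```

In this sketch, importance is each feature's share of the total split gain, which is one of several importance definitions used with boosted trees. With the real library, the training loop would typically be replaced by a fitted classifier, and the preprocessing steps the abstract mentions (not shown here) would precede the train/test split.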

Published

07 July 2025

Section

Article

How to Cite

Qi, R. (2025). Loan Default Prediction and Feature Importance Analysis Based on the XGBoost Model. European Journal of Business, Economics & Management, 1(2), 141-149. https://doi.org/10.71222/p9qmaa87