Improving Database Anomaly Detection Efficiency through Sample Difficulty Estimation

Maoxi Li; Daobo Ma; Yingqi Zhang

Authors

Maoxi Li, Business Analytics, Fordham University, NY, USA
Daobo Ma, Business Administration, California Institute of Advanced Management, CA, USA
Yingqi Zhang, Computer Science, Carnegie Mellon University, CA, USA

Keywords:

anomaly detection, sample difficulty estimation, database systems, computational efficiency

Abstract

This paper presents a novel approach to improving database anomaly detection efficiency through sample difficulty estimation. Traditional anomaly detection methods often apply the same computational resources to every data sample regardless of its complexity, which leads to inefficient resource utilization. Our framework addresses this limitation by quantifying the "difficulty" of individual database instances and allocating computational resources where they provide the greatest benefit. The proposed model combines isolation scores, density-based metrics, and surprise adequacy measurements to assess sample difficulty. Based on these assessments, a difficulty-oriented priority assignment mechanism, implemented through a sigmoid mapping function, directs intensive computation to challenging cases while simpler samples are processed with lighter methods. Experimental evaluation across five diverse datasets shows that our approach reduces average processing time by 52.84% compared to uniform approaches while maintaining or improving detection accuracy. The framework also achieves the highest Average Percentage of Faults Detected (APFD) score among the compared methods, 0.915, outperforming both traditional and deep learning-based methods. This research provides a foundation for intelligent, resource-aware anomaly detection systems capable of handling the growing scale and complexity of modern database environments.
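The abstract names the ingredients of the pipeline but not their exact formulas. The sketch below illustrates one plausible reading and is not the authors' implementation. It assumes the per-sample isolation, density, and surprise adequacy scores are already computed and oriented so that larger means harder. It also assumes they are fused by an equally weighted mean of min-max-normalised scores, which is a hypothetical fusion rule. Difficulty is then mapped to priority with a logistic (sigmoid) curve whose steepness `k`, `midpoint`, and 0.5 routing cut-off are hypothetical values. Finally, the ordering is scored with the standard test-prioritization APFD formula, treating each anomaly as a "fault" and each sample's rank as its test position. All function names, weights, and toy numbers are illustrative.

```python
import math
from typing import Dict, List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def difficulty_scores(components: Dict[str, Sequence[float]],
                      weights: Dict[str, float]) -> List[float]:
    """Weighted mean of min-max-normalised component scores (hypothetical fusion rule).

    Assumes every component is oriented so that larger means harder / more anomalous.
    """
    normed = {name: min_max(scores) for name, scores in components.items()}
    total_w = sum(weights[name] for name in normed)
    n = len(next(iter(normed.values())))
    return [sum(weights[name] * normed[name][i] for name in normed) / total_w
            for i in range(n)]


def sigmoid_priority(d: float, k: float = 10.0, midpoint: float = 0.5) -> float:
    """Logistic mapping of a difficulty in [0, 1] to a priority in (0, 1).

    k (steepness) and midpoint are hypothetical defaults, not values from the paper.
    """
    return 1.0 / (1.0 + math.exp(-k * (d - midpoint)))


def apfd(order: Sequence[int], anomalies: Sequence[int]) -> float:
    """APFD = 1 - sum(TF_i) / (n * m) + 1 / (2n).

    n = number of samples, m = number of anomalies, TF_i = 1-based rank of
    anomaly i in the prioritized order (each anomaly plays the role of a fault).
    """
    n, m = len(order), len(anomalies)
    rank = {sample: pos for pos, sample in enumerate(order, start=1)}
    return 1.0 - sum(rank[a] for a in anomalies) / (n * m) + 1.0 / (2 * n)


# Toy scores for four samples (illustrative numbers, not the paper's data).
components = {
    "isolation": [0.2, 0.9, 0.4, 0.8],
    "density": [0.1, 0.7, 0.3, 0.9],
    "surprise": [0.3, 0.8, 0.2, 0.6],
}
weights = {"isolation": 1.0, "density": 1.0, "surprise": 1.0}

difficulty = difficulty_scores(components, weights)
priority = [sigmoid_priority(d) for d in difficulty]
# Hardest samples first; samples above the hypothetical 0.5 cut-off get the expensive detector.
order = sorted(range(len(priority)), key=lambda i: priority[i], reverse=True)
route = ["intensive" if p >= 0.5 else "light" for p in priority]

print(order)                          # [1, 3, 2, 0]
print(route)                          # ['light', 'intensive', 'light', 'intensive']
print(apfd(order, anomalies=[1, 3]))  # 0.75
```

With the two anomalous samples ranked first and second out of four, APFD is 1 - 3/8 + 1/8 = 0.75. Because the sigmoid is monotone, it does not change the ranking itself. Under this sketch's assumptions, its role is to turn difficulty into a bounded priority. The cut-off between light and intensive processing can then be tuned against a compute budget without recomputing the underlying scores.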
