Research on Data Analysis Application Based on Spark Computing
Keywords:
Spark computing, big data analysis, data optimization, real-time processing, distributed computing
Abstract
In the big data era, processing and analyzing massive datasets efficiently has become an urgent problem across industries. Spark, an open-source distributed computing framework, has quickly become a mainstream data processing tool thanks to its in-memory processing capabilities and user-friendly interface. It offers efficient computing performance, a rich ecosystem, and comprehensive support for data analysis tasks, and it is widely used in fields such as data management, machine learning, and real-time analysis. This article examines the application of Spark in data analysis platforms, identifies the challenges currently faced, and explores corresponding optimization approaches, with the aim of improving the efficiency and overall performance of data analysis.
License
Copyright (c) 2025 Jialu Yan (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.