Research on Data Analysis Application Based on Spark Computing

Authors

  • Jialu Yan, Decoded Advertising, New York, 10005, USA

Keywords:

Spark computing, big data analysis, data optimization, real-time processing, distributed computing

Abstract

As the big data era advances, processing and analyzing massive datasets efficiently has become an urgent problem across industries. Spark, an open-source distributed computing framework, quickly emerged as a mainstream data processing tool thanks to its in-memory processing capabilities and user-friendly interface. Spark combines efficient computing performance, a rich ecosystem, and comprehensive support for data analysis tasks, and it is widely used in fields such as data management, machine learning, and real-time analysis. This article examines the application of Spark in data analysis platforms, identifies the challenges it currently faces, and explores corresponding optimization approaches, with the aim of improving the efficiency and overall performance of data analysis.

Published

12 June 2025

Section

Article

How to Cite

Yan, J. (2025). Research on Data Analysis Application Based on Spark Computing. European Journal of AI, Computing & Informatics, 1(2), 23-29. https://pinnaclepubs.com/index.php/EJACI/article/view/134