Performance Evaluation and Optimization Strategies for Privacy-Preserving Document Classification in Distributed Learning Environments

Authors

  • Qiaomu Zhang, Computer Science, Rice University, Houston, TX, USA

Keywords:

privacy-preserving machine learning, federated learning, differential privacy, document classification

Abstract

The proliferation of sensitive documents across healthcare, financial, and governmental sectors necessitates robust privacy-preserving classification mechanisms. This study presents a performance evaluation of privacy-preserving document classification within distributed learning frameworks, examining federated learning and differential privacy implementations. Through systematic experimentation on benchmark datasets, we quantify accuracy-privacy trade-offs, communication overhead, and computational costs across a range of privacy budget configurations. Results show that adaptive privacy budget allocation reduces accuracy degradation by 12-18% compared with uniform allocation while maintaining equivalent privacy guarantees. Gradient compression techniques achieve a 67% reduction in communication with minimal impact on convergence. These findings provide actionable deployment guidelines for organizations implementing privacy-preserving document processing systems.
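The contrast between uniform and adaptive privacy allocation can be illustrated with a minimal sketch. It is not the method evaluated in the paper. It assumes basic sequential composition (per-round budgets add up to the total) and the Laplace mechanism (noise scale equals sensitivity divided by epsilon). The function names and the per-round weights are hypothetical and chosen only for illustration.

```python
def uniform_allocation(total_epsilon, rounds):
    """Split a total privacy budget evenly across training rounds."""
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    return [total_epsilon / rounds] * rounds


def adaptive_allocation(total_epsilon, weights):
    """Split a total privacy budget in proportion to per-round weights.

    The weights are a hypothetical stand-in for whatever signal an
    adaptive scheme uses (for example, giving later rounds more budget).
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be non-empty and positive")
    total_weight = sum(weights)
    return [total_epsilon * w / total_weight for w in weights]


def laplace_scale(sensitivity, epsilon):
    """Noise scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


if __name__ == "__main__":
    total_epsilon = 1.0
    uniform = uniform_allocation(total_epsilon, 4)
    adaptive = adaptive_allocation(total_epsilon, [1, 1, 2, 4])

    # Both schedules spend the same total budget under basic composition,
    # but they distribute the noise differently across rounds.
    for name, schedule in (("uniform", uniform), ("adaptive", adaptive)):
        scales = [laplace_scale(1.0, eps) for eps in schedule]
        print(name, "epsilons:", schedule, "noise scales:", scales)
```

Both schedules sum to the same total epsilon, which is the sense in which the privacy guarantee is equivalent in this sketch. Rounds that receive a larger share get proportionally less noise, and the remaining rounds get more.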

Published

2026-05-06

How to Cite

Performance Evaluation and Optimization Strategies for Privacy-Preserving Document Classification in Distributed Learning Environments. (2026). Journal of Science, Innovation & Social Impact, 2(2), 94-103. https://pinnaclepubs.com/index.php/JSISI/article/view/714