Improving Automatic Essay Assessment Through Cosine Similarity Leveraging a Semantic Corpus

Authors

  • Fitria Ekarini, Universitas Negeri Semarang, Semarang, Indonesia
  • Septian Eko Prasetyo, Universitas Negeri Semarang, Semarang, Indonesia
  • Anan Nugroho, Universitas Negeri Semarang, Semarang, Indonesia
  • Alfian Ardhiansyah, Universitas Negeri Semarang, Semarang, Indonesia
  • Clarita Aprilliani, Universitas Negeri Semarang, Semarang, Indonesia
  • Fakhri Ahmda Kurnia, Universitas Negeri Semarang, Semarang, Indonesia

DOI:

https://doi.org/10.71222/08rgdd87

Keywords:

automatic essay scoring, cosine similarity, semantic data, text processing

Abstract

Automatic Essay Scoring (AES) automates the evaluation of written essays, reducing evaluator subjectivity and accelerating the assessment process. A persistent challenge, however, is achieving high accuracy given the limited semantic understanding of such systems. This study uses cosine similarity as a baseline approach and then integrates a semantic data corpus to better represent textual meaning. The results show that cosine similarity alone captures predominantly lexical-level similarity and yields limited correlation with human scoring. Incorporating a semantic corpus substantially improves the system's capacity to recognize synonyms and linguistic variations, making the scoring process more sensitive and reliable. Despite these improvements, the findings point to the need for further corpus refinement, evaluation on larger and more diverse datasets, and assessment using multiple correlation metrics. Overall, the study contributes to the development of AES systems that are more accurate, more consistent, and more closely aligned with human judgment.
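
The contrast the abstract draws between purely lexical cosine similarity and a synonym-aware variant can be illustrated with a minimal sketch. This is not the authors' implementation: the `SYNONYMS` map is a hypothetical stand-in for a semantic corpus, the function names are invented for illustration, and plain term-frequency vectors are assumed because the abstract does not state which weighting scheme was used.

```python
import math
import re
from collections import Counter

# Hypothetical synonym map standing in for a semantic corpus.
# Each variant is mapped to one canonical term; this is NOT the authors' data.
SYNONYMS = {"automobile": "car", "quick": "fast", "rapid": "fast"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(tokens, synonyms):
    """Replace each token with its canonical form when the map has one."""
    return [synonyms.get(token, token) for token in tokens]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def score(reference, answer, synonyms=None):
    """Similarity of a student answer to a reference answer.

    With synonyms=None this is the lexical baseline; passing a synonym
    map applies the normalization step before the vectors are compared.
    """
    ref_tokens, ans_tokens = tokenize(reference), tokenize(answer)
    if synonyms:
        ref_tokens = normalize(ref_tokens, synonyms)
        ans_tokens = normalize(ans_tokens, synonyms)
    return cosine_similarity(ref_tokens, ans_tokens)


reference = "the car is fast"
answer = "the automobile is quick"
lexical = score(reference, answer)             # only "the" and "is" overlap
semantic = score(reference, answer, SYNONYMS)  # all four terms now overlap
```

In this toy pair the lexical score is 0.5 because only two of four terms match, while the synonym-normalized score reaches 1.0. A real system would still need a far larger semantic resource and validation against human scores, as the abstract notes.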

Published

27 December 2025

How to Cite

Ekarini, F., Prasetyo, S. E., Nugroho, A., Ardhiansyah, A., Aprilliani, C., & Kurnia, F. A. (2025). Improving Automatic Essay Assessment Through Cosine Similarity Leveraging a Semantic Corpus. Pinnacle Academic Press Proceedings Series, 6, 77-83. https://doi.org/10.71222/08rgdd87