Evaluating Prompt Engineering Strategies for Few-Shot Cyber Threat Intelligence Entity and Relation Extraction from Multi-Source Reports

Yanhuan Chen; Tianxing Tang

Authors

Yanhuan Chen, Master of Engineering, Dartmouth College, Hanover, NH, USA
Tianxing Tang, Translation and Localization Management, Middlebury Institute of International Studies, Monterey, CA, USA

Keywords:

cyber threat intelligence, named entity recognition, prompt engineering, few-shot learning

Abstract

The proliferation of multi-source cyber threat intelligence reports, spanning vulnerability databases, government advisories, vendor analyses, and open-source feeds, has outpaced the capacity of human analysts to extract structured knowledge about adversary tactics, techniques, and procedures. While large language models present a promising avenue for automating this extraction under low-resource conditions, no systematic empirical comparison of prompt engineering strategies exists for the cyber threat intelligence domain. This study evaluates six prompt engineering strategies (zero-shot, one-shot, three-shot, five-shot, retrieval-augmented five-shot, and chain-of-thought five-shot) across four publicly available cyber threat intelligence named entity recognition datasets (DNRTI, CyNER, AnnoCTR, APTNER) and one relation extraction corpus, using GPT-4, GPT-3.5-turbo, and Llama-3-70B. The retrieval-augmented five-shot strategy achieves the highest named entity recognition F1 of 0.753 on CyNER with GPT-4, narrowing the gap with the fine-tuned SecureBERT baseline to 2.8 percentage points. Chain-of-thought prompting yields the lowest expected calibration error (0.108), suggesting its value for uncertainty-aware intelligence triage. Extraction performance differs by 12.2 F1 points between the easiest and hardest corpora, underscoring the challenge of heterogeneous intelligence fusion. These findings offer actionable guidance for deploying prompt-based extraction in operational threat intelligence pipelines aligned with the NIST Cybersecurity Framework and national cyber defense priorities.

