Attention-Based Multimodal Emotion Recognition for Fine-Grained Visual Ad Engagement Prediction on Instagram

Authors

  • Xin Lu, Stanford University, Stanford, CA, USA
  • Zihan Li, Northeastern University, San Jose, CA, USA

DOI:

https://doi.org/10.71222/wm953j16

Keywords:

multimodal emotion recognition, attention mechanisms, computational advertising, social media engagement prediction

Abstract

This paper presents a novel Attention-Based Multimodal Framework (ABMF) for emotion recognition and fine-grained engagement prediction in Instagram advertisements. Traditional approaches to advertisement assessment rely primarily on unimodal analysis and fail to capture the nuanced relationship between emotional content and engagement behaviors. The proposed framework integrates visual, textual, and metadata features through cross-modal attention mechanisms that dynamically identify emotionally salient components across modalities. We construct the Instagram Advertisement Emotion Dataset (IAED), which contains 10,000 sponsored posts annotated with valence-arousal ratings and engagement metrics. Experimental results show that ABMF significantly outperforms state-of-the-art baselines, with a 12.1% reduction in valence mean absolute error (MAE) and a 7.1% improvement in engagement-prediction mean average precision (MAP). The results also reveal distinct relationships between emotional dimensions and specific engagement behaviors: high-arousal content generates 78.6% higher share rates, while positive-valence content receives 62.7% more likes than negative-valence content. These findings provide quantifiable guidance for tailoring the emotional content of advertisements to campaign objectives. The cross-modal attention mechanism identifies the features that drive engagement, offering Instagram advertisers a computational approach to predicting and enhancing user engagement through targeted emotional content design.
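The abstract does not give the attention equations, so the following is only a minimal sketch of what cross-modal attention followed by fusion can look like, assuming standard scaled dot-product attention in which a textual embedding queries a set of visual region features, learned projection matrices are omitted (treated as identity), and metadata features are simply concatenated. The function names `cross_modal_attention` and `fuse` and the toy feature values are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention of one query over key/value vectors.

    Hypothetical setup: the query is a text embedding and keys/values are
    visual region features; learned projections (W_Q, W_K, W_V) are omitted.
    Returns (attention weights over regions, attended context vector).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return weights, context


def fuse(
    text_vec: Sequence[float],
    region_feats: Sequence[Sequence[float]],
    meta_vec: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Concatenate text features, text-attended visual context, and metadata."""
    weights, context = cross_modal_attention(text_vec, region_feats, region_feats)
    return weights, list(text_vec) + context + list(meta_vec)


if __name__ == "__main__":
    text = [1.0, 0.0]                    # toy text embedding (hypothetical)
    regions = [[1.0, 0.0], [0.0, 1.0]]   # two toy image-region features
    meta = [0.5]                         # e.g. a normalized posting-time feature
    w, fused = fuse(text, regions, meta)
    print([round(x, 3) for x in w])      # region aligned with the text gets more weight
    print(len(fused))                    # 2 text + 2 context + 1 metadata dims
```

Because the softmax weights over regions sum to one, they can be read as per-region salience scores. This is one plausible way an attention mechanism of this kind supports identifying which visual components influence a prediction. In a trained model, the fused vector would then feed regression heads for valence and arousal and a head for engagement prediction.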



Published

31 August 2025

How to Cite

Lu, X., & Li, Z. (2025). Attention-Based Multimodal Emotion Recognition for Fine-Grained Visual Ad Engagement Prediction on Instagram. Pinnacle Academic Press Proceedings Series, 3, 204-218. https://doi.org/10.71222/wm953j16