Multimodal Deep Learning for Advertising Content Safety: A Comprehensive Study on Detection and Governance Strategies
Keywords:
multimodal learning, advertising safety, content moderation, deep learning

Abstract
The proliferation of digital advertising across multiple platforms has created unprecedented challenges for content safety and brand protection. This paper presents a comprehensive study of multimodal deep learning approaches for detecting unsafe advertising content, addressing both explicit policy violations and implicit misleading information. We propose a novel framework that integrates visual, textual, and cross-modal features through advanced fusion architectures to achieve robust detection performance. Our methodology combines pre-trained language models, vision transformers, and optical character recognition systems with attention-based fusion mechanisms for comprehensive content analysis. Experimental results on a dataset of 45,000 advertising samples demonstrate that our approach achieves 92.3% accuracy in detecting policy violations, outperforming the best single-modality baseline by 15.5 percentage points and a strong late-fusion baseline by 8.6 percentage points. The framework shows particular strength in identifying implicit misleading content, reaching an 89.1% F1-score while maintaining a balanced precision-recall trade-off suitable for production deployment. This research contributes practical governance strategies for human-AI collaboration in content moderation workflows, addressing the critical need for scalable and accurate advertising safety systems in the digital ecosystem.
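To make the fusion step concrete, the sketch below shows one common form of attention-based fusion over per-modality embeddings. It is a minimal illustration, not the paper's implementation: the embedding dimension, the learned query vector, and the random "embeddings" standing in for the language-model, vision-transformer, and OCR encoders are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # illustrative embedding dimension (not from the paper)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(modal_embs, query):
    """Score each modality embedding against a (learned) query,
    normalize the scores with softmax, and return the weighted sum
    as the fused representation."""
    scores = np.array([e @ query / np.sqrt(DIM) for e in modal_embs])
    weights = softmax(scores)
    fused = sum(w * e for w, e in zip(weights, modal_embs))
    return fused, weights

# Stand-ins for the three modality encoders described in the abstract:
# text encoder, vision transformer, and OCR-extracted-text encoder.
text_emb = rng.normal(size=DIM)
image_emb = rng.normal(size=DIM)
ocr_emb = rng.normal(size=DIM)

query = rng.normal(size=DIM)  # hypothetical learned attention query
fused, weights = attention_fuse([text_emb, image_emb, ocr_emb], query)

print(weights)        # one weight per modality, summing to 1
print(fused.shape)    # fused vector keeps the embedding dimension
```

In a trained system the query (or a full query/key/value projection) is learned jointly with a downstream violation classifier, so the model can up-weight whichever modality carries the policy-relevant signal for a given ad.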
Published: 2026-02-13
Section: Articles
How to Cite
Multimodal Deep Learning for Advertising Content Safety: A Comprehensive Study on Detection and Governance Strategies. (2026). Journal of Science, Innovation & Social Impact, 2(1), 64-79. https://pinnaclepubs.com/index.php/JSISI/article/view/524