A Comparative Study of NER Methods for Ownership Structure Extraction from M&A Due Diligence Documents
Authors
Hanfei Zhang
Law, Emory University School of Law, Atlanta, GA, USA
Keywords:
named entity recognition, due diligence automation, ownership structure extraction, legal document processing
Abstract
Cross-border mergers and acquisitions require efficient extraction of ownership structures from due diligence documentation. This study compares named entity recognition (NER) methods for extracting equity structures from corporate governance documents. We construct an annotated dataset from authentic materials and evaluate six NER approaches spanning traditional sequence labeling (CRF, BiLSTM-CRF), general-purpose transformers (BERT, RoBERTa), and domain-adapted models (FinBERT-MRC, Legal-BERT). Legal-BERT achieves an overall F1 score of 87.3% but struggles with multilingual entity names and nested ownership structures. Error analysis reveals three primary failure modes: cross-lingual recognition ambiguities, percentage-quantity confusion, and difficulty representing complex structures. These findings provide actionable guidance for implementing automated equity analysis systems in time-sensitive M&A transactions.