Journal: Nature Communications
This study retrospectively evaluated a commercial AI model for cancer detection on digital breast tomosynthesis screening in 167,860 exams (1,368 screen‑detected cancers; 166,387 negatives). The model showed strong overall performance in distinguishing cancers from negative exams (AUROC 0.91; sensitivity 0.73), with generally consistent accuracy across demographic subgroups.
Performance varied by imaging and pathologic features:
- Lower performance:
  - In situ cancers: AUROC 0.85; sensitivity 0.55
  - Lesions presenting as calcifications: AUROC 0.80; sensitivity 0.66
  - Dense breast tissue: AUROC 0.88; sensitivity 0.63
- Higher performance:
  - Masses: AUROC 0.93; sensitivity 0.85
  - Architectural distortions: AUROC 0.90; sensitivity 0.83
The authors emphasize that, despite strong overall metrics, granular subgroup analysis reveals important weaknesses (particularly for in situ disease, calcifications, and dense breasts), underscoring the need for detailed evaluation and ongoing vigilance before and during clinical deployment of AI tools in breast screening.
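To make the subgroup metrics concrete, here is a minimal sketch of how AUROC and sensitivity might be computed for each cancer subgroup. It is not the study's code. The function names, the toy scores, and the 0.5 operating threshold are all hypothetical, and it assumes each cancer subgroup is scored against the same pool of negative exams, which may differ from the authors' design.

```python
from typing import Dict, Sequence


def auroc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Chance that a random cancer exam outscores a random negative exam.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(pos: Sequence[float], threshold: float) -> float:
    """Fraction of cancer exams scored at or above the operating threshold."""
    return sum(1 for s in pos if s >= threshold) / len(pos)


def subgroup_metrics(
    cancers: Dict[str, Sequence[float]],
    negatives: Sequence[float],
    threshold: float,
) -> Dict[str, Dict[str, float]]:
    """Compute AUROC and sensitivity per subgroup against shared negatives."""
    return {
        name: {
            "auroc": auroc(scores, negatives),
            "sensitivity": sensitivity(scores, threshold),
        }
        for name, scores in cancers.items()
    }


# Hypothetical toy scores, for illustration only (not data from the study).
negatives = [0.05, 0.10, 0.20, 0.30, 0.40]
cancers = {
    "mass": [0.90, 0.80, 0.35],
    "calcification": [0.60, 0.25, 0.15],
}

results = subgroup_metrics(cancers, negatives, threshold=0.5)
for name, m in results.items():
    print(f"{name}: AUROC={m['auroc']:.2f}, sensitivity={m['sensitivity']:.2f}")
```

In this toy data the two subgroups are nearly the same size, but in a real screening cohort one lesion type can dominate, so a pooled AUROC can look strong while a smaller subgroup performs poorly. Reporting the metrics per subgroup exposes that kind of gap.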