Abstract
Breast cancer remains one of the most prevalent causes of cancer-related mortality among women worldwide, underscoring the critical need for early detection and accurate diagnosis. This study presents a Transformer-based deep learning framework that substantially enhances mammogram-based breast cancer detection. We fine-tuned Vision Transformer (ViT-B/16) and Swin Transformer (Swin-Tiny) models, both pretrained on ImageNet, to perform robust tumor classification and precise localization. The proposed dual-architecture model integrates parallel processing, attention-guided tumor localization, and clinically relevant outputs, including tumor size estimation and stage classification. To improve generalization and reduce overfitting, the system incorporates data augmentation strategies (flipping, rotation, contrast adjustment) together with regularization techniques such as dropout and weight decay. Unlike traditional CNN-based or manual diagnostic approaches, our method generates interpretable visual outputs with circular overlays, heatmaps, and stage labels, thereby bridging the gap between model predictions and clinical interpretability. Experimental results demonstrate superior performance across all major metrics, with the Swin Transformer achieving a classification accuracy of 92.4% and a localization accuracy of 92.4%, outperforming conventional CNN architectures and object detection models. The proposed framework also reduces false positives by 12.7% and maintains an average tumor localization error of ≤ 5 mm, substantially lower than existing benchmarks. These results position our model as a reliable and interpretable AI-assisted diagnostic tool with strong potential to support radiologists in early detection and personalized treatment planning for breast cancer.
Keywords: Breast Cancer Detection, Mammogram Analysis, Swin Transformer, Tumor Localization, Tumor Stage Classification, Vision Transformer
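To make the fine-tuning setup summarized above concrete, the following is a minimal sketch of adapting an ImageNet-pretrained Swin-Tiny for binary mammogram classification with the augmentations and regularizers the abstract lists. It assumes the timm and torchvision libraries; the rotation range, contrast strength, dropout rate, and optimizer settings are illustrative assumptions, not values reported in this work.

```python
# Minimal sketch (not the paper's released code): fine-tuning a
# pretrained Swin-Tiny with the augmentation and regularization
# described in the abstract. Hyperparameter values are assumptions.
import timm
import torch
import torch.nn as nn
from torchvision import transforms

# Augmentations named in the abstract: flipping, rotation, contrast.
train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),   # assumed rotation range
    transforms.ColorJitter(contrast=0.2),    # assumed contrast strength
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# ImageNet-pretrained Swin-Tiny with its head replaced for two classes
# (benign vs. malignant); drop_rate adds the dropout mentioned above.
model = timm.create_model(
    "swin_tiny_patch4_window7_224",
    pretrained=True,
    num_classes=2,
    drop_rate=0.1,                           # assumed dropout rate
)

# AdamW supplies the weight decay named as the second regularizer.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
```

The same recipe applies to the ViT-B/16 branch by swapping the model name (e.g., timm's "vit_base_patch16_224"); the dual-architecture design in the paper runs both backbones in parallel.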