The Multimodal Fusion of Voice, Gait, and Handwriting Detection of Parkinson’s Disease Using Machine Learning

Abstract
Parkinson’s disease (PD) is a progressive neurological disorder that impairs movement control due to the degeneration of dopamine-producing neurons. The subtle nature of early symptoms such as tremor, bradykinesia, and speech changes often complicates timely diagnosis, reducing opportunities for early intervention. This study proposes a multimodal machine learning model for the early diagnosis of PD that integrates three complementary data modalities: voice recordings, gait analysis, and handwriting patterns. For each modality, a specialized neural network extracts critical features, including acoustic markers, motion irregularities, and fine motor dynamics. A self-supervised learning (SSL) paradigm is employed to enhance feature representation without relying on large-scale labeled datasets, thereby addressing data scarcity. These modality-specific features are fused through a Multimodal Transformer with cross-attention mechanisms, enabling the system to capture complex interdependencies between modalities and improve diagnostic accuracy. Evaluation on a cohort of 1,000 subjects (70% with early-stage PD and 30% healthy controls) achieved 96.5% accuracy, surpassing benchmark methods. The results highlight the potential of multimodal integration and SSL for earlier and more reliable PD detection, offering a promising pathway toward improved clinical outcomes.
Keywords: Cross-Attention Mechanism, Early Detection, Gait Analysis, Handwriting Analysis, Multimodal Transformer, Neural Networks, Parkinson’s Disease, Self-Supervised Learning, Voice Analysis.

Author(s): Mirle Bhyraj Meghashree, Karigowda Dhananjaya Kumar*, Nagaraju Vinutha, Dinesh Akash
Volume: 6 Issue: 4 Pages: 1062-1074
DOI: https://doi.org/10.47857/irjms.2025.v06i04.06552
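The abstract describes modality-specific features being fused through cross-attention. The Python below is a minimal, hypothetical sketch of that idea, not the authors' implementation. Each modality's token features attend over the tokens of the other two modalities using scaled dot-product attention. The attended tokens are then mean-pooled, and the concatenated vector is passed to a logistic output. Several assumptions simplify the sketch. Learned query/key/value projections, multiple attention heads, and the SSL-pretrained encoders are all omitted. The three-dimensional feature vectors, the mean-pooling step, and the classifier weights and bias are made-up placeholders.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query token attends over the key/value
    tokens of other modalities. Assumption: learned Q/K/V projections are omitted
    (identity) to keep the sketch short."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(tokens: Matrix) -> Vector:
    """Average a sequence of token vectors into one vector (assumed pooling step)."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def fuse(modalities: Dict[str, Matrix]) -> Vector:
    """For each modality, attend from its tokens to the tokens of all other
    modalities, mean-pool the result, and concatenate the pooled vectors."""
    fused: Vector = []
    for name in sorted(modalities):
        context = [tok for other, toks in sorted(modalities.items())
                   if other != name for tok in toks]
        attended = cross_attention(modalities[name], context, context)
        fused.extend(mean_pool(attended))
    return fused


def predict_pd_probability(fused: Vector, weights: Vector, bias: float) -> float:
    """Logistic output head; weights and bias are hypothetical, not trained values."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical per-modality token features (e.g., outputs of each encoder).
features = {
    "voice": [[0.2, 0.1, 0.4], [0.3, 0.0, 0.5]],
    "gait": [[0.9, 0.2, 0.1]],
    "handwriting": [[0.1, 0.7, 0.3], [0.2, 0.6, 0.2]],
}
fused = fuse(features)
p = predict_pd_probability(fused, weights=[0.1] * len(fused), bias=0.0)
print(len(fused), round(p, 3))
```

Letting each modality query the others is one simple way to model the cross-modal interdependencies the abstract mentions. A full Multimodal Transformer would typically add learned projections, multiple heads, residual connections, and a trained classification head on top of this core operation.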