Abstract
Student engagement is a key construct in learning achievement, especially in digital and technology-enhanced educational environments. Although multimodal data, including facial expressions, vocal prosody, physiological signals, and interaction logs, are increasingly available, existing systems still rely on single-modal or homogeneous models, which limits their predictive power and generalizability. To mitigate these drawbacks, we propose a Heterogeneous Multi-Model Ensemble Framework (HMMEF) that combines convolutional and recurrent neural networks, support vector machines, and decision trees to predict and improve student engagement. Operating on pre-defined multimodal datasets, the framework uses dynamic adaptive weighting to set each model's contribution according to real-time data quality. Experiments on several educational datasets show that HMMEF outperforms classical single-model classifiers, with higher accuracy and better scalability and interpretability. Furthermore, the framework identifies actionable engagement patterns and generates recommendations for personalized learning interventions, offering a scalable and adaptable platform for intelligent tutoring systems and real-time multimodal analytics in education.
Keywords: Adaptive Weighting Framework, Ensemble Machine Learning, Intelligent Tutoring Systems, Multimodal Learning Analytics, Student Engagement Prediction.