Metaheuristic Optimization of Random Forest for Lung Cancer Prediction

Abstract
Lung cancer remains one of the leading causes of cancer-related mortality worldwide, and early risk identification is critical for improving survival outcomes. Existing machine learning approaches for lung cancer prediction frequently rely on medical imaging, which is costly and often impractical in low-resource clinical settings. This study proposes an efficient and interpretable lung cancer risk prediction framework using demographic, lifestyle, and symptom-based data. A Genetic Algorithm (GA) is employed as a metaheuristic optimization strategy to jointly perform feature selection and hyperparameter tuning of a Random Forest (RF) classifier. To address the class imbalance in the dataset, the Synthetic Minority Oversampling Technique (SMOTE) is applied exclusively to the training data, which prevents information leakage into the test set and enhances minority-class learning. The proposed GA-optimized RF model is evaluated against several baseline classifiers, including Logistic Regression, Support Vector Machine, k-Nearest Neighbors, Decision Tree, standard Random Forest, and XGBoost, using accuracy, precision, recall, F1-score, and ROC-AUC as evaluation metrics. Experimental results show that the optimized RF model outperforms all baseline models, with an accuracy of 90.6%, an F1-score of 0.885, and a ROC-AUC of 0.917. Feature importance analysis identifies smoking habit, breathing difficulties, and throat discomfort as the most influential predictors, consistent with established clinical knowledge. The findings indicate that a metaheuristic-driven optimization approach applied to non-imaging data can provide a cost-effective, reliable, and interpretable solution for early lung cancer risk screening, particularly in resource-constrained healthcare environments.
Keywords: Demographic Data, Feature Selection, Genetic Algorithm, Lung Cancer Prediction, Metaheuristic Optimization, Random Forest.
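
The joint search described in the abstract can be illustrated with a minimal sketch, assuming a chromosome made of one bit per candidate feature followed by two genes that index into small hyperparameter grids. The grids (N_ESTIMATORS, MAX_DEPTH), the operator choices (one-point crossover, tournament selection, single-elite carry-over), and all function names are illustrative assumptions, not the settings reported in the paper. The fitness callable is left to the caller; in a setting like the one described, it would plausibly return a cross-validated RF score computed with SMOTE fitted on the training folds only.

```python
import random

# Hypothetical search grids; the abstract does not state the ranges used.
N_ESTIMATORS = [50, 100, 200, 400]
MAX_DEPTH = [3, 5, 10, None]


def random_chromosome(n_features, rng):
    """Feature-mask bits followed by two hyperparameter index genes."""
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    return mask + [rng.randrange(len(N_ESTIMATORS)),
                   rng.randrange(len(MAX_DEPTH))]


def decode(chrom, n_features):
    """Map a chromosome to (selected feature indices, RF parameter dict)."""
    selected = [i for i, bit in enumerate(chrom[:n_features]) if bit]
    params = {
        "n_estimators": N_ESTIMATORS[chrom[n_features]],
        "max_depth": MAX_DEPTH[chrom[n_features + 1]],
    }
    return selected, params


def crossover(a, b, rng):
    """One-point crossover over the whole chromosome."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chrom, n_features, rng, rate=0.1):
    """Flip mask bits or resample hyperparameter genes with probability rate."""
    out = list(chrom)
    for i in range(len(out)):
        if rng.random() < rate:
            if i < n_features:
                out[i] = 1 - out[i]
            elif i == n_features:
                out[i] = rng.randrange(len(N_ESTIMATORS))
            else:
                out[i] = rng.randrange(len(MAX_DEPTH))
    return out


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def run_ga(fitness, n_features, pop_size=20, generations=15, seed=0):
    """Maximise fitness(selected, params); returns ((selected, params), score)."""
    rng = random.Random(seed)
    pop = [random_chromosome(n_features, rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(*decode(c, n_features)) for c in pop]
        for chrom, score in zip(pop, scores):
            if score > best_score:
                best, best_score = list(chrom), score
        # Elitism: the best chromosome seen so far survives unchanged.
        next_pop = [list(best)]
        while len(next_pop) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            next_pop.append(mutate(child, n_features, rng))
        pop = next_pop
    return decode(best, n_features), best_score
```

Encoding the feature mask and the hyperparameters in one chromosome lets a single fitness evaluation score both choices together, which is what distinguishes a joint search from selecting features first and tuning afterwards.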

Author(s): Balaji T*, Babu P, Lokeshwaran K
Volume: 7 Issue: 1 Pages: 1666-1678
DOI: https://doi.org/10.47857/irjms.2026.v07i01.08680