Performance Analysis of Traditional Distance Metrics in Protein Structural Class Prediction

Abstract
Predicting a protein’s secondary structure directly from its amino acid sequence is a key challenge in bioinformatics, with significant implications for understanding how proteins function and for designing new drugs. This study presents a comparative evaluation of seven distance and similarity measures (Euclidean, Manhattan, Minkowski, Cosine, Chebyshev, Mahalanobis, and Jaccard) for classifying proteins into four major secondary structural classes: α, β, α + β, and α/β. Using a curated dataset of 120 protein sequences, each represented by the frequencies of the 20 amino acids, each metric was employed in a minimum-distance classification framework. Group-wise frequency statistics, including mean, maximum, and minimum values, were analyzed to understand how amino acids are distributed across the structural classes. A classification algorithm was then designed to compute the distance between an unknown protein and each class group and to assign the protein to the closest group. Accuracy was measured by comparing predicted labels against the true structural classes. The results show that the Mahalanobis distance achieved the highest mean classification accuracy (64.17%), closely followed by Cosine distance (61.67%), which is attributed to their ability to capture feature dependencies and directional similarity, respectively. Jaccard similarity performed poorly, indicating that it is ill-suited to continuous numerical data. The method yielded a maximum prediction accuracy of 79% in some cases. This performance evaluation underscores the importance of selecting an appropriate distance metric for structural classification tasks and lays the foundation for future integration with ensemble or deep learning models.
Keywords: Amino Acid Frequency, Bioinformatics, Distance Metrics, Prediction Models, Protein Secondary Structure (PSS), Secondary Structural Classes (SSC).
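
The minimum-distance framework described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that each class group is summarized by the mean of its members' frequency vectors and that frequencies are relative (counts divided by sequence length), neither of which the abstract specifies. The function names and the toy sequences are hypothetical and are not taken from the 120-protein dataset. Only four of the seven measures are shown. Mahalanobis distance, which needs a per-class covariance estimate, is omitted, as are Minkowski and Jaccard.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def frequency_vector(sequence):
    """Relative frequency of each of the 20 amino acids (assumed encoding)."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def cosine_distance(u, v):
    # 1 - cosine similarity; inputs are non-zero frequency vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def class_centroids(labelled):
    """Mean frequency vector per class (assumed group representative)."""
    groups = {}
    for label, vec in labelled:
        groups.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }

def classify(vector, centroids, metric=euclidean):
    """Assign the class whose centroid is closest under the given metric."""
    return min(centroids, key=lambda label: metric(vector, centroids[label]))

# Toy, made-up sequences for illustration only (not real proteins).
training = [
    ("alpha", frequency_vector("AAAALLLLEEEK")),
    ("alpha", frequency_vector("AALLEEEEKKLL")),
    ("beta", frequency_vector("VVVVIIIITTTY")),
    ("beta", frequency_vector("VVIITTTTYYVV")),
]
centroids = class_centroids(training)
query = frequency_vector("ALEKALEV")

for metric in (euclidean, manhattan, chebyshev, cosine_distance):
    print(metric.__name__, classify(query, centroids, metric))
```

Because the metric is passed in as a function, the same classifier can be re-run under each measure and the predicted labels compared against known classes to obtain a per-metric accuracy.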

Author(s): Shreya Saha, Papri Ghosh*, Debmitra Ghosh, Subhram Das, Md Ashifuddin Mondal, Dharmpal Singh
Volume: 6 Issue: 4 Pages: 1103-1116
DOI: https://doi.org/10.47857/irjms.2025.v06i04.06682