A Tree-based Machine Learning Approach for Precise Renal Cell Carcinoma Subtyping using RNA-seq Gene Expression Data

Authors

  • Oluwafemi Ogundare, Department of Medicine & Surgery, Faculty of Clinical Sciences, College of Medicine, University of Ibadan, Oyo State, Nigeria

DOI:

https://doi.org/10.12856/JHIA-2025-v12-i1-533

Abstract

Background and Purpose: Renal cell carcinoma (RCC) is a malignant neoplasm of the kidneys, characterized by distinct molecular and histological subtypes. Accurate subtyping is crucial for personalized treatment and improved patient outcomes. High-throughput sequencing has enabled precise gene expression profiling for cancer classification. This study compares tree-based and non-tree-based machine learning algorithms for differentiating between gene expression profiles of chromophobe, clear cell, and papillary RCCs.

Methods: RNA-seq data from a diverse cohort of patients diagnosed with these three RCC subtypes were used. Data preprocessing and normalization were performed, followed by feature selection using Analysis of Variance (ANOVA). Tree-based and non-tree-based algorithms were trained on the preprocessed data. The tree-based algorithms were decision tree, random forest, extra trees classifier, and bagging classifier. The non-tree-based algorithms were logistic regression, support vector machine, and naive Bayes. Each algorithm was evaluated using sensitivity, specificity, F1 score, and area under the ROC curve (AUC).
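The workflow described in the Methods can be illustrated with a minimal scikit-learn sketch. This is not the author's code. It runs on a synthetic three-class matrix standing in for normalized expression data, and the number of selected features (k=50), the hyperparameters, the train/test split, macro averaging, and one-vs-rest AUC are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, f1_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 300 samples x 500 "genes", 3 classes
# playing the role of chromophobe, clear cell, and papillary RCC.
X, y = make_classification(n_samples=300, n_features=500, n_informative=20,
                           n_classes=3, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def macro_specificity(y_true, y_pred):
    """Mean one-vs-rest specificity, TN / (TN + FP), over all classes."""
    cm = confusion_matrix(y_true, y_pred)
    fp = cm.sum(axis=0) - np.diag(cm)        # predicted as k, truly not k
    tn = cm.sum() - cm.sum(axis=1) - fp      # neither truly k nor predicted k
    return float(np.mean(tn / (tn + fp)))


# Hyperparameters below are illustrative defaults, not the study's settings.
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "extra trees": ExtraTreesClassifier(n_estimators=200, random_state=0),
    "bagging": BaggingClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True, random_state=0),
    "naive Bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    # Scaling and ANOVA F-test selection sit inside the pipeline, so they
    # are fitted on the training split only.
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=50), model)
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    y_prob = pipe.predict_proba(X_test)
    results[name] = {
        "sensitivity": recall_score(y_test, y_pred, average="macro"),
        "specificity": macro_specificity(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, average="macro"),
        "auc": roc_auc_score(y_test, y_prob, multi_class="ovr"),
    }

for name, m in results.items():
    print(f"{name:20s} " + "  ".join(f"{k}={v:.3f}" for k, v in m.items()))
```

Keeping the ANOVA selection step inside the pipeline is a design choice of this sketch: it prevents test samples from influencing which features are retained. Scores obtained on the synthetic data say nothing about the study's reported results.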

Results: Tree-based algorithms demonstrated superior performance across all evaluation metrics compared to non-tree-based algorithms. Specifically, the random forest classifier achieved the highest specificity and F1 score, the decision tree classifier achieved the highest sensitivity, while the bagging classifier achieved the highest AUC score. In contrast, non-tree-based algorithms showed comparatively lower performance in distinguishing between the cancer subtypes.

Conclusions: This study demonstrates the potential of machine learning, particularly tree-based models, for precise RCC subtyping. By leveraging tree-based models, we can effectively capture the complex, non-linear patterns in gene expression datasets. Future studies should aim to validate these findings across larger and more diverse datasets of RCC subtypes.

Published

2025-06-17

Issue

Section

Research Article

How to cite

[1]
Ogundare, O. 2025. A Tree-based Machine Learning Approach for Precise Renal Cell Carcinoma Subtyping using RNA-seq Gene Expression Data. Journal of Health Informatics in Africa. 12, 1 (June 2025), 39–52. DOI:https://doi.org/10.12856/JHIA-2025-v12-i1-533.