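Constructing a machine learning-based model for predicting the efficacy of repetitive transcranial magnetic stimulation in post-stroke upper limb motor dysfunction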

Li Yajuan; Wang Pingzhi; Fan Shasha; Luo Yanhong

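Li Yajuan, Wang Pingzhi, Fan Shasha, et al. Constructing a machine learning-based model for predicting the efficacy of repetitive transcranial magnetic stimulation in post-stroke upper limb motor dysfunction[J]. Chinese Journal of Physical Medicine and Rehabilitation, 2026, 48(6): 501-508.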

DOI: 10.3760/cma.j.cn421666-20250401-00287

Keywords: Repetitive transcranial magnetic stimulation; Stroke; Upper limb motor dysfunction; Machine learning; Class imbalance; Shapley additive explanations

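Funding: Key Research and Development Program of Shanxi Province (202302130501014); Research Project of the Shanxi Provincial Administration of Traditional Chinese Medicine (2024ZYY2B036)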

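Author	Affiliation
Li Yajuan	Department of Health Statistics, School of Public Health, Shanxi Medical University; Key Laboratory of Coal Environmental Pathogenicity and Prevention, Ministry of Education; Shanxi Provincial Key Laboratory of Major Disease Risk Assessment, Jinzhong 030600; Shanxi Bethune Hospital (Shanxi Academy of Medical Sciences), Third Hospital of Shanxi Medical University, Tongji Shanxi Hospital, Taiyuan 030032
Wang Pingzhi	Shanxi Bethune Hospital (Shanxi Academy of Medical Sciences), Third Hospital of Shanxi Medical University, Tongji Shanxi Hospital, Taiyuan 030032
Fan Shasha	Shanxi Bethune Hospital (Shanxi Academy of Medical Sciences), Third Hospital of Shanxi Medical University, Tongji Shanxi Hospital, Taiyuan 030032
Luo Yanhong	Department of Health Statistics, School of Public Health, Shanxi Medical University; Key Laboratory of Coal Environmental Pathogenicity and Prevention, Ministry of Education; Shanxi Provincial Key Laboratory of Major Disease Risk Assessment, Jinzhong 030600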

Abstract:

Objective: To develop, using different machine learning algorithms, models that predict the efficacy of repetitive transcranial magnetic stimulation (rTMS) for post-stroke upper limb motor dysfunction.

Methods: Clinical data were collected from 459 stroke survivors with upper limb motor dysfunction. The dataset was divided by stratified random sampling into a training set and a test set at a 7:3 ratio. The training set was rebalanced with six class-imbalance methods: synthetic minority over-sampling technique (SMOTE), Borderline-SMOTE type 1 (Borderline-1), Borderline-SMOTE type 2 (Borderline-2), adaptive synthetic sampling (ADASYN), SMOTE combined with edited nearest neighbors (SMOTE-ENN), and SMOTE combined with Tomek links (SMOTE-Tomek); the test set was left unchanged. Eleven classification models were then built. The six traditional algorithms were random forest (RF), support vector machine (SVM), logistic regression (LR), K-nearest neighbors (KNN), naive Bayes (NB), and decision tree (DT); the five boosting algorithms were adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and light gradient boosting machine (LightGBM). The optimal predictive model was selected using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, precision, recall, and F1 score as evaluation metrics, and the Shapley additive explanations (SHAP) method was applied to interpret that model visually.

Results: On the test set, the model combining Borderline-1 resampling with the LightGBM algorithm gave the best prediction of rTMS efficacy for post-stroke upper limb motor dysfunction (AUC=0.85, accuracy=0.85, recall=0.83, precision=0.86, F1=0.85).

Conclusions: The LightGBM model built after balancing the training data with Borderline-1 showed good classification performance, and applying SHAP to it improves its interpretability. The model can predict the efficacy of rTMS for stroke survivors with upper limb motor dysfunction from the patient's own clinical characteristics and the treatment protocol.
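The abstract states only that the data were split 7:3 by stratified random sampling. The following is a minimal standard-library sketch of such a split, assuming the outcome labels are held in a Python list; the function name `stratified_split` and the default seed are hypothetical and not taken from the authors' code. Sampling within each class keeps the class proportions roughly equal in the training and test subsets.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=42):
    """Split sample indices so that each class keeps (about) the same share
    in the training and test subsets. Returns (train_idx, test_idx)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random order within each class
        n_train = round(train_frac * len(idx))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)
```

With scikit-learn installed, `train_test_split(X, y, test_size=0.3, stratify=y)` from `sklearn.model_selection` performs the same kind of split; the abstract does not say which tool the authors used.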
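Resampling was applied to the training set only, so the test set keeps its original class distribution. The sketch below illustrates the Borderline-SMOTE type 1 rule used by the best-performing model, under stated assumptions: binary labels, Euclidean distance on numeric features, and oversampling until the minority class matches the majority class in size. The names `borderline_smote1`, `m` and `k` are hypothetical; a maintained implementation is `imblearn.over_sampling.BorderlineSMOTE(kind="borderline-1")` in imbalanced-learn, where `m_neighbors` and `k_neighbors` play the roles of `m` and `k`.

```python
import math
import random


def _k_nearest(x, pool, k, skip=None):
    """Indices of the k points in `pool` closest to x (Euclidean distance),
    ignoring position `skip` so that a point is not its own neighbour."""
    candidates = [i for i in range(len(pool)) if i != skip]
    candidates.sort(key=lambda i: math.dist(x, pool[i]))
    return candidates[:k]


def borderline_smote1(X, y, minority=1, m=5, k=5, seed=0):
    """Hypothetical Borderline-SMOTE-1 sketch for binary labels.

    X: list of numeric feature tuples; y: list of class labels.
    Oversamples `minority` until it matches the other class in size.
    """
    rng = random.Random(seed)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)  # majority minus minority
    if n_new <= 0 or len(min_idx) < 2:
        return list(X), list(y)

    # Step 1: flag "danger" minority points: among their m nearest neighbours
    # in the whole training set, at least half (but not all) are majority.
    danger = []
    for i in min_idx:
        neighbours = _k_nearest(X[i], X, m, skip=i)
        n_maj = sum(1 for j in neighbours if y[j] != minority)
        if m / 2 <= n_maj < m:  # n_maj == m is treated as noise and skipped
            danger.append(i)
    if not danger:
        return list(X), list(y)

    # Step 2: interpolate between a danger point and one of its k nearest
    # minority neighbours (Borderline-2 would also use majority neighbours).
    X_min = [X[i] for i in min_idx]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        pos = min_idx.index(i)
        j = rng.choice(_k_nearest(X[i], X_min, k, skip=pos))
        lam = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(X[i], X_min[j])))
    return list(X) + synthetic, list(y) + [minority] * n_new
```

In a pipeline like the one described, this would be called on the training rows alone (for example `X_bal, y_bal = borderline_smote1(X_train, y_train)`), and the untouched test rows would be used only for evaluation.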
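The evaluation metrics named in the abstract can be computed from a binary confusion matrix and the predicted scores. This sketch assumes label 1 means "treatment effective" and reports precision, recall and F1 for that class; the abstract does not state whether the published values are per-class or averaged. AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half, which equals the area under the ROC curve. All function names are illustrative.

```python
def _safe_div(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative),
    ties counted as 1/2; this is the Mann-Whitney U statistic divided by
    the number of positive-negative pairs."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of cases, which is harmless for a test set of this size; libraries such as scikit-learn (`roc_auc_score`) use a sort-based computation instead.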
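The abstract reports that SHAP was used to interpret the LightGBM model visually but gives no computational details. SHAP attributes a prediction to the input features using Shapley values. For tree ensembles the `shap` package provides `TreeExplainer`, which computes them efficiently; the brute-force sketch below instead enumerates every feature coalition so that the definition is visible. It replaces "absent" features with a single baseline vector, a simplification of SHAP's averaging over background data, and its cost grows as 2^n in the number of features. The function name and the toy models in the test are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    A feature absent from a coalition S is set to its baseline value,
    so v(S) = f(x_S, baseline_rest). Exponential cost; illustration only.
    """
    n = len(x)

    def value(coalition):
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the others
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi
```

By the efficiency property, the returned values sum to f(x) minus f(baseline), which is what lets a SHAP plot decompose one patient's predicted score into per-feature contributions.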
