A new model for hospitalization expenses of gastric cancer based on clustering and support vector machine

DOI: 10.7641/CTA.2017.60545
2017,34(6):803-810

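Authors, affiliations and e-mail:
Zhou Tao, School of Science, Ningxia Medical University, zhoutaonxmu@126.com
Lu Huiling, School of Science, Ningxia Medical University
Wang Wenwen, School of Science, Ningxia Medical University
Wang Huiqun, School of Science, Ningxia Medical University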

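To address the complexity of setting classification labels for the hospitalization expenses of gastric cancer patients and the limitations of traditional expense modeling algorithms, this paper proposes a hospitalization expense modeling algorithm based on clustering and support vector machines, providing a methodological basis for controlling and predicting the hospitalization expenses of gastric cancer patients. A sample of 1583 gastric cancer patients treated at a tertiary grade-A hospital in Ningxia from 2009 to 2011 was collected. K-means was applied to total hospitalization expenses year by year to obtain classification labels, and a support vector machine was then used to model and predict hospitalization expenses and to analyze their influencing factors, with classification accuracy as the evaluation metric for predictive performance. The experimental results show that the hospitalization expenses of gastric cancer patients increased year by year, with western medicine costs forming the largest share at 53.74% of total expenses. Clustering expenses by year with K-means raised classification accuracy by 13.13% compared with clustering on the expense distribution alone. The support vector machine model was most stable with a Gaussian kernel function, penalty factor C = 10 and kernel parameter = 1, reaching a classification accuracy of 92.11%. The results indicate that class labels obtained by year-wise clustering are more reasonable, and that a support vector machine combined with clustering predicts hospitalization expenses more effectively.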

A new modeling method based on clustering and support vector machine (SVM) is proposed to handle the complexity of assigning category labels to the hospitalization expenses of gastric cancer patients and to overcome the limitations of traditional cost modeling techniques, thereby providing a methodological basis for controlling and predicting these expenses. A total of 1583 gastric cancer cases from a tertiary general hospital in Ningxia, covering 2009 to 2011, were collected as samples. Total hospitalization expenses were clustered year by year using K-means to obtain category labels, and an SVM was then used to predict hospitalization expenses and to analyze their influencing factors. Classification accuracy was used as the index for evaluating predictive performance. The experimental results show that the hospitalization expenses of gastric cancer patients increased year by year, and that western drugs accounted for the largest share of the expenses (53.74%). The influencing factors of hospitalization cost were treatment outcome, surgery, admission situation, length of hospital stay, age and marital status, of which treatment outcome and surgery were the most important. The classification accuracy obtained with K-means clustering by year was 13.13% higher than that obtained by clustering on the distribution characteristics of expenses alone. The SVM with a Gaussian kernel function was the most stable model, with a classification accuracy of 92.11% when the penalty factor C and the kernel parameter were set to 10 and 1, respectively. Clustering by year is therefore a more reasonable way to obtain category labels, and combining clustering with SVM is an effective way to predict hospitalization expenses.
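
The year-wise labeling step can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the number of clusters (k = 3), the deterministic initialization, the function names `kmeans_1d` and `label_by_year`, and the expense figures are all assumptions made for illustration, since the abstract gives none of these details. The sketch clusters total expenses separately within each year, so a label means "low, medium or high relative to that year".

```python
from collections import defaultdict
from statistics import mean


def kmeans_1d(values, k=3, iters=100):
    """Lloyd's algorithm on scalar values; returns one label per value.

    Labels are ranked by cluster centre: 0 is the cheapest cluster.
    k=3 is an assumption; the abstract does not state the cluster count.
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centre.
        updated = [mean(g) if g else c for g, c in zip(groups, centres)]
        if updated == centres:
            break
        centres = updated
    order = sorted(range(k), key=lambda j: centres[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[min(range(k), key=lambda j: abs(v - centres[j]))] for v in values]


def label_by_year(records, k=3):
    """records: list of (year, total_expense). Clusters each year separately."""
    by_year = defaultdict(list)
    for idx, (year, cost) in enumerate(records):
        by_year[year].append((idx, cost))
    labels = [None] * len(records)
    for items in by_year.values():
        year_labels = kmeans_1d([cost for _, cost in items], k)
        for (idx, _), label in zip(items, year_labels):
            labels[idx] = label
    return labels


# Made-up expenses (arbitrary units), chosen so that costs rise between years.
records = [
    (2009, 8.0), (2009, 9.0), (2009, 20.0), (2009, 21.0), (2009, 40.0), (2009, 42.0),
    (2011, 15.0), (2011, 16.0), (2011, 35.0), (2011, 36.0), (2011, 70.0), (2011, 72.0),
]
labels = label_by_year(records)
pooled = kmeans_1d([cost for _, cost in records])
```

On this toy data, clustering all years together places every 2009 expense of 21 or less in the lowest class, whereas year-wise clustering separates the 2009 expenses into three classes. The resulting labels would then serve as targets for a classifier such as an SVM with a Gaussian kernel, C = 10 and kernel parameter 1, as reported in the abstract.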