Objective: To investigate the potential applications of data mining methods in the diagnosis of digestive tract cancer (DTC) using several tumor markers (STMs), and to compare the diagnostic performance for DTC of a logistic regression model, a neural network, a Bayesian classifier, and clinical diagnosis using a single STM or a combination of STMs.

Methods: Serum levels of CA19-9, CA242, CA50 and CEA in 301 patients with DTC and 114 persons with benign digestive disease were used to build diagnostic classifiers based on three data mining methods: logistic regression, a back-propagation (BP) neural network, and a Bayesian network. Ten-fold cross-validation was used to test these classifiers. Diagnostic performance was assessed and compared on the basis of sensitivity, specificity, and the receiver operating characteristic (ROC) curve.

Results: The sensitivity and the area under the ROC curve (Az) of the BP neural network were 92.0% and 0.903, which were greater than the sensitivity of STM parallel diagnosis (83.4%, P<0.001) and the Az value of CA19-9 (0.806, P<0.001), respectively, while its specificity (69.3%) was similar to that of STM parallel diagnosis (68.4%, P=1.00). Compared with STM parallel diagnosis, the logistic regression model had a higher sensitivity (91.4%, P<0.001), a lower specificity (45.6%, P<0.001), and a similar Az value (0.819, P=0.55). The sensitivity of the Bayesian classifier was 72.8%, which was lower than that of STM parallel diagnosis (P<0.001), while its specificity (75.4%) and Az (0.797) were similar to those of STM parallel diagnosis and CA19-9 (P=0.13 and P=0.61), respectively.

Conclusions: The BP neural network had higher diagnostic accuracy than the parallel diagnosis of the four tumor markers. Logistic regression and the Bayesian network had a diagnostic level equivalent to that of the parallel diagnosis of the four tumor markers, and the BP neural network had higher diagnostic performance than the other two classifiers.
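The parallel diagnosis used as the comparator, together with the sensitivity and specificity measures, can be illustrated with a minimal Python sketch. It assumes that "parallel diagnosis" means a patient is called positive when any one of the four markers exceeds its cutoff, which the abstract does not state explicitly. The cutoff values and the five patient records below are hypothetical placeholders chosen for illustration, not values or data from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoffs for illustration only; not taken from the study.
CUTOFFS: Dict[str, float] = {"CA19-9": 37.0, "CA242": 20.0, "CA50": 25.0, "CEA": 5.0}


def parallel_positive(levels: Dict[str, float], cutoffs: Dict[str, float] = CUTOFFS) -> bool:
    """Assumed parallel rule: positive if ANY marker exceeds its cutoff."""
    return any(levels[marker] > cutoff for marker, cutoff in cutoffs.items())


def sensitivity_specificity(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from predicted and true labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy records: (marker levels, True if the patient has DTC).
toy_patients = [
    ({"CA19-9": 120.0, "CA242": 35.0, "CA50": 40.0, "CEA": 8.0}, True),
    ({"CA19-9": 20.0, "CA242": 10.0, "CA50": 12.0, "CEA": 9.0}, True),
    ({"CA19-9": 15.0, "CA242": 8.0, "CA50": 10.0, "CEA": 2.0}, True),
    ({"CA19-9": 10.0, "CA242": 5.0, "CA50": 8.0, "CEA": 1.5}, False),
    ({"CA19-9": 45.0, "CA242": 12.0, "CA50": 9.0, "CEA": 3.0}, False),
]

predicted = [parallel_positive(levels) for levels, _ in toy_patients]
actual = [has_dtc for _, has_dtc in toy_patients]
sens, spec = sensitivity_specificity(predicted, actual)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under this assumed rule, adding markers can only raise sensitivity and lower specificity, which is the trade-off a trained classifier has to beat.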