Title: A Machine Learning Approach to Identify DNA Replication Proteins from Sequence-Derived Features
Authors: Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina
Author affiliations: [Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Peoples R China.; [Zhang, Chengjin] Shan
Conference: IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE)
Conference date: MAY 03-06, 2015
Source: 2015 IEEE 28TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE)
Abstract: DNA replication, a critical step in cell division and proliferation, is the process of producing two identical replicas from one original DNA molecule. Although great advances have been made in DNA replication research, its detailed mechanism remains unresolved. Faithful DNA replication requires the cooperation of many proteins, and failures in replication leave mutations in the genome that can cause cancers and other diseases. Accurately identifying DNA replication proteins may therefore assist both in understanding the molecular mechanisms of DNA replication and in drug development. Because experimental methods are expensive and labor intensive, an accurate computational method for identifying DNA replication proteins is highly desirable. In this paper, a machine learning approach to identify DNA replication proteins is developed using a Naive Bayes classifier and sequence-derived features. The prediction performance of features extracted from the Reduced Amino Acid Composition (RAAC) model and from two Pseudo Amino Acid Composition (PseAAC) models is investigated. Prediction results indicate that the PseAAC (type 2) model yields the best performance. Based on the PseAAC (type 2) model, we then compare our method with the similarity search method on the independent test dataset. The comparison results reveal that it is feasible to identify DNA replication proteins with machine learning algorithms. The proposed method may provide candidate DNA replication proteins for future experimental verification, assisting in understanding the molecular mechanisms of DNA replication and in drug development for the treatment of human diseases.
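Illustrative sketch: the abstract names a reduced amino acid composition feature set and a Naive Bayes classifier but gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' method. The five-group reduced alphabet, the Gaussian form of the classifier, the variance floor, and the toy sequences are all assumptions made for illustration; the toy sequences are invented strings, not real proteins.

```python
import math
from collections import Counter

# Hypothetical 5-group reduced alphabet (an assumption, not taken from the paper).
REDUCED_GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
GROUP_NAMES = list(REDUCED_GROUPS)
RESIDUE_TO_GROUP = {aa: g for g, aas in REDUCED_GROUPS.items() for aa in aas}


def raac_features(sequence):
    """Return the fraction of residues that fall in each reduced group."""
    groups = [RESIDUE_TO_GROUP[aa] for aa in sequence.upper()
              if aa in RESIDUE_TO_GROUP]
    if not groups:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(groups)
    return [counts[g] / len(groups) for g in GROUP_NAMES]


class GaussianNaiveBayes:
    """Tiny Gaussian Naive Bayes: one independent normal per feature and class."""

    def fit(self, X, y, var_floor=1e-3):
        self.params = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            # The variance floor keeps the likelihood finite for constant features.
            variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                         for col, m in zip(columns, means)]
            self.params[label] = (math.log(n / len(y)), means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.params[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        return max(self.params, key=log_posterior)


# Invented toy sequences: label 1 is rich in K/R/H, label 0 is rich in D/E.
toy_train = [("KRKKRHGA", 1), ("RKKHKRAL", 1), ("DEEDDEGA", 0), ("EDDEEDAL", 0)]
X = [raac_features(seq) for seq, _ in toy_train]
y = [label for _, label in toy_train]
model = GaussianNaiveBayes().fit(X, y)
prediction = model.predict(raac_features("KKRRKHGV"))
```

On this toy data the query sequence has the same group composition as the label 1 examples, so `prediction` is 1. A real study would use curated positive and negative protein sets and an independent test set, as the abstract describes.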