Title: The Identification of Web Forum Speech Leaders Based on Statistical Analysis and Text Classification
Authors: Duan, Zhenchen; Liu, Mengyi; Fan, Rongchao; Yuan, Meng
Author affiliations: [Duan, Zhenchen; Yuan, Meng] Shandong Univ, Sch Math, Jinan 250100, Shandong, Peoples R China.
Conference: International Conference on Management Science and Engineering
Conference date: OCT 17-18, 2010
Source: 2010 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING (MSE 2010), VOL 5
Keywords: text classification; mechanical segmentation; KNN algorithm; mathematical model; Kolmogorov-Smirnov testing; hypotheses
Abstract: This paper introduces a theory of web information analysis based on statistics and pattern recognition. We extract data from a web forum with a web crawler ("robot reptile"). The Maximum Matching Method and the KNN algorithm are used in automatic text classification to separate valid from invalid information. We then transform the preprocessed data into a mathematical model and analyse the model using hypothesis testing and the Kolmogorov-Smirnov test. Speech leaders are identified by comparing the results of the model analysis with the proposed evaluation criteria. At the end of the paper, an experimental system on a real web forum shows the effectiveness of this method.
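Illustrative sketch (not from the paper): the classification step named in the abstract, dictionary-based maximum matching segmentation followed by KNN, might look roughly like the Python below. The abstract does not say which matching direction, similarity measure, or value of k the authors used, so the forward direction, cosine similarity over term counts, and k = 3 are all assumptions here. The function names, the toy dictionary, and the labelled posts are hypothetical.

```python
import math
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching (assumed variant): at each
    position take the longest dictionary word, else one character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink toward one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(tokens, labelled, k=3):
    """Majority vote among the k labelled token lists most similar
    to `tokens`. `labelled` is a list of (token_list, label) pairs."""
    query = Counter(tokens)
    ranked = sorted(
        labelled,
        key=lambda item: cosine(query, Counter(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy dictionary and training posts, for illustration only.
    words = {"研究", "研究生", "生命", "起源"}
    print(forward_max_match("研究生命起源", words, max_len=3))

    training = [
        (["good", "post", "analysis"], "valid"),
        (["analysis", "data", "post"], "valid"),
        (["buy", "cheap", "now"], "invalid"),
        (["cheap", "buy", "click"], "invalid"),
    ]
    print(knn_classify(["cheap", "buy", "now"], training, k=3))
```

The segmentation example also shows a known weakness of greedy forward matching: it prefers the longer word "研究生" and leaves "命" as a stray single character, which is one reason backward or bidirectional variants are often considered.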