Title: Web page publication date extraction and application
Authors: Chen, Zhumin; Ma, Jun; Rui, Hongxing; Sun, Yuyin; Shao, Haimin; Ren, Zhaochun
Affiliations: [Chen, Zhumin; Ma, Jun; Shao, Haimin; Ren, Zhaochun] School of Computer Science and Technology, Shandong University, Jinan 250101, China; [Sun, Yuyin …
Source: Journal of Computational Information Systems
Keywords: Machine learning; Page rank; Publication date extraction; Temporal information extraction
Abstract: Publication dates (P-dates for short) of Web pages are often required in many application areas. In this paper, we propose a supervised machine learning approach to find P-dates, in which linguistic information and format information extracted from the DOM (Document Object Model) tree of Web pages are used as features for learning. Experiments demonstrate that our approach significantly outperforms three baseline methods, which use the first date, the last date, and the latest date respectively, in terms of F1 score for both English and Chinese pages. As an application, we study how to incorporate P-dates into page rank. We propose a model for page rank that takes into account the P-dates, the relevance scores between the text of pages and the user query, and the importance scores of pages. Experiments indicate that page rank using P-dates is almost always better than page rank without them in terms of NDCG (Normalized Discounted Cumulative Gain). Copyright © 2010 Binary Information Press.
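The three baselines named in the abstract (first date, last date, latest date) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `candidate_dates` and `baseline_p_dates` are hypothetical, the sketch assumes only ISO-style `YYYY-MM-DD` dates appear in the page text, and it reads "latest date" as the chronologically most recent date, which the abstract does not spell out.

```python
import re
from datetime import date

# Assumption: only ISO-style dates (YYYY-MM-DD) are recognized.
# Real pages use many more date formats, in English and Chinese.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def candidate_dates(text):
    """Return every valid calendar date found in the text, in document order."""
    found = []
    for year, month, day in DATE_RE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Skip strings that match the pattern but are not real dates,
            # such as 2009-13-40.
            continue
    return found


def baseline_p_dates(text):
    """Return the guess made by each of the three baselines, or None if no date is found."""
    dates = candidate_dates(text)
    if not dates:
        return None
    return {
        "first": dates[0],      # first date appearing in the page
        "last": dates[-1],      # last date appearing in the page
        "latest": max(dates),   # chronologically most recent date (assumed reading)
    }


if __name__ == "__main__":
    page_text = "Posted 2009-03-14. Updated 2009-06-01. Copyright 2008-01-01."
    print(baseline_p_dates(page_text))
```

On the sample text, the three baselines disagree with one another, which suggests why a learned model over linguistic and format features might be preferred to any single positional heuristic.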