Title: Model-based Clustering of Short Text Streams
Authors: Yin, Jianhua; Chao, Daren; Liu, Zhongkun; Zhang, Wei; Yu, Xiaohui; Wang, Jianyong
Corresponding author: Zhang, W
Author affiliations: [Yin, Jianhua; Chao, Daren; Liu, Zhongkun; Yu, Xiaohui] Shandong Univ, Sch Comp Sci & Technol, Jinan, Shandong, Peoples R China.; [Zhang, Wei] East …
Conference: 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
Conference dates: AUG 19-23, 2018
Source: KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING
Publication year: 2018
Pages: 2634-2642
DOI: 10.1145/3219819.3220094
Keywords: Text Stream Clustering; Mixture Model; Dirichlet Process
Abstract: Short text stream clustering has become an increasingly important problem due to the explosive growth of short texts on diverse social media platforms. In this paper, we propose a model-based short text stream clustering algorithm (MStream) that naturally handles both the concept drift problem and the sparsity problem. MStream can achieve state-of-the-art performance with only one pass over the stream, and performs even better when multiple iterations over each batch are allowed. We further propose an improved version of MStream with forgetting rules, called MStreamF, which efficiently deletes outdated documents by deleting the clusters of outdated batches. Our extensive experimental study shows that MStream and MStreamF achieve better performance than three baselines on several real datasets.
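The abstract describes MStream and MStreamF only at a high level. As a rough illustration of the general idea, here is a minimal sketch, under stated assumptions, of one-pass, batch-wise clustering with a Dirichlet-process (Chinese restaurant process) prior and a collapsed Dirichlet-multinomial cluster likelihood, plus a forgetting rule that drops the oldest batch once a fixed window of batches is exceeded. The class `StreamClusterer`, its parameters `alpha`, `beta`, `vocab_size` and `window`, the greedy argmax assignment, and the per-document forgetting bookkeeping are all hypothetical choices for this sketch. It is not the authors' MStream or MStreamF implementation, and their exact scoring and deletion rules may differ.

```python
import math
from collections import Counter, defaultdict, deque


class StreamClusterer:
    """Hypothetical sketch (not the paper's MStream/MStreamF): one-pass
    DP-mixture-style clustering of short-text batches with a window of batches."""

    def __init__(self, alpha=1.0, beta=0.02, vocab_size=100, window=2):
        self.alpha, self.beta, self.V = alpha, beta, vocab_size
        self.window = window                   # assumed: max number of batches kept
        self.word_counts = defaultdict(Counter)  # cluster id -> word frequencies
        self.n_words = Counter()               # cluster id -> total word count
        self.n_docs = Counter()                # cluster id -> number of documents
        self.batches = deque()                 # per batch: list of (cluster id, doc)
        self.next_id = 0

    def _log_score(self, doc, z):
        """Log CRP prior times Dirichlet-multinomial predictive; z=None means a new cluster."""
        total_docs = sum(self.n_docs.values())
        prior = self.alpha if z is None else self.n_docs[z]
        score = math.log(prior / (total_docs + self.alpha))
        counts = Counter() if z is None else self.word_counts[z]
        n_z = 0 if z is None else self.n_words[z]
        for w, freq in Counter(doc).items():
            for j in range(freq):
                score += math.log(counts[w] + self.beta + j)
        for i in range(len(doc)):
            score -= math.log(n_z + self.V * self.beta + i)
        return score

    def _update(self, doc, z, sign):
        """Add (sign=+1) or remove (sign=-1) a document's statistics for cluster z."""
        for w in doc:
            self.word_counts[z][w] += sign
        self.n_words[z] += sign * len(doc)
        self.n_docs[z] += sign
        if self.n_docs[z] == 0:  # drop clusters left empty by forgetting
            del self.word_counts[z], self.n_words[z], self.n_docs[z]

    def process_batch(self, batch):
        """Assign each document in one pass; return the cluster ids."""
        # Assumed forgetting rule: forget the oldest batch once the window is full.
        if len(self.batches) == self.window:
            for z, doc in self.batches.popleft():
                self._update(doc, z, -1)
        assignments = []
        for doc in batch:
            candidates = list(self.n_docs) + [None]
            best = max(candidates, key=lambda z: self._log_score(doc, z))
            if best is None:  # open a new cluster
                best = self.next_id
                self.next_id += 1
            self._update(doc, best, +1)
            assignments.append((best, doc))
        self.batches.append(assignments)
        return [z for z, _ in assignments]
```

For example, with `window=2`, feeding three batches of tokenized texts assigns the first two "apple" documents to one cluster and the "football" documents to another. When the third batch arrives, the first batch is forgotten, so the cluster that held only first-batch documents disappears. Greedy argmax is used here only to keep the sketch deterministic; a sampling-based variant would draw the cluster from the normalized scores instead.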
Indexed in: CPCI-S
Resource type: Conference paper