Title: Chinese word segmentation of ideological and political education based on unsupervised learning
Authors: Zang, Wen-Jing; Liu, Zi-Zhao; Yang, Xing-Hai; Zhang, Yu-Lin
Author affiliations: [Zang, Wen-Jing; Liu, Zi-Zhao; Zhang, Yu-Lin] School of Information Science and Engineering, University of Jinan, Jinan 250022, China; [Yang, Xing-Ha
Conference name: 2nd International Conference on Big Data Technologies, ICBDT 2019
Conference dates: August 28, 2019 - August 30, 2019
Source: ACM International Conference Proceeding Series
Abstract: This paper proposes an unsupervised Chinese word segmentation algorithm for the field of ideological and political education. The algorithm has two parts: a language model generation algorithm and the Viterbi algorithm. The language model generation algorithm uses a large text corpus to count how often single characters occur and how often characters co-occur, and from these counts it computes conditional probabilities, yielding a character-level N-gram language model. The Viterbi algorithm, which is based on dynamic programming, then uses this character-level language model to find the optimal word segmentation path. Together, the two parts perform Chinese word segmentation supported by a large text corpus. Experiments show that the proposed algorithm achieves a good recognition rate for vocabulary in the field of ideological and political education. Because it is unsupervised, the algorithm saves considerable manual labor and meets the word segmentation needs of this field.
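The two stages described in the abstract (counting characters to build a character-level N-gram model, then running Viterbi to pick the best segmentation path) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `build_char_ngram_counts`, `word_log_prob` and `viterbi_segment`, the parameters `max_n` and `min_count`, and the scoring rule are hypothetical choices made here for illustration. The sketch assumes that a candidate word is scored by its relative frequency as a character n-gram in the corpus, that multi-character candidates seen fewer than `min_count` times are discarded, and that every single character is always an allowed word.

```python
import math
from collections import Counter


def build_char_ngram_counts(corpus, max_n=4):
    """Count every character n-gram (1 <= n <= max_n) in each corpus line."""
    counts = Counter()
    total_chars = 0
    for line in corpus:
        total_chars += len(line)
        for i in range(len(line)):
            for n in range(1, max_n + 1):
                if i + n > len(line):
                    break
                counts[line[i:i + n]] += 1
    return counts, total_chars


def word_log_prob(word, counts, total_chars, min_count=2):
    """Hypothetical scoring rule: log P(word) with P(word) = count(word) / total_chars.

    For a multi-character word this equals the chained conditional probabilities
    P(c1) * P(c2 | c1) * ... * P(ck | c1..ck-1), because the n-gram counts telescope.
    Returns None for multi-character candidates seen fewer than min_count times.
    """
    c = counts.get(word, 0)
    if len(word) == 1:
        # Assumption: unseen single characters get one pseudo-count so that
        # any input text can still be segmented.
        return math.log(max(c, 1) / total_chars)
    if c < min_count:
        return None
    return math.log(c / total_chars)


def viterbi_segment(text, counts, total_chars, max_n=4, min_count=2):
    """Return the segmentation of text that maximizes the sum of word log-probs."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of a segmentation of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word on that path
    best[0] = 0.0
    for end in range(1, n + 1):
        # Only words of length 1..max_n can end at position `end`.
        for start in range(max(0, end - max_n), end):
            lp = word_log_prob(text[start:end], counts, total_chars, min_count)
            if lp is None:
                continue
            score = best[start] + lp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    # Toy corpus, for illustration only (not data from the paper).
    corpus = ["思想", "政治", "教育"] * 5 + ["思想政治教育"]
    counts, total = build_char_ngram_counts(corpus)
    print(viterbi_segment("思想政治教育", counts, total))  # ['思想', '政治', '教育']
```

Scoring a candidate by count(w)/N is the same as multiplying the character-level conditional probabilities P(c_k | c_1...c_{k-1}) = count(c_1...c_k) / count(c_1...c_{k-1}) along the word, since the product telescopes. Working in log space turns the product over words into a sum, which the dynamic program maximizes one end position at a time. In the toy run, "思想", "政治" and "教育" are merged because each occurs far more often than chance, while cross-boundary strings such as "想政" occur only once and are discarded by `min_count`.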
© 2019 Association for Computing Machinery.