Title: Fast Fine-Grained Air Quality Index Level Prediction Using Random Forest Algorithm on Cluster Computing of Spark
Authors: Zhang, Chuanting; Yuan, Dongfeng
Author affiliations: [Zhang, Chuanting; Yuan, Dongfeng] Shandong Univ, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China.
Conference name: 12th IEEE Int Conf Ubiquitous Intelligence & Comp/12th IEEE Int Conf Autonom & Trusted Comp/15th IEEE Int Conf Scalable Comp & Commun & Associated Workshops/IEEE Int Conf Cloud & Big Data Comp/IEEE Int Conf Internet People
Conference date: AUG 10-14, 2015
Source: IEEE 12TH INT CONF UBIQUITOUS INTELLIGENCE & COMP/IEEE 12TH INT CONF ADV & TRUSTED COMP/IEEE 15TH INT CONF SCALABLE COMP & COMMUN/IEEE INT CONF CLOUD & BIG DATA COMP/IEEE INT CONF INTERNET PEOPLE AND ASSOCIATED SYMPOSIA/WORKSHOPS
Keywords: Big data; air quality prediction; spark; random forest
Abstract: Because particulate matter in the air can cause several kinds of respiratory and cardiovascular diseases, air quality prediction is attracting more and more attention. Knowing this information in advance is very important for protecting people from health problems. With the development of computer technology, the data we can collect are becoming increasingly fine-grained. Most importantly, they need to be analyzed in real time. However, existing methods cannot meet the demands of real-time analysis. In this paper, we predict air quality using a Spark implementation of the random forest algorithm. First, a distributed random forest algorithm is implemented in Spark on the basis of resilient distributed datasets (RDDs) and shared variables. Then, we build an air quality prediction model using the parallelized random forest algorithm. The proposed method is evaluated with real meteorological data obtained from Beijing. The experimental results show that the proposed method is fast in predicting the concentration level of PM2.5. The results also demonstrate the effectiveness and scalability of our method when dealing with big data.
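Note: The abstract frames the task as predicting the concentration level of PM2.5, which implies that continuous readings are binned into discrete classes before a classifier is trained. The sketch below shows one way such labels might be derived. The thresholds are an assumption taken from the 24-hour PM2.5 breakpoints of China's HJ 633-2012 standard; the abstract does not say which scheme the authors used, and the names `PM25_UPPER_BOUNDS`, `LEVEL_NAMES`, and `pm25_level` are hypothetical.

```python
from bisect import bisect_left

# ASSUMPTION: upper bounds (in µg/m³, 24-hour mean) of levels 1 to 5,
# following HJ 633-2012. Anything above the last bound is level 6.
# The abstract does not state the thresholds actually used.
PM25_UPPER_BOUNDS = [35, 75, 115, 150, 250]

# ASSUMPTION: English level names, given here for illustration only.
LEVEL_NAMES = [
    "Excellent",
    "Good",
    "Lightly polluted",
    "Moderately polluted",
    "Heavily polluted",
    "Severely polluted",
]


def pm25_level(concentration):
    """Map a PM2.5 concentration to a discrete level from 1 to 6."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    # bisect_left counts the bounds strictly below the reading, so a
    # reading equal to a bound stays in the lower level.
    return bisect_left(PM25_UPPER_BOUNDS, concentration) + 1


if __name__ == "__main__":
    for reading in (12.0, 35.0, 80.5, 300.0):
        level = pm25_level(reading)
        print(reading, level, LEVEL_NAMES[level - 1])
```

Labels produced this way could serve as the class targets for a random forest classifier, with the meteorological measurements as input features.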