Title: GCMR: A GPU Cluster-based MapReduce Framework for Large-scale Data Processing
Authors: Guo, Yiru; Liu, Weiguo; Gong, Bin; Voss, Gerrit; Mueller-Wittig, Wolfgang
Author affiliations: [Guo, Yiru; Liu, Weiguo; Gong, Bin] Shandong Univ, Sch Comp Sci & Technol, Jinan 250100, Peoples R China.; [Voss, Gerrit; Mueller-Wittig, Wolfgang] …
Conference: 15th IEEE International Conference on High Performance Computing and Communications (HPCC) / 11th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC)
Conference date: NOV 13-15, 2013
Source: 2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC)
Keywords: MapReduce; CUDA; MPI; GPU Cluster
Abstract: MapReduce is a popular programming model for parallel and distributed large-scale data processing. There have been many efforts to implement this model on commodity GPU-based systems. However, most of these implementations work only on a single GPU and cannot be used to process large-scale datasets. In this paper, we present a new approach to designing a MapReduce framework on GPU clusters for large-scale data processing. We use the Compute Unified Device Architecture (CUDA) and MPI parallel programming models to implement this framework. To derive an efficient mapping onto GPU clusters, we introduce a two-level parallelization approach: inter-node and intra-node parallelization. Furthermore, to improve overall MapReduce efficiency, a multi-threading scheme is used to overlap communication and computation on a multi-GPU node. Compared to previous GPU-based MapReduce implementations, our implementation, called GCMR, achieves speedups of up to 2.6 on a single node and up to 9.1 on 4 nodes of a Tesla S1060 quad-GPU cluster system when processing small datasets. It also shows very good scalability when processing large-scale datasets on the cluster system.
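Illustrative sketch (not from the paper): the two-level parallelization and the overlap of communication with computation described in the abstract can be pictured with a small word-count example. This is not GCMR's code or API. The paper uses CUDA and MPI, whereas this sketch uses plain Python threads as stand-ins for nodes, GPUs, and transfers, and every name in it (`split`, `map_words`, `run_node`, `run_cluster`, `batch_size`) is hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread


def split(items, parts):
    """Split a list into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def map_words(lines):
    """Map step for one batch: count the words in the given lines."""
    return Counter(word for line in lines for word in line.split())


def run_node(lines, devices, batch_size=2):
    """Intra-node level (assumed design): one worker per simulated device."""

    def device_worker(device_lines):
        # Bounded queue: the loader can stage the next batch while the
        # worker is still computing on the previous one.
        staged = Queue(maxsize=2)

        def loader():
            # Stands in for a host-to-device transfer.
            for i in range(0, len(device_lines), batch_size):
                staged.put(device_lines[i:i + batch_size])
            staged.put(None)  # sentinel: no more batches

        Thread(target=loader, daemon=True).start()
        partial = Counter()
        while (batch := staged.get()) is not None:
            partial.update(map_words(batch))
        return partial

    with ThreadPoolExecutor(max_workers=devices) as pool:
        partials = list(pool.map(device_worker, split(lines, devices)))
    # Node-local reduce: merge the per-device partial counts.
    return sum(partials, Counter())


def run_cluster(lines, nodes=4, devices_per_node=4):
    """Inter-node level: partition the input across nodes, then reduce."""
    node_results = [run_node(chunk, devices_per_node)
                    for chunk in split(lines, nodes)]
    # Global reduce: merge the per-node results.
    return sum(node_results, Counter())
```

Calling `run_cluster(lines, nodes=2, devices_per_node=2)` on a list of text lines returns the same word counts as a sequential count. The nodes run one after another here to keep the sketch short; in a real cluster each node would be a separate process.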