Title: HiPGA: A High Performance Genome Assembler for Short Read Sequence Data
Authors: Duan, Xiaohui; Zhao, Kun; Liu, Weiguo
Author affiliations: [Duan, Xiaohui; Zhao, Kun; Liu, Weiguo] Shandong Univ, Engn Res Ctr Digital Media Technol, Sch Comp Sci & Technol, Minist Educ, Jinan 250101, Peoples
Conference: 28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
Conference dates: May 19-23, 2014
Source: PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)
Keywords: Genome Assembly; de Bruijn Graph; Short Read Data; MPI; Multi-threading
Abstract: Emerging next-generation sequencing technologies have opened up exciting new opportunities for genome sequencing by generating read data at massive throughput. However, the generated reads are significantly shorter than those produced by the traditional Sanger shotgun sequencing method. This poses challenges for de novo assembly algorithms in terms of both accuracy and efficiency. Moreover, due to the continuing explosive growth of short read databases, there is high demand to accelerate the often-repeated, long-running assembly task. In this paper, we present HiPGA, a scalable parallel algorithm that accelerates de Bruijn graph-based genome assembly for high-throughput short read data. To make full use of the compute power of both shared-memory multi-core CPUs and distributed-memory systems, we use a parallelized file I/O scheme as well as hybrid parallelism throughout the whole assembly pipeline. Evaluations using three real paired-end datasets and the Yoruba individual dataset show that, compared to two other well-parallelized assemblers, ABySS and PASHA, HiPGA achieves speedups of up to 7x while delivering comparable accuracy on 64 CPU cores of a compute cluster.
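For readers unfamiliar with de Bruijn graph assembly, the following is a minimal sketch of the core data structure, not HiPGA's actual implementation. It assumes reads are plain strings over A/C/G/T, ignores reverse complements, sequencing errors, and paired-end information, and treats each k-mer as an edge from its (k-1)-base prefix to its (k-1)-base suffix. To hint at how k-mers can be spread across distributed-memory processes, it also includes a hash-based ownership rule. The function names `build_de_bruijn`, `owner_rank`, and `partition_kmers`, and the use of CRC32 as the partitioning hash, are hypothetical choices for illustration only.

```python
import zlib
from collections import defaultdict


def kmers(read, k):
    """Yield every overlapping substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph as an adjacency list of (k-1)-mer nodes.

    Each k-mer adds one edge from its prefix (first k-1 bases) to its
    suffix (last k-1 bases); repeated k-mers add repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def owner_rank(kmer, num_ranks):
    """Hypothetical rule assigning a k-mer to one of num_ranks processes.

    CRC32 is used instead of Python's built-in hash() because string
    hashing is randomized per interpreter, so separate processes would
    otherwise disagree about which rank owns a k-mer.
    """
    return zlib.crc32(kmer.encode("ascii")) % num_ranks


def partition_kmers(reads, k, num_ranks):
    """Group all k-mers by owning rank, as a distributed step might before exchanging them."""
    parts = [[] for _ in range(num_ranks)]
    for read in reads:
        for kmer in kmers(read, k):
            parts[owner_rank(kmer, num_ranks)].append(kmer)
    return parts
```

For example, `build_de_bruijn(["ACGTAC", "GTACGG"], 4)` links node `ACG` to both `CGT` and `CGG`, a branch that an assembler would have to resolve when walking the graph into contigs. Hashing k-mers to owners lets each process build only its slice of the graph; in a hybrid scheme, threads within a process could then share that slice.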