Title: HiPGA: A high performance genome assembler for short read sequence data
Authors: Duan, Xiaohui; Zhao, Kun; Liu, Weiguo
Author affiliations: [Duan, Xiaohui; Zhao, Kun; Liu, Weiguo] School of Computer Science and Technology, Engineering Research Center of Digital Media Technology, Shandong …
Conference name: 28th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014
Conference dates: May 19, 2014 - May 23, 2014
Source: Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014
Abstract: Emerging next-generation sequencing technologies have opened up exciting new opportunities for genome sequencing by generating read data at massive throughput. However, the generated reads are significantly shorter than those produced by the traditional Sanger shotgun sequencing method. This poses challenges for de novo assembly algorithms in terms of both accuracy and efficiency. Moreover, because short read databases continue to grow explosively, there is a high demand to accelerate the long-running and frequently repeated assembly task. In this paper, we present HiPGA, a scalable parallel algorithm that accelerates de Bruijn graph-based genome assembly for high-throughput short read data. To make full use of the compute power of both shared-memory multi-core CPUs and distributed-memory systems, we use a parallelized file I/O scheme as well as hybrid parallelism across the whole assembly pipeline. Evaluations using three real paired-end datasets and the Yoruba individual dataset show that, compared to two other well-parallelized assemblers, ABySS and PASHA, HiPGA achieves speedups of up to 7× while delivering comparable accuracy on 64 CPU cores of a compute cluster.
© 2014 IEEE.
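
Illustrative note: the abstract refers to de Bruijn graph-based assembly without describing HiPGA's internals. The following is a minimal serial sketch of the general technique only, not HiPGA's implementation; it ignores reverse complements, sequencing errors, paired-end information, and all parallelism. The function names `build_de_bruijn` and `extend_unambiguous` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Returns a dict mapping each (k-1)-mer prefix to the set of
    (k-1)-mer suffixes that follow it in at least one read.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Spell a contig by walking from `start` along unambiguous edges.

    The walk stops at a branch (more than one successor), a merge
    (successor with more than one predecessor), a dead end, or a cycle.
    """
    # Count incoming edges so merges can be detected.
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1

    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        nxt = next(iter(graph[node]))
        if indegree[nxt] != 1 or nxt in seen:
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    # Two toy reads that overlap on "GTC" (hypothetical data).
    reads = ["ACGTC", "GTCAG"]
    graph = build_de_bruijn(reads, k=3)
    print(extend_unambiguous(graph, "AC"))  # prints ACGTCAG
```

With the two toy reads above and k = 3, the overlapping reads are merged into the single contig ACGTCAG. Real assemblers add error correction, graph simplification, and scaffolding on top of this basic step.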