Communication Optimization for Multi GPU Implementation of Smith-Waterman Algorithm


Sampath Kumar, P. K. Baruah

Department of Mathematics and Computer Science, Sri Sathya Sai Institute of Higher Learning, Prasanthi Nilayam, A.P. -515134, India

International Journal of Computer Applications, Volume 80 – No 12, 2013

DOI:10.5120/13910-1121

@article{kumarnvssp2013communication,
  title={Communication Optimization for Multi GPU Implementation of Smith-Waterman Algorithm},
  author={Kumarnvssp, Sampath and K Baruah, P},
  journal={International Journal of Computer Applications},
  volume={80},
  number={12},
  pages={1--7},
  year={2013}
}


GPU parallelism can deliver large performance gains for real applications, but CPU-GPU communication is one of the major bottlenecks limiting those gains. Among the libraries developed so far to optimize this communication, DyManD (Dynamically Managed Data) provides better communication optimization strategies and achieves better performance on a single GPU. Smith-Waterman is a well-known algorithm in computational biology for finding functional similarities in a protein database, and a CUDA implementation of it speeds up sequence matching against that database. When the input database is large, a multi-GPU implementation outperforms a single-GPU one. Because the algorithm operates on large databases, CPU-GPU communication needs to be optimized. DyManD, however, provides efficient data management and communication optimization for a single GPU only. To extend communication optimization to multiple GPUs, an approach that combines DyManD with a multi-threaded framework called GPUWorker was proposed. Our contribution in this work is an optimized CUDA implementation of Smith-Waterman on multiple GPUs, GPUWorker-DyManD, which combines DyManD functionality with GPUWorker to reduce the communication overhead between the CPU and multiple GPUs. The GPUWorker-DyManD implementation achieves a 3.5x performance gain over the default multi-GPU implementation.
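As a rough illustration of the computation being distributed, the following is a minimal CPU-only sketch in Python, not the authors' CUDA code. It scores a query against each database sequence with the Smith-Waterman recurrence and splits the database across worker threads, loosely mirroring the one-worker-per-device pattern described for GPUWorker. The function names (`sw_score`, `search`), the linear gap penalty, and the scoring values are assumptions made for illustration; DyManD's data management and real CPU-GPU transfers are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def sw_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    taken from the paper.
    """
    prev = [0] * (len(target) + 1)  # previous row of the scoring matrix
    best = 0
    for qc in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, tc in enumerate(target, start=1):
            diag = prev[j - 1] + (match if qc == tc else mismatch)
            # Clamp at 0 so an alignment can restart anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search(query, database, num_workers=2):
    """Score `query` against every sequence, one database slice per worker.

    Each thread stands in for one GPU worker (hypothetical mapping).
    """
    chunks = [database[i::num_workers] for i in range(num_workers)]

    def scan(chunk):
        return [(sw_score(query, seq), seq) for seq in chunk]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        hits = [hit for part in pool.map(scan, chunks) for hit in part]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


if __name__ == "__main__":
    for score, seq in search("ACGT", ["TTTT", "ACGT", "ACGA"]):
        print(score, seq)
```

In this sketch each worker receives its slice of the database once and returns only scores, which is the general shape of work where host-to-device transfer cost matters; how the actual implementation schedules and overlaps those transfers is described in the paper, not here.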

Tags: Algorithms, Biology, Computational biology, CUDA, Databases, nVidia, Sequence matching, Smith-Waterman algorithm, Tesla M2050

November 2, 2013 by hgpu

