high performance computing on graphics processing units: hgpu.org

Toward a Generic Hybrid CPU-GPU Parallelization of Divide-and-Conquer Algorithms

Alejandro Lopez-Ortiz, Alejandro Salinger, Robert Suderman

David R. Cheriton School of Computer Science, University of Waterloo, 200 University Ave West, Waterloo, ON, N2L 3G1, Canada

International Journal of Networking and Computing, Vol 4, No 1, pages 131-150, 2014

In the last few years, the development of programming languages for general-purpose computing on Graphics Processing Units (GPUs) has led to the design and implementation of fast parallel algorithms for this architecture across a large spectrum of applications. Given the stream-processing characteristics of GPUs, most practical applications consist of tasks that admit highly data-parallel algorithms. Many problems, however, allow for task-parallel solutions or a combination of task- and data-parallel algorithms. For these, a hybrid CPU-GPU parallel algorithm that combines the highly parallel stream-processing power of GPUs with the higher scalar power of multi-core CPUs is likely to be superior. In this paper we describe a generic translation of any recursive sequential implementation of a divide-and-conquer algorithm into an implementation that benefits from running in parallel on both multi-core CPUs and GPUs. This translation is generic in the sense that it requires little knowledge of the particular algorithm. We then present a scheduling and work-division scheme that adapts to the characteristics of each algorithm and the underlying architecture, efficiently balancing the workload between GPU and CPU. Our experiments show a 4.5x speedup over a single-core recursive implementation, while demonstrating the accuracy and practicality of the approach.
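To make the idea of dividing one recursion level between two kinds of processors concrete, here is a minimal Python sketch. It is not the authors' translation or scheduler, which the abstract does not specify. It assumes a level-by-level (bottom-up) formulation of merge sort, and it uses two thread pools as stand-ins for the GPU and the CPU cores. The names `hybrid_merge_sort`, `split_level`, `gpu_fraction`, and `base_size` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def merge(left, right):
    """Combine step: merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def split_level(tasks, gpu_fraction):
    """Divide one level's independent tasks between the two device pools.

    gpu_fraction is a hypothetical tuning knob: the share of tasks sent
    to the pool that stands in for the GPU.
    """
    cut = int(len(tasks) * gpu_fraction)
    return tasks[:cut], tasks[cut:]


def hybrid_merge_sort(data, gpu_fraction=0.5, base_size=4):
    """Bottom-up merge sort whose merge tasks are shared by two pools."""
    # Base cases: sort small chunks directly (the leaves of the recursion tree).
    runs = [sorted(data[i:i + base_size]) for i in range(0, len(data), base_size)]
    if not runs:
        return []
    # Assumption: thread pools simulate the two devices; a real system
    # would launch GPU kernels (e.g. via OpenCL) instead of gpu_pool.
    with ThreadPoolExecutor(max_workers=1) as gpu_pool, \
            ThreadPoolExecutor(max_workers=4) as cpu_pool:
        while len(runs) > 1:
            # All merges on one level are independent of each other.
            pairs = [(runs[i], runs[i + 1]) for i in range(0, len(runs) - 1, 2)]
            leftover = [runs[-1]] if len(runs) % 2 else []
            gpu_tasks, cpu_tasks = split_level(pairs, gpu_fraction)
            futures = [gpu_pool.submit(merge, a, b) for a, b in gpu_tasks]
            futures += [cpu_pool.submit(merge, a, b) for a, b in cpu_tasks]
            runs = [f.result() for f in futures] + leftover
    return runs[0]
```

In this sketch the split ratio is fixed for the whole run. A scheme like the one the abstract describes would presumably choose the ratio from the algorithm's characteristics and the measured speed of each device, which is beyond what this example attempts.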

Tags: Algorithms, ATI, ATI Radeon HD 5970, ATI Radeon HD 6530, Computer science, Heterogeneous systems, Hybrid computing, OpenCL

January 11, 2014 by hgpu
