high performance computing on graphics processing units: hgpu.org

Posts

Mar, 12

A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches

This paper describes and evaluates a highly scalable framework for running iterative local searches on heterogeneous HPC platforms. The user only needs to provide serial CPU or single-GPU code that implements a simple interface. The framework then executes this code in parallel using MPI between compute nodes and OpenMP and multi-GPU support within nodes. It handles […]

CUDA

Mar, 12

3D Modeling, Distance and Gradient Computation for Motion Planning: A Direct GPGPU Approach

The Kinect sensor and KinectFusion algorithm have revolutionized environment modeling. We bring these advances to optimization-based motion planning by computing the obstacle and self-collision avoidance objective functions and their gradients directly from the KinectFusion model on the GPU without ever transferring any model to the CPU. Based on this, we implement a proof-of-concept motion planner […]

CUDA

Mar, 12

Performance Traps in OpenCL for CPUs

With its design concept of cross-platform portability, OpenCL can be used not only on GPUs (for which it is quite popular), but also on CPUs. Whether porting GPU programs to CPUs, or simply writing new code for CPUs, using OpenCL brings up the performance issue, usually raised in one of two forms: "OpenCL is not […]

OpenCL

Mar, 12

Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs

Stencil-based computation on structured grids is a kernel at the heart of a large number of scientific applications. The variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a high-performance computing library. With the multiplication of cores on a single chip, answering architectural alignment requirements became an […]

CUDA

Mar, 12

Morph Algorithms on GPUs

There is growing interest in using GPUs to accelerate graph algorithms such as breadth-first search, computing page-ranks, and finding shortest paths. However, these algorithms do not modify the graph structure, so their implementation is relatively easy compared to general graph algorithms like mesh generation and refinement, which morph the underlying graph in non-trivial ways by […]

CUDA

Mar, 9

Speeding Up Model Building for ECGA on CUDA Platform

Parallelization is a straightforward approach to enhancing the efficiency of evolutionary computation due to its inherently parallel nature. Since NVIDIA released the Compute Unified Device Architecture (CUDA), graphics processing units have enabled many scalable parallel programs in a wide range of fields. However, parallelization of model building for EDAs is rarely studied. In this […]

CUDA

Mar, 9

Signal Processing and General Purpose Computing on GPU

Graphics processing units (GPUs) have been growing in popularity due to their impressive processing capabilities and, with general-purpose programming interfaces such as NVIDIA’s CUDA, are becoming the platform of choice in the scientific computing community. Today the research community successfully uses GPUs to solve a broad range of computationally demanding, complex problems. This […]

CUDA

Mar, 9

Detecting Computer Viruses using GPUs

Anti-virus software is the main defense mechanism against malware, which is becoming more common and advanced. A significant part of the virus scanning process is dedicated to scanning a given file against a set of virus signatures. As it is important that the overall scanning process be as fast as possible, efforts must be made […]

CUDA

Mar, 9

CU2rCU: A CUDA-to-rCUDA Converter

GPUs (Graphics Processing Units) are being increasingly embraced by the high-performance computing and computational communities as an effective way of considerably reducing application execution time by accelerating significant parts of their codes. CUDA (Compute Unified Device Architecture) is a new technology developed by NVIDIA which leverages the parallel compute engine in GPUs. However, the […]

CUDA

Mar, 9

GPU accelerated maximum cardinality matching algorithms for bipartite graphs

We design, implement, and evaluate GPU-based algorithms for the maximum cardinality matching problem in bipartite graphs. Such algorithms have a variety of applications in computer science, scientific computing, bioinformatics, and other areas. To the best of our knowledge, ours is the first study that focuses on GPU implementations of maximum cardinality matching algorithms. We […]

CUDA

Mar, 7

Solutions For Optimizing The Radix Sort Algorithmic Function Using The Compute Unified Device Architecture

In this paper, we have researched and developed solutions for optimizing the radix sort algorithmic function using the Compute Unified Device Architecture (CUDA). The radix sort is a common parallel primitive, an essential building block for many data processing algorithms, whose optimization improves the performance of a wide class of parallel algorithms useful in data […]

CUDA

Mar, 7

GPU based Eulerian Assembly of Genomes

Advances in sequencing technologies have revolutionized the field of genomics by providing cost-effective and high-throughput solutions. In this paper, we develop a parallel sequence assembler implemented on general-purpose graphics processing units (GPUs). Our work was largely motivated by a growing need in the genomic community for sequence assemblers and increasing use of […]

CUDA

