high performance computing on graphics processing units: hgpu.org

Posts

Apr, 8

A Hardware Multithreaded SpMV Kernel for the Convey HC-2ex

Applications exhibiting irregular behavior through poor memory locality have been a constant challenge for high-performance computing. Architectures supporting hardware multithreading (e.g. Tera MTA and Cray XMT) have been shown to deliver superior performance on such applications by masking memory latency. FPGAs have outperformed traditional architectures on applications that exhibit very large spatial locality and where […]

CUDA

Apr, 8

Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability

Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches for partitioning work require offline training and generate fixed partitions that fail to respond to fluctuations in device performance that occur at run time. We present a novel dynamic approach to work partitioning that […]

OpenCL

Apr, 8

Development of methods for the processing of mining images using genetic algorithms

In this paper we describe the extension of the FOTOM system's capabilities with respect to segmentation of specific mining images. We focus on methods that are inherently resistant to the noise present in the experimental pit at VSB Technical University. Here, we describe procedures employing proven active contours and evolutionary algorithms for recognizing points of interest in the […]

CUDA

Apr, 8

Highly Scalable Multiplication for Distributed Sparse Multivariate Polynomials on Many-core Systems

We present a highly scalable algorithm for multiplying sparse multivariate polynomials represented in a distributed format. This algorithm targets not only shared-memory multicore computers, but also computer clusters or specialized hardware attached to a host computer, such as graphics processing units or many-core coprocessors. The scalability on the large number of […]

CUDA

Apr, 7

Atomic-free Irregular Computations on GPUs

Atomic instructions are a key ingredient of codes that operate on irregular data structures like trees and graphs. It is well known that atomics can be expensive, especially on massively parallel GPUs, and are often on the critical path of a program. In this paper, we present two high-level methods to eliminate atomics in irregular […]

CUDA

Apr, 7

Exploring complex quantum systems with a hybrid CPU-GPU computing platform

One of the most striking features of quantum mechanics is the exponential growth of resources, required to find the states of a composite system, with the size of the system. This also is the origin of the two main bottlenecks in numerical studies of complex quantum systems, that are (i) diagonalizations of big matrices and […]

CUDA

Apr, 7

Speed up Large Integer Multiplication Using Fourier Transforms and CUDA Technology

Multiplying large integers is an operation that has many applications in computational science. Many cryptographic algorithms require operations on very large subsets of the integer numbers. Using Fast Fourier Transforms (FFT) and a Graphics Processing Unit (GPU), we can speed up integer multiplication and make an effective multiplication algorithm. CUDA technology used to perform FFT on […]

CUDA

Apr, 7

Optimizing Sparse Matrix-Matrix Multiplication for the GPU

Sparse matrix-matrix multiplication (SpMM) is a key operation in numerous areas from information to the physical sciences. Implementing SpMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpMM operation into three, […]

CUDA

Apr, 7

A new CUDA-based GPU implementation of the two-dimensional Athena code

We present a new version of the Athena code, which solves magnetohydrodynamic equations in two-dimensional space. This new implementation, which we have named Athena-GPU, uses the CUDA architecture to allow code execution on a Graphics Processing Unit (GPU). The Athena-GPU code is an unofficial, modified version of the Athena code which was originally designed for Central […]

CUDA

Apr, 6

23rd Annual International Conference on Computer Science and Software Engineering, CASCON 2013

CASCON 2013 is the 23rd annual international conference hosted by CAS Research, IBM Canada Software Lab. Using the motto, “Innovation that matters”, this conference provides an exciting forum for exchanging ideas and experience in the ever-expanding and critical fields of software engineering and computing. The theme of this year, “Ecosystem of Engagement”, highlights the confluence […]

Apr, 6

OpenCL C++

With the success of programming models such as Khronos’ OpenCL, heterogeneous computing is going mainstream. However, these models are low-level, even when considering them as systems programming models. For example, OpenCL is effectively an extended subset of C99, limited to the type unsafe procedural abstraction that C has provided for more than 30 years. Computer […]

OpenCL

Apr, 6

A Performance Study of Zero Crossing Rate (ZCR) on Graphics Processors (GPUs) Using CUDA

The ability to harness the power of the Graphics Processing Unit (GPU) enables us to show dramatic increases in computing performance using a parallel computing platform and programming model such as NVIDIA CUDA. Compute Unified Device Architecture (CUDA) is NVIDIA's graphics programming API to perform General Purpose Graphics Processing Unit Programming (GPGPU). The General Purpose […]

CUDA
