high performance computing on graphics processing units: hgpu.org

Posts

Apr, 8

Development of methods for the processing of mining images using genetic algorithms

In this paper we describe the extension of the FOTOM system's capabilities for segmenting specific mining images. We focus on methods that are inherently resistant to the noise present in the experimental pit at VSB Technical University. Here, we describe procedures employing proven active contours and evolutionary algorithms for recognizing points of interest in the […]

CUDA

Apr, 8

Highly Scalable Multiplication for Distributed Sparse Multivariate Polynomials on Many-core Systems

We present a highly scalable algorithm for multiplying sparse multivariate polynomials represented in a distributed format. This algorithm targets not only shared-memory multicore computers, but also computer clusters or specialized hardware attached to a host computer, such as graphics processing units or many-core coprocessors. The scalability on the large number of […]

CUDA

Apr, 7

Atomic-free Irregular Computations on GPUs

Atomic instructions are a key ingredient of codes that operate on irregular data structures like trees and graphs. It is well known that atomics can be expensive, especially on massively parallel GPUs, and are often on the critical path of a program. In this paper, we present two high-level methods to eliminate atomics in irregular […]

CUDA

Apr, 7

Exploring complex quantum systems with a hybrid CPU-GPU computing platform

One of the most striking features of quantum mechanics is the exponential growth of the resources required to find the states of a composite system as the size of the system increases. This is also the origin of the two main bottlenecks in numerical studies of complex quantum systems, which are (i) diagonalizations of big matrices and […]

CUDA

Apr, 7

Speed up Large Integer Multiplication Using Fourier Transforms and CUDA Technology

Multiplying large integers is an operation that has many applications in Computational Science. Many cryptographic algorithms require operations on very large subsets of the integer numbers. Using Fast Fourier Transforms (FFT) and a Graphics Processing Unit (GPU), we can speed up integer multiplication and make an effective multiplication algorithm. CUDA technology is used to perform FFT on […]

CUDA

Apr, 7

Optimizing Sparse Matrix-Matrix Multiplication for the GPU

Sparse matrix-matrix multiplication (SpMM) is a key operation in numerous areas from information to the physical sciences. Implementing SpMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpMM operation into three, […]

CUDA

Apr, 7

A new CUDA-based GPU implementation of the two-dimensional Athena code

We present a new version of the Athena code, which solves magnetohydrodynamic equations in two-dimensional space. This new implementation, which we have named Athena-GPU, uses the CUDA architecture to allow code execution on a Graphics Processing Unit (GPU). The Athena-GPU code is an unofficial, modified version of the Athena code which was originally designed for Central […]

CUDA

Apr, 6

23rd Annual International Conference on Computer Science and Software Engineering, CASCON 2013

CASCON 2013 is the 23rd annual international conference hosted by CAS Research, IBM Canada Software Lab. Using the motto, “Innovation that matters”, this conference provides an exciting forum for exchanging ideas and experience in the ever-expanding and critical fields of software engineering and computing. The theme of this year, “Ecosystem of Engagement”, highlights the confluence […]

Apr, 6

OpenCL C++

With the success of programming models such as Khronos’ OpenCL, heterogeneous computing is going mainstream. However, these models are low-level, even when considering them as systems programming models. For example, OpenCL is effectively an extended subset of C99, limited to the type-unsafe procedural abstraction that C has provided for more than 30 years. Computer […]

OpenCL

Apr, 6

A Performance Study of Zero Crossing Rate (ZCR) on Graphics Processors (GPUs) Using CUDA

The ability to harness the power of the Graphics Processing Unit (GPU) enables us to show dramatic increases in computing performance using a parallel computing platform and programming model such as NVIDIA CUDA. Compute Unified Device Architecture (CUDA) is NVIDIA's graphics programming API to perform General Purpose Graphics Processing Unit Programming (GPGPU). The General Purpose […]

CUDA

Apr, 6

Improving GPU Performance Prediction with Data Transfer Modeling

Accelerators such as graphics processors (GPUs) have become increasingly popular for high performance scientific computing. Often, much effort is invested in creating and optimizing GPU code without any guaranteed performance benefit. To reduce this risk, performance models can be used to project a kernel’s GPU performance potential before it is ported. However, raw GPU execution […]

CUDA

Apr, 6

Real-Time Object-Space Edge Detection using OpenCL

At its most basic, object-space edge detection iterates through all polygonal edges in each mesh to find those edges that satisfy one or more edge tests. Those that do are expanded and rendered, while the remainder are ignored. These 3D edges, and their resulting accuracy and customizability, set object-space methods apart from all other categories […]

OpenCL

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
