high performance computing on graphics processing units: hgpu.org

Posts

Apr, 21

Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs

The Graphics Processing Unit (GPU) has become an attractive coprocessor for scientific computing due to its massive processing capability. Sparse matrix-vector multiplication (SpMV) is a critical operation in a wide variety of scientific and engineering applications, such as sparse linear algebra and image processing. This paper presents an auto-tuning framework that can automatically compute and […]

CUDA
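
For readers unfamiliar with the SpMV operation named in the abstract above, here is a minimal CPU sketch in Python. It assumes the compressed sparse row (CSR) layout, which the abstract does not specify; the function name `spmv_csr` and the sample matrix are illustrative only and are not taken from the paper.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Return y = A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i owns values[row_ptr[i]:row_ptr[i + 1]]
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Only the stored nonzeros of this row contribute to y[row].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example:
# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Each row's dot product is independent of the others, which is what makes the operation a natural fit for GPU threads. How rows are mapped to threads and blocks is the sort of choice a tuning framework could explore, though the specifics of the paper's framework are not given in the truncated abstract.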

Apr, 21

A performance prediction model for the CUDA GPGPU platform

The significant growth in the computational power of modern Graphics Processing Units (GPUs), coupled with the advent of general-purpose programming environments like NVIDIA’s CUDA, has seen GPUs emerge as a very popular parallel computing platform. Until recently, there was no performance model for GPGPUs. The absence of such a model makes it difficult […]

CUDA

Apr, 21

Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit

Particle-in-cell (PIC) simulations with Monte-Carlo collisions are used in plasma science to explore a variety of kinetic effects. One major problem is the long run-time of such simulations. Even on modern computer systems, PIC codes take a considerable amount of time for convergence. Most of the computations can be massively parallelized, since particles behave independently […]

CUDA

Apr, 20

Exploring scalability of FIR filter realizations on Graphics Processing Units

General-Purpose Computing on Graphics Processing Units (GPGPU) has lately been of great interest due to the release of architectures and software that simplify the programming of graphics cards. This study explores how the performance of FIR digital filters scales with the number of taps and the number of samples. We also discuss the trade-offs with various techniques for GPGPU […]

CUDA
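
The two quantities varied in the FIR study above, taps and samples, can be seen in a minimal direct-form FIR sketch. This is a plain CPU illustration in Python, not the paper's GPU realization; the function name `fir_filter` and the zero-padding of samples before the start of the input are assumptions made for the example.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * samples[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


# Two-tap moving average over four samples (illustrative values).
y = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
```

The work grows with the product of the number of samples and the number of taps, and every output sample can be computed independently of the others.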

Apr, 20

Stream processing of moment invariants for real-time classifiers

This paper introduces a general purpose graphics processing unit (GPGPU) stream processing implementation of moment invariants using an integral image or summed area table approach. Summed area tables have been used to help attain real-time performance for some classifier systems; however, due to the computational complexity of moment invariants, a high throughput computational platform is […]
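
As background for the summed area table (integral image) mentioned in the abstract above, the following is a minimal Python sketch of the general technique. It is not the paper's stream-processing implementation; the function names `summed_area_table` and `box_sum` and the sample image are hypothetical.

```python
def summed_area_table(img):
    """Build a table with one extra zero row and column so that
    sat[r][c] is the sum of img[0:r][0:c]."""
    rows, cols = len(img), len(img[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (img[r][c] + sat[r][c + 1]
                                 + sat[r + 1][c] - sat[r][c])
    return sat


def box_sum(sat, r0, c0, r1, c1):
    """Sum of the rectangle img[r0:r1][c0:c1] (half-open) using four lookups."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
sat = summed_area_table(img)
total = box_sum(sat, 0, 0, 3, 3)   # whole image
inner = box_sum(sat, 1, 1, 3, 3)   # bottom-right 2x2 block
```

Once the table is built, the sum over any axis-aligned rectangle costs four lookups regardless of the rectangle's size, which is why the structure helps classifiers that evaluate many windows per frame.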

Apr, 20

Improving the performance of PIR Protocol in Outsourced Databases

Outsourcing databases as a service instead of using in-house database management is a new trend emerging in the computing industry; there has been growing interest in outsourcing database services in both the commercial world and the research community. In this paper, we present an analysis of a non-concurrent model of a fast single-database Private Information Retrieval (PIR) scheme for […]

Apr, 20

A GPU-based calculation using the three-dimensional FDTD method for electromagnetic field analysis

Numerical simulations with a numerical human model using the finite-difference time-domain (FDTD) method have recently become common in a number of fields of biomedical engineering. However, the FDTD calculation runs too slowly. We therefore focus on general purpose programming on the graphics processing unit (GPGPU). The three-dimensional FDTD method was implemented on the […]

CUDA

Apr, 20

Accelerating Linpack Performance with Mixed Precision Algorithm on CPU+GPGPU Heterogeneous Cluster

In this paper, a mixed-precision algorithm for solving linear systems of equations and the implementation of the HPL package are introduced. We use this mixed-precision algorithm to improve the HPL package on CPU + GPGPU heterogeneous clusters; the result is named GHPL, and we give its implementation mechanisms in detail. The experimental results are measured […]

CUDA

Apr, 20

Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)

Graphics cards, traditionally designed as accelerators for computer graphics, have evolved to support more general-purpose computation. General Purpose Graphical Processing Units (GPGPUs) are now being used as highly efficient, cost-effective platforms for executing certain simulation applications. While most of these applications belong to the category of timestepped simulations, little is known about the applicability of […]

Apr, 20

An efficient GPU implementation of the revised simplex method

The computational power provided by the massive parallelism of modern graphics processing units (GPUs) has moved increasingly into focus over the past few years. In particular, general purpose computing on GPUs (GPGPU) is attracting attention among researchers and practitioners alike. Yet GPGPU research is still in its infancy, and a major challenge is to rearrange […]

Apr, 20

Tutorial 3: Methodologies and Performance Impacts of General Purpose Computing on GPUs

Graphics Processing Units (GPUs) have been applied to graphics applications to render realistic perspectives of virtual scenes, especially in the entertainment market. Driven by market demand for super-high-definition scenes at high frame rates that simulate physical phenomena naturally in visualization applications, the last decade has brought drastic performance improvements in GPUs. […]

Apr, 20

Design and implementation of software-managed caches for multicores with local memory

Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies from different types of memory accesses add overhead and adversely affect instruction scheduling. Instead, the accelerator cores have internal local memory to place their code and data. Programmers of such […]
