high performance computing on graphics processing units: hgpu.org

Posts

Mar, 4

Scene Recognition Acceleration Using CUDA and OpenMP

Scene recognition has become a remarkable field in the image processing area, and many methods have been proposed in recent years. Among them, extracting the scene gist from global features has been shown to achieve higher retrieval accuracy than many other methods. However, the process of extracting gist is heavily time-consuming and […]

CUDA

Mar, 4

Towards a Software Transactional Memory for Graphics Processors

The introduction of general purpose computing on many-core graphics processor systems, and the general shift in the industry towards parallelism, has created a demand for ease of parallelization. Software transactional memory (STM) simplifies development of concurrent code by allowing the programmer to mark sections of code to be executed concurrently and atomically in an optimistic […]

CUDA

Mar, 4

Some of the What?, Why?, How?, Who? and Where? of Graphics Processing Unit Computing for Bayesian Analysis

Over the last 20 years or so, a number of Bayesian researchers and groups have invested a good deal of time, effort and money in parallel computing for Bayesian analysis. The growth of “small research group” to “institutionally supported” cluster computational facilities has had a substantial impact on a number of areas of Bayesian analysis, […]

CUDA

Mar, 4

Acceleration of Medical Image Registration using Graphics Process Units in Computing Normalized Mutual Information

This paper presents a computational performance analysis of an accelerated medical image registration using Graphics Processing Units (GPUs). In our previous work, a multi-resolution approach using normalized mutual information (NMI) has proven to be useful in medical image registration. In this paper, we propose an acceleration of the NMI procedure using GPU implementation because of […]

CUDA

Mar, 4

Understanding GPU Programming for Statistical Computation: Studies in Massively Parallel Massive Mixtures

We describe advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via GPU (graphics processing unit) programming. The developments are partly motivated by computational challenges arising in increasingly prevalent biological studies using high-throughput flow cytometry methods, generating many, very large data sets and requiring increasingly high-dimensional mixture models with large numbers […]

CUDA

Mar, 4

Architecture-Aware Optimization Targeting Multithreaded Stream Computing

Optimizing program execution targeted for Graphics Processing Units (GPUs) can be very challenging. Efficiently mapping serial code to a GPU or stream processing platform is a time-consuming task and is greatly hampered by a lack of detail about the underlying hardware. Programmers are left to attempt trial and error to produce […]

Mar, 4

Redesigning combustion modeling algorithms for the Graphics Processing Unit (GPU): Chemical kinetic rate evaluation and ordinary differential equation integration

Detailed modeling of complex combustion kinetics remains challenging and often intractable, due to prohibitive computational costs incurred when solving the associated large kinetic mechanisms. The Graphics Processing Unit (GPU), originally designed for graphics rendering on computer and gaming systems, has recently emerged as a powerful, cost-effective supplement to the Central Processing Unit (CPU) for dramatically […]

CUDA

Mar, 4

Multi-GPU Performance of Incompressible Flow Computation by Lattice Boltzmann Method on GPU Cluster

GPGPU has drawn much attention for accelerating non-graphics applications. A simulation using the D3Q19 model of the Lattice Boltzmann method was executed successfully on a multi-node GPU cluster using CUDA programming and the MPI library. The GPU code runs on the multi-node GPU cluster TSUBAME of Tokyo Institute of Technology, in which a total of 680 GPUs of NVIDIA Tesla […]

CUDA

Mar, 3

Real-Time Multiprocessor Systems with GPUs

Graphics processing units, GPUs, are powerful processors that can offer significant performance advantages over traditional CPUs. The last decade has seen rapid advancement in GPU computational power and generality. Recent technologies make it possible to use GPUs as co-processors to the CPU. The performance advantages of GPUs can be great, often outperforming traditional CPUs by […]

CUDA

Mar, 3

Smooth Mixed-Resolution GPU Volume Rendering

We propose a mixed-resolution volume ray-casting approach that enables more flexibility in the choice of downsampling positions and filter kernels, allows freely mixing volume bricks of different resolutions during rendering, and does not require modifying the original sample values. A C^0-continuous function is obtained everywhere with hardware-native filtering at full speed by simply warping texture […]

OpenGL

Mar, 3

The sparse matrix vector product on GPUs

The sparse matrix vector product (SpMV) is a paramount operation in engineering and scientific computing and, hence, has long been a subject of intense research. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devising data formats to store the sparse matrix with the ultimate aim […]

CUDA

Mar, 3

Unified – A Sharp Turn in the Latest Era of Graphic Processors

The need for high performance and realism has increased greatly in the last few decades, especially in gaming, 3D graphics and computationally demanding applications. This has compelled GPU vendors to put their best effort into improving ILP (Instruction Level Parallelism). As a result, the GPU has entered in a […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
