high performance computing on graphics processing units: hgpu.org

Posts

Mar, 5

High-performance GPU based Rendering for Real-Time, rigid 2D/3D-Image Registration in Radiation Oncology

This thesis presents a comparison of high-speed rendering algorithms for the application in 2D/3D-image registration in radiation oncology. Image guided radiation therapy (IGRT) is a technique for improving the treatment of cancer with ionizing radiation by adapting the treatment plan to the current situation using 2D/3D-image registration. To accelerate this procedure, also rendering of Digitally […]

OpenGL

Mar, 5

Phase Based Volume Registration on the GPU with Application to Quantitative MRI

We present a method for fast phase based registration of volume data for medical applications. As the number of different modalities within medical imaging increases, it becomes more and more important with registration that works for a mixture of modalities. For these applications the phase based registration approach has proven to be superior. Today there […]

CUDA

Mar, 5

Phase Based Volume Registration Using CUDA

CUDA

Mar, 5

Implementation of Variable Preconditioned GCR with mixed precision on GPU using CUDA

The Variable Preconditioned GVR (VPGCR) with mixed precision on Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA) is numerically investigated. The convergence theorem of VPGCR is guaranteed that the residual equation for the preconditioned procedure can be solved in the range of single precision operation. The results of computations show that VPGCR with […]

CUDA

Mar, 5

Data Mining Using Graphics Processing Units

During the last few years, Graphics Processing Units (GPU) have evolved from simple devices for the display signal preparation into powerful coprocessors that do not only support typical computer graphics tasks such as rendering of 3D scenarios but can also be used for general numeric and symbolic computation tasks such as simulation and optimization. As […]

CUDA

Mar, 5

Parallel implementation of wavelet-based image denoising on programmable PC-grade graphics hardware

The discrete wavelet transform (DWT) has been extensively used for image compression and denoising in the areas of image processing and computer vision. However, the intensive computation of DWT due to its inherent multilevel data decomposition and reconstruction operations brings a bottleneck that drastically reduces its performance and implementations for real-time applications when facing large […]

OpenGL

Mar, 5

Parallel Computing: The Elephant in the Room

Over the past few years, there has been a shift towards multi-core processors, driven partially by physical limitations. Mistaken assumptions of how effective and useful parallel systems can be have also provided motivation for this change. In this paper, we seek to directly identify the barriers to parallel computation. The barriers are not, as conventional […]

Mar, 5

Inter-Block GPU Communication via Fast Barrier Synchronization

While GPGPU stands for general-purpose computation on graphics processing units, the lack of explicit support for inter-block communication on the GPU arguably hampers its broader adoption as a general-purpose computing device. Interblock communication on the GPU occurs via global memory and then requires barrier synchronization across the blocks, i.e., inter-block GPU communication via barrier synchronization. […]

CUDA

Mar, 5

Designing Efficient Many-Core Parallel Algorithms for All-Pairs Shortest-Paths Using CUDA

Finding the all-pairs shortest-paths on a large graph is a fundamental problem in many practical applications such as bioinformatics, internet node traffic and network routing. In this paper, we present the designs of two efficient parallel algorithms for many-core GPUs using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. By using […]

CUDA

Mar, 4

Formal Description and Optimization Based High – Performance Computing on CUDA

In recent years, with the development of GPU, based on the general purpose computation on graphics processors has became a new field. Aiming at the processing of GPU, this paper provides the formal description for data parallel mode, a detailed description of the CUDA programming mode land the principle of optimization. It shows by the […]

CUDA

Mar, 4

A Stream Processor Cluster Architecture Model with the Hybrid Technology of MPI and CUDA

Nowadays, the compute capability of traditional cluster system can’t keep up with the computing needs of a practical application, and these aspects of energy, space technology, etc. have become a huge problem. However, as parallel computing equipment, the stream processor (SP) has a high performance of floating-point operations. NVIDIA GPUs is a typical stream processor […]

CUDA

Mar, 4

Scene Recognition Acceleration Using CUDA and OpenMP

Scene recognition has become a remarkable field in image processing area, and many methods have been proposed in recent years, in which the idea of extracting the scene gist from global features has been proved to have higher retrieval accuracy compared with many other methods. However, the process of extracting gist is heavily time-consuming and […]

CUDA

* * *

high performance computing on graphics processing units: hgpu.org

Posts

High-performance GPU based Rendering for Real-Time, rigid 2D/3D-Image Registration in Radiation Oncology

Phase Based Volume Registration on the GPU with Application to Quantitative MRI

Phase Based Volume Registration Using CUDA

Implementation of Variable Preconditioned GCR with mixed precision on GPU using CUDA

Data Mining Using Graphics Processing Units

Parallel implementation of wavelet-based image denoising on programmable PC-grade graphics hardware

Parallel Computing: The Elephant in the Room

Inter-Block GPU Communication via Fast Barrier Synchronization

Designing Efficient Many-Core Parallel Algorithms for All-Pairs Shortest-Paths Using CUDA

Formal Description and Optimization Based High – Performance Computing on CUDA

A Stream Processor Cluster Architecture Model with the Hybrid Technology of MPI and CUDA

Scene Recognition Acceleration Using CUDA and OpenMP

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

Most viewed papers (last 30 days)