high performance computing on graphics processing units: hgpu.org

Posts

Nov, 6

A Practical Quicksort Algorithm for Graphics Processors

In this paper we present GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multi-core graphics processors. Quicksort has previously been considered an inefficient sorting solution for graphics processors, but we show that GPU-Quicksort often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can […]

CUDA
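The abstract stops before describing how GPU-Quicksort partitions data. The sketch below is a sequential Python model of one common data-parallel partitioning scheme, assumed here for illustration. First, every block counts its elements below and above the pivot. Next, an exclusive prefix sum turns those counts into private write offsets. Finally, each block scatters to disjoint slots without locking. The block size, the median-of-three pivot, and all function names are hypothetical choices, not the paper's code.

```python
from typing import List, Tuple


def exclusive_scan(values: List[int]) -> List[int]:
    """Exclusive prefix sum: out[i] = values[0] + ... + values[i-1]."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def block_partition(data: List[int], pivot: int, block_size: int) -> Tuple[List[int], int, int]:
    """Partition `data` around `pivot` in three data-parallel phases (simulated sequentially)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Phase 1: every block counts its elements below and above the pivot.
    less_counts = [sum(1 for x in b if x < pivot) for b in blocks]
    greater_counts = [sum(1 for x in b if x > pivot) for b in blocks]
    # Phase 2: prefix sums give each block its own starting write offset.
    less_offsets = exclusive_scan(less_counts)
    greater_offsets = exclusive_scan(greater_counts)
    n, n_less, n_greater = len(data), sum(less_counts), sum(greater_counts)
    # Elements equal to the pivot end up in the middle, so pre-fill with the pivot.
    out = [pivot] * n
    # Phase 3: each block writes only to its own slots, so blocks never collide.
    for b, lo, hi in zip(blocks, less_offsets, greater_offsets):
        for x in b:
            if x < pivot:
                out[lo] = x
                lo += 1
            elif x > pivot:
                out[n - n_greater + hi] = x
                hi += 1
    return out, n_less, n_greater


def quicksort_blocked(data: List[int], block_size: int = 4) -> List[int]:
    if len(data) <= 1:
        return list(data)
    # Median-of-three pivot (illustrative). It is always an element of data, so at
    # least one element stays in the middle region and each recursion shrinks.
    pivot = sorted([data[0], data[len(data) // 2], data[-1]])[1]
    out, n_less, n_greater = block_partition(data, pivot, block_size)
    n = len(out)
    return (quicksort_blocked(out[:n_less], block_size)
            + out[n_less:n - n_greater]
            + quicksort_blocked(out[n - n_greater:], block_size))
```

On a GPU, the loops over blocks in phases 1 and 3 would run as concurrent thread blocks. The prefix sum is the one step that needs global coordination.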

Nov, 6

StoreGPU: exploiting graphics processing units to accelerate distributed storage systems

Today, Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a potentially cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly parallel devices to support more generic functionality at the operating system or middleware level. This […]

CUDA

Nov, 6

Programming model for a heterogeneous x86 platform

The client computing platform is moving towards a heterogeneous architecture consisting of a combination of cores focused on scalar performance and a set of throughput-oriented cores. The throughput-oriented cores (e.g., a GPU) may be connected over both coherent and non-coherent interconnects and may have different ISAs. This paper describes a programming model for such heterogeneous […]

Nov, 6

Maintaining constant frame rates in 3D texture-based volume rendering

3D texture-based volume rendering is a popular way of realizing direct volume visualization on graphics hardware. However, the slice-oriented texture memory layout of many current GPUs may lead to strongly view-dependent performance, which limits the range of applications of volume rendering. In this short technical note, we propose a slight modification of texture-based volume […]

OpenGL
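The following sketch shows why a slice-oriented layout makes performance depend on the view. It assumes, for illustration only, that the volume is stored slice by slice, with x varying fastest, then y, then z. Actual GPU texture layouts are vendor-specific. Neighbouring samples along x are adjacent in memory, but neighbours along z are a whole slice apart. When the sampling slices are parallel to the xy plane, fetches walk through memory with stride 1. When they are parallel to the yz plane, every fetch jumps by at least `dim_x` voxels. Cache behaviour, and therefore frame rate, changes with the viewing direction. The function names are hypothetical.

```python
def linear_index(x: int, y: int, z: int, dim_x: int, dim_y: int) -> int:
    """Slice-major layout: x varies fastest, then y; each z-slice is contiguous."""
    return x + dim_x * (y + dim_y * z)


def step_stride(axis: str, dim_x: int, dim_y: int) -> int:
    """Memory distance (in voxels) between neighbouring samples along one volume axis."""
    step = {"x": (1, 0, 0), "y": (0, 1, 0), "z": (0, 0, 1)}[axis]
    return linear_index(*step, dim_x, dim_y) - linear_index(0, 0, 0, dim_x, dim_y)


for axis in "xyz":
    print(axis, step_stride(axis, 256, 256))
```

For 256 × 256 slices, neighbouring samples are 1, 256 and 65,536 voxels apart along x, y and z respectively.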

Nov, 6

Accelerating Dust Temperature Calculations with Graphics Processing Units

When calculating the infrared spectral energy distributions (SEDs) of galaxies in radiation-transfer models, the calculation of dust grain temperatures is generally the most time-consuming part of the calculation. Because of its highly parallel nature, this calculation is perfectly suited for massively parallel general-purpose Graphics Processing Units (GPUs). This paper presents an implementation of the calculation […]

CUDA
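The abstract is truncated before it says which grain physics the implementation covers. For orientation, this sketch assumes grains in thermal equilibrium with the local radiation field. Under that assumption, the standard energy balance fixes the temperature $T_d$ of a grain of size $a$ at position $\mathbf{x}$. Absorbed power equals emitted power, and the common geometric factor $4\pi \cdot \pi a^2$ cancels. Here $Q_{\mathrm{abs}}$ is the absorption efficiency, $J_\nu$ the local mean intensity, and $B_\nu$ the Planck function.

```latex
\int_0^\infty Q_{\mathrm{abs}}(a,\nu)\, J_\nu(\mathbf{x})\, d\nu
  \;=\; \int_0^\infty Q_{\mathrm{abs}}(a,\nu)\, B_\nu\!\left(T_d(a,\mathbf{x})\right) d\nu
```

$B_\nu(T)$ increases with $T$ at every frequency, so the right-hand side is monotonic in $T_d$ and each equation has a single root. That root can be found separately for every cell and grain size, which is the kind of independent work that maps well onto many GPU threads.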

Nov, 6

Fast scale invariant feature detection and matching on programmable graphics hardware

Ever since the introduction of freely programmable hardware components into modern graphics hardware, graphics processing units (GPUs) have become increasingly popular for general purpose computations. Especially when applied to computer vision algorithms where a Single set of Instructions has to be executed on Multiple Data (SIMD), GPU-based algorithms can provide a major increase in processing […]

CUDA
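The following illustrates the SIMD pattern the abstract refers to, using the Difference-of-Gaussians (DoG) detector from Lowe's SIFT. DoG is assumed here as the detector, since the abstract is cut off before describing its pipeline. The sketch works on a 1D signal for brevity; the 2D case applies the same idea per pixel and across scales. Every output sample is computed by the same instructions from its own neighbourhood, which is what makes the work suitable for GPUs. The sigma values and function names are hypothetical.

```python
import math
from typing import List


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized Gaussian weights truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal: List[float], sigma: float) -> List[float]:
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):  # identical work per sample: one GPU thread per output value
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # clamp-to-edge, like a texture sampler
            acc += w * signal[j]
        out.append(acc)
    return out


def dog_extrema(signal: List[float], sigma: float = 1.0, k: float = 1.6) -> List[int]:
    """Indices where the DoG response is a strict local maximum or minimum."""
    dog = [a - b for a, b in zip(blur(signal, sigma), blur(signal, sigma * k))]
    return [i for i in range(1, len(dog) - 1)
            if (dog[i] > dog[i - 1] and dog[i] > dog[i + 1])
            or (dog[i] < dog[i - 1] and dog[i] < dog[i + 1])]
```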

Nov, 6

Real-time visualization of large volume datasets on standard PC hardware

In the medical field, interactive three-dimensional visualization of large volume datasets is a challenging task. One of the major challenges for graphics processing unit (GPU)-based volume rendering algorithms is the limited texture memory of current GPU architectures. We attempt to overcome this limitation by rendering only visible parts of large CT datasets. […]
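The abstract does not say how the visible parts are determined. One common approach, assumed here purely for illustration, is empty-space culling, which is not necessarily the paper's visibility test. The volume is split into bricks, and any brick whose maximum value falls below a transfer-function threshold is skipped. In a CT scan, such bricks are typically the air around the patient. Only the remaining bricks then need to reside in texture memory. The brick size, the threshold, and the function name are hypothetical parameters.

```python
from typing import List, Tuple

Volume = List[List[List[int]]]  # indexed as volume[z][y][x]


def visible_bricks(volume: Volume, brick: int, threshold: int) -> List[Tuple[int, int, int]]:
    """Return (bz, by, bx) indices of bricks containing any voxel above `threshold`."""
    dz, dy, dx = len(volume), len(volume[0]), len(volume[0][0])
    keep = []
    for bz in range(0, dz, brick):
        for by in range(0, dy, brick):
            for bx in range(0, dx, brick):
                # Largest voxel value inside this brick (clipped at the volume border).
                brick_max = max(
                    volume[z][y][x]
                    for z in range(bz, min(bz + brick, dz))
                    for y in range(by, min(by + brick, dy))
                    for x in range(bx, min(bx + brick, dx))
                )
                if brick_max > threshold:
                    keep.append((bz // brick, by // brick, bx // brick))
    return keep
```

Brick size is a trade-off. Small bricks cull empty space more precisely, but they add bookkeeping and per-brick rendering overhead.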

Nov, 6

Efficient simulation of large-scale spiking neural networks using CUDA graphics processors

Neural network simulators that take into account the spiking behavior of neurons are useful for studying brain mechanisms and for engineering applications. Spiking Neural Networks (SNNs) have traditionally been simulated on large-scale clusters, supercomputers, or dedicated hardware architectures. Alternatively, Graphics Processing Units (GPUs) can provide a low-cost, programmable, and high-performance computing platform […]

CUDA
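The abstract does not name the neuron model. The sketch assumes the Izhikevich simple model with regular-spiking parameters (a = 0.02, b = 0.2, c = -65, d = 8), integrated with forward Euler. It shows why per-neuron updates parallelize well: each neuron's state changes using only its own variables and input current, so one GPU thread per neuron is a natural mapping. Spike propagation through synapses is omitted. The names `Neuron` and `step` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Neuron:
    v: float = -65.0  # membrane potential (mV)
    u: float = -13.0  # recovery variable, initialised to b * v


def step(neurons: List[Neuron], currents: List[float],
         a: float = 0.02, b: float = 0.2, c: float = -65.0, d: float = 8.0,
         dt: float = 0.5) -> List[int]:
    """Advance every neuron by dt milliseconds; return the indices that spiked."""
    fired = []
    for i, (n, current) in enumerate(zip(neurons, currents)):  # independent per neuron
        n.v += dt * (0.04 * n.v * n.v + 5.0 * n.v + 140.0 - n.u + current)
        n.u += dt * a * (b * n.v - n.u)
        if n.v >= 30.0:  # spike threshold: record it and reset
            fired.append(i)
            n.v = c
            n.u += d
    return fired
```

Usage: call `step` repeatedly, for example 2,000 times for one simulated second at dt = 0.5 ms. A neuron driven with a constant input of 10 fires repeatedly. A neuron with no input settles at its resting potential near -70 mV.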

Nov, 6

Parallel view-dependent refinement of progressive meshes

We present a scheme for view-dependent level-of-detail control that is implemented entirely on programmable graphics hardware. Our scheme selectively refines and coarsens an arbitrary triangle mesh at the granularity of individual vertices, to create meshes that are highly adapted to dynamic view parameters. Such fine-grain control has previously been demonstrated using sequential CPU algorithms. However, […]
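The abstract does not give the refinement criterion. The sketch assumes a simple screen-space error test with hysteresis. A vertex is split when its projected error exceeds a pixel tolerance, and collapsed when the error drops well below that tolerance. The point it illustrates is that each active vertex can be classified without consulting its neighbours, so the classification pass is data-parallel. The actual splits and collapses must also respect dependencies in the vertex hierarchy, which this sketch leaves out. The error metric, field names, and the 0.5 hysteresis factor are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ActiveVertex:
    position: Tuple[float, float, float]
    error: float        # object-space error resolved by splitting (hypothetical metric)
    can_split: bool     # vertex has children in the hierarchy
    can_collapse: bool  # collapsing into its parent is currently legal


def classify(vertices: List[ActiveVertex], eye: Tuple[float, float, float],
             pixels_per_unit: float, tolerance: float) -> List[str]:
    """Return 'split', 'collapse' or 'keep' for each vertex, independently."""
    actions = []
    for v in vertices:  # no vertex reads another's state: one thread per vertex
        dist = max(math.dist(v.position, eye), 1e-6)
        screen_error = v.error * pixels_per_unit / dist  # projected error in pixels
        if screen_error > tolerance and v.can_split:
            actions.append("split")
        elif screen_error < 0.5 * tolerance and v.can_collapse:
            actions.append("collapse")
        else:
            actions.append("keep")
    return actions
```

The gap between the split and collapse thresholds keeps vertices near the boundary from flipping back and forth as the camera moves slightly.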

Nov, 6

Magnetohydrodynamics simulations on graphics processing units

Magnetohydrodynamics (MHD) simulations based on the ideal MHD equations have become a powerful tool for modeling phenomena in a wide range of applications including laboratory, astrophysical, and space plasmas. In general, high-resolution methods for solving the ideal MHD equations are computationally expensive and Beowulf clusters or even supercomputers are often used to run the codes […]

CUDA
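For reference, these are the ideal MHD equations in conservative form. They are written in units where the magnetic permeability factor is absorbed into $\mathbf{B}$. This is one common convention; the paper's normalisation may differ. Here $\rho$ is density, $\mathbf{v}$ velocity, $\mathbf{B}$ the magnetic field, $p$ gas pressure, $\gamma$ the adiabatic index, $E$ total energy density, and $\mathsf{I}$ the identity tensor.

```latex
\begin{aligned}
\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) &= 0,\\
\frac{\partial (\rho\mathbf{v})}{\partial t} + \nabla\cdot\left(\rho\mathbf{v}\mathbf{v} - \mathbf{B}\mathbf{B} + P^{*}\mathsf{I}\right) &= 0,\\
\frac{\partial \mathbf{B}}{\partial t} + \nabla\cdot(\mathbf{v}\mathbf{B} - \mathbf{B}\mathbf{v}) &= 0,\\
\frac{\partial E}{\partial t} + \nabla\cdot\left[(E + P^{*})\mathbf{v} - \mathbf{B}(\mathbf{B}\cdot\mathbf{v})\right] &= 0,
\end{aligned}
\qquad
P^{*} = p + \frac{B^{2}}{2},\quad
E = \frac{p}{\gamma - 1} + \frac{\rho v^{2}}{2} + \frac{B^{2}}{2},\quad
\nabla\cdot\mathbf{B} = 0.
```

Each line has the form $\partial U/\partial t + \nabla\cdot F(U) = 0$. Grid-based high-resolution schemes therefore update every cell from fluxes through its faces, and that per-cell work is what a GPU port parallelizes.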

Nov, 6

SAPPORO: A way to turn your graphics cards into a GRAPE-6

We present Sapporo, a library for performing high-precision gravitational N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can switch to Sapporo by a simple relinking of the library. The precision of our library is comparable to that of GRAPE-6, even though internally […]

CUDA
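GRAPE-6 hardware evaluates direct-summation gravitational accelerations and their time derivatives (jerks), which a fourth-order Hermite integrator needs. A library that mimics it has to compute the same quantities. The sketch below shows that O(N²) kernel in plain double precision, with G = 1 and Plummer softening `eps2`. It does not reproduce Sapporo's internal precision scheme, which the abstract leaves elided. The function name is hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]


def acc_jerk(pos: List[Vec], vel: List[Vec], mass: List[float],
             eps2: float) -> Tuple[List[List[float]], List[List[float]]]:
    """Softened direct-summation acceleration and jerk for every particle (G = 1)."""
    n = len(pos)
    acc = [[0.0] * 3 for _ in range(n)]
    jerk = [[0.0] * 3 for _ in range(n)]
    for i in range(n):      # on a GPU: one i-particle per thread
        for j in range(n):  # each thread loops over all j-particles
            if i == j:
                continue
            dr = [pos[j][k] - pos[i][k] for k in range(3)]
            dv = [vel[j][k] - vel[i][k] for k in range(3)]
            r2 = dr[0] ** 2 + dr[1] ** 2 + dr[2] ** 2 + eps2
            inv_r3 = r2 ** -1.5
            rv = (dr[0] * dv[0] + dr[1] * dv[1] + dr[2] * dv[2]) / r2
            for k in range(3):
                acc[i][k] += mass[j] * dr[k] * inv_r3
                # d/dt of m * dr / r^3  =  m * (dv - 3 (dr.dv / r^2) dr) / r^3
                jerk[i][k] += mass[j] * (dv[k] - 3.0 * rv * dr[k]) * inv_r3
    return acc, jerk
```

All N² pair interactions are independent, which is why the computation suits both GRAPE pipelines and GPU threads.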

Nov, 6

Graphics Hardware-Based Level-Set Method for Interactive Segmentation and Visualization

This paper presents an efficient graphics hardware-based method to segment and visualize level-set surfaces at interactive rates. Our method consists of a memory manager, a level-set solver, and a volume renderer. The memory manager, which runs on the CPU, generates a page table, an inverse page table, and an available page stack, and also processes the activation and inactivation of […]
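The abstract names three CPU-side structures without detailing them. The sketch assumes the volume is divided into fixed-size virtual pages, only some of which are active at a time, for instance those near the evolving surface. Active pages occupy slots in a fixed pool of physical pages. The page table maps virtual to physical pages, and the inverse page table maps physical pages back to their virtual position. The available page stack holds free physical slots. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class PageManager:
    """CPU-side bookkeeping for a sparse volume held in a fixed pool of physical pages."""

    def __init__(self, num_physical_pages: int) -> None:
        self.page_table: Dict[int, int] = {}  # virtual page -> physical page
        self.inverse_table: List[Optional[int]] = [None] * num_physical_pages  # physical -> virtual
        # Available page stack; reversed so that page 0 is handed out first.
        self.free_stack: List[int] = list(range(num_physical_pages - 1, -1, -1))

    def activate(self, virtual_page: int) -> int:
        """Give a virtual page a physical slot (or return its existing slot)."""
        if virtual_page in self.page_table:
            return self.page_table[virtual_page]
        if not self.free_stack:
            raise MemoryError("no available physical pages")
        physical = self.free_stack.pop()
        self.page_table[virtual_page] = physical
        self.inverse_table[physical] = virtual_page
        return physical

    def deactivate(self, virtual_page: int) -> None:
        """Release a virtual page's physical slot back to the available stack."""
        physical = self.page_table.pop(virtual_page)
        self.inverse_table[physical] = None
        self.free_stack.append(physical)
```

A stack keeps allocation and release O(1). The inverse table lets the manager find which region of the volume currently occupies any given physical page.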

