high performance computing on graphics processing units: hgpu.org

Posts

Nov, 5

ECM on Graphics Cards

This paper reports record-setting performance for the elliptic-curve method of integer factorization: for example, 926.11 curves/second for ECM stage 1 with B1=8192 for 280-bit integers on a single PC. The state-of-the-art GMP-ECM software handles 124.71 curves/second for ECM stage 1 with B1=8192 for 280-bit integers using all four cores of a 2.4 GHz Core 2 Quad […]

CUDA
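ECM stage 1 multiplies a point on a random elliptic curve modulo n by k, the product of all prime powers up to B1. A factor p of n shows up when a modular inversion fails, which happens when the curve's group order modulo p is B1-powersmooth. The Python sketch below assumes short Weierstrass curves in affine coordinates (Lenstra's original formulation). It is not necessarily the curve representation or the GPU arithmetic used in the paper, and every helper name in it is hypothetical.

```python
import math
import random


class FactorFound(Exception):
    """Raised when a modular inversion fails; carries gcd(value, n)."""

    def __init__(self, divisor):
        super().__init__(divisor)
        self.divisor = divisor


def inv_mod(a, n):
    # A non-invertible denominator is exactly how ECM discovers a factor.
    g = math.gcd(a % n, n)
    if g != 1:
        raise FactorFound(g)
    return pow(a, -1, n)  # Python 3.8+


def ec_add(P, Q, a, n):
    """Add points on y^2 = x^3 + a*x + b over Z/nZ (None is the point at infinity).

    b never appears in the group law, so a random (a, point) pair fixes the curve.
    """
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if (x1 - x2) % n == 0:
        if (y1 + y2) % n == 0:
            return None  # P + (-P)
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n  # tangent slope (doubling)
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n  # chord slope
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)


def ec_mul(k, P, a, n):
    """Right-to-left double-and-add: returns k*P."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R


def stage1_multiplier(B1):
    """Product of the largest power of each prime p <= B1 that stays <= B1."""
    k = 1
    for p in range(2, B1 + 1):
        if all(p % q for q in range(2, math.isqrt(p) + 1)):  # trial-division primality
            pe = p
            while pe * p <= B1:
                pe *= p
            k *= pe
    return k


def ecm_stage1(n, B1, curves, seed=0):
    """Try up to `curves` random curves; return a nontrivial factor of n or None."""
    rng = random.Random(seed)
    k = stage1_multiplier(B1)
    for _ in range(curves):
        x, y, a = (rng.randrange(n) for _ in range(3))
        try:
            ec_mul(k, (x, y), a, n)
        except FactorFound as e:
            if 1 < e.divisor < n:  # gcd == n means both primes hit at once; retry
                return e.divisor
    return None
```

For example, `ecm_stage1(1009 * 10007, B1=100, curves=500)` returns one of the two prime factors. Each curve is independent of the others, which is why stage 1 parallelizes naturally across many curves on a GPU.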

Nov, 5

Realistic real-time sound re-synthesis and processing for interactive virtual worlds

We present new GPU-based techniques for implementing linear digital filters for real-time audio processing. Our solution for recursive filters is the first presented in the literature. We demonstrate the relevance of these algorithms to computer graphics by synthesizing realistic sounds of colliding objects made of different materials, such as glass, plastic, and wood, in real […]

CUDA
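A first-order recursive (IIR) filter y[i] = x[i] + a·y[i−1] looks inherently sequential. It can still be split into blocks that are filtered independently with zero initial state and then corrected with a carry term a^(j+1)·y_prev, where y_prev is the last output of the previous block. The Python sketch below shows that generic block decomposition, assuming a single real coefficient a. It is not claimed to be the paper's formulation, and the function names are hypothetical.

```python
def iir_sequential(x, a):
    """Reference first-order recursive filter: y[i] = x[i] + a * y[i-1]."""
    y, prev = [], 0.0
    for v in x:
        prev = v + a * prev
        y.append(prev)
    return y


def iir_blocked(x, a, block):
    # Phase 1: filter every block independently with zero initial state.
    # On a GPU these blocks could be processed concurrently.
    local = [iir_sequential(x[i:i + block], a) for i in range(0, len(x), block)]
    # Phase 2: propagate one carry per block; true output is
    # y[j] = y_local[j] + a^(j+1) * (last output of the previous block).
    out, carry = [], 0.0
    for b in local:
        fixed = [v + a ** (j + 1) * carry for j, v in enumerate(b)]
        out.extend(fixed)
        carry = fixed[-1]
    return out
```

Only the second phase is sequential, and it touches one value per block rather than one per sample.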

Nov, 5

Solving Path Problems on the GPU

We consider the computation of shortest paths on Graphics Processing Units (GPUs). The blocked recursive elimination strategy we use is applicable to a class of algorithms (such as all-pairs shortest-paths, transitive closure, and LU decomposition without pivoting) having similar data access patterns. Using the all-pairs shortest-paths problem as an example, we uncover potential gains over […]

CUDA
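The "similar data access patterns" can be seen in the classic triple loop that all of these algorithms share: only the two scalar operations change. Min-plus gives all-pairs shortest paths (Floyd-Warshall), and or-and gives transitive closure (Warshall). The Python sketch below is the plain, unblocked loop nest under that semiring view. It is not the blocked recursive variant from the paper, and `semiring_closure` is a hypothetical name.

```python
import math


def semiring_closure(A, plus, times):
    """Generic elimination: D[i][j] = plus(D[i][j], times(D[i][k], D[k][j]))."""
    n = len(A)
    D = [row[:] for row in A]  # work on a copy
    for k in range(n):          # pivot / intermediate vertex
        for i in range(n):
            for j in range(n):
                D[i][j] = plus(D[i][j], times(D[i][k], D[k][j]))
    return D


INF = math.inf
W = [[0, 4, 1, INF],
     [INF, 0, INF, 1],
     [INF, 2, 0, INF],
     [INF, INF, INF, 0]]

# All-pairs shortest paths over the (min, +) semiring.
dist = semiring_closure(W, min, lambda u, v: u + v)
# Transitive closure over the (or, and) semiring.
reach = semiring_closure([[w != INF for w in row] for row in W],
                         lambda u, v: u or v, lambda u, v: u and v)
```

Here `dist[0][3]` is 4 (path 0 → 2 → 1 → 3). Blocked GPU variants reorganize this same loop nest into tiles so that each tile's update runs from fast memory.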

Nov, 5

Parallel search on video cards

Recent approaches exploiting the massively parallel architecture of graphics processors (GPUs) to accelerate database operations have achieved intriguing results. While parallel sorting has received significant attention, parallel search has not been explored. With p-ary search, we present a novel parallel search algorithm for large-scale database index operations that scales with the number of processors and outperforms […]

CUDA
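p-ary search generalizes binary search. In each round, p processors probe p evenly spaced pivots at the same time, and because the array is sorted their comparison results are monotone (all "less than key" results come first), so the results select one of p+1 sub-ranges. That gives roughly log_{p+1}(n) rounds instead of log_2(n). The Python sketch below simulates the p probes sequentially and returns a lower-bound index. The lower-bound formulation and the names are assumptions for illustration, not the paper's code.

```python
def pary_lower_bound(arr, key, p):
    """Return (first index i with arr[i] >= key, number of rounds used)."""
    lo, hi = 0, len(arr)  # invariant: the answer lies in [lo, hi]
    rounds = 0
    while lo < hi:
        w = hi - lo
        pivots = [lo + (w * (i + 1)) // (p + 1) for i in range(p)]
        # On a GPU, each of these p comparisons would be one thread.
        flags = [arr[q] < key for q in pivots]
        t = sum(flags)  # flags look like True...True False...False
        new_lo = pivots[t - 1] + 1 if t > 0 else lo
        new_hi = pivots[t] if t < p else hi
        lo, hi = new_lo, new_hi
        rounds += 1
    return lo, rounds
```

With p = 7 on a 1000-element array, the window shrinks by about a factor of eight per round, so the search finishes in a handful of rounds.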

Nov, 5

A Performance Comparison of CUDA and OpenCL

CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, […]

CUDA • OpenCL

Nov, 5

Faster matrix-vector multiplication on GeForce 8800GTX

Recently, GPUs have acquired the programmability to perform general-purpose computation quickly by running tens of thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on the NVIDIA CUDA architecture. The experimental results on a GeForce 8800GTX show that the proposed algorithm runs up to 15.69 (resp., 32.88) times faster than the sgemv routine […]

CUDA
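Dense matrix-vector products on GPUs are usually memory-bound, so implementations commonly stage a tile of x in fast on-chip (shared) memory and reuse it across many rows. The Python sketch below only mirrors that loop structure. It is a generic tiling pattern under our own assumptions, not the specific algorithm of the paper, and `tile` is a hypothetical parameter.

```python
def matvec_naive(A, x):
    """Reference y = A x for a row-major list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matvec_tiled(A, x, tile):
    m, n = len(A), len(x)
    y = [0.0] * m
    for j0 in range(0, n, tile):
        x_tile = x[j0:j0 + tile]  # on a GPU: one cooperative load into shared memory
        for i in range(m):        # on a GPU: one thread (or warp) per row
            row = A[i]
            acc = 0.0
            for jj, xv in enumerate(x_tile):
                acc += row[j0 + jj] * xv
            y[i] += acc           # accumulate this tile's partial dot product
    return y
```

The tile size trades shared-memory capacity against how many times each element of x is re-read from global memory.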

Nov, 5

Acceleration of direct volume rendering with programmable graphics hardware

We propose a method to accelerate direct volume rendering using programmable graphics hardware (GPU). In the method, texture slices are grouped together to form a texture slab. Rendering the non-empty slabs in front-to-back viewing order generates the resulting image. Considering each pixel of the image as a ray, slab silhouette maps (SSMs) are used […]

OpenGL
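Front-to-back compositing accumulates color and opacity along each ray. This lets a renderer skip empty regions and stop early once the ray is nearly opaque. The Python sketch below assumes one ray's samples are already available as (color, alpha) pairs and treats a slab as empty when every sample in it has zero alpha. The paper makes that decision with slab silhouette maps, which the sketch does not model, and the `early_exit` threshold is an assumed value.

```python
def composite_front_to_back(samples, slab_size, early_exit=0.99):
    """samples: (color, alpha) pairs along one ray, nearest first."""
    color, alpha = 0.0, 0.0
    for s0 in range(0, len(samples), slab_size):
        slab = samples[s0:s0 + slab_size]
        if all(a == 0.0 for _, a in slab):
            continue  # empty slab: contributes nothing, skip it
        for c, a in slab:
            color += (1.0 - alpha) * a * c  # weight by remaining transparency
            alpha += (1.0 - alpha) * a
            if alpha >= early_exit:
                return color, alpha         # early ray termination
    return color, alpha
```

Front-to-back order is what makes both optimizations possible: the accumulated alpha tells the renderer how much any further sample can still change the pixel.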

Nov, 5

Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors

This paper presents a graphics-processor-based implementation of the Finite Difference Time Domain (FDTD) method, which uses a central finite-differencing scheme to solve Maxwell’s equations for electromagnetics. FDTD simulations can be very computationally expensive, requiring thousands of CPU hours on traditional general-purpose processors. Modern Graphics Processing Units (GPUs) found in […]

OpenGL
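The core of FDTD is a leapfrog update of staggered electric and magnetic fields using central differences. The one-dimensional Python sketch below uses normalized units with Courant number S = c·Δt/Δx, assumed to be 0.5 (inside the 1-D stability limit S ≤ 1), perfectly conducting ends, and an additive Gaussian source. These are illustrative choices, not the paper's GPU implementation.

```python
import math


def fdtd_1d(steps, size, src, S=0.5):
    """1-D Yee scheme: hy[i] sits between ez[i] and ez[i+1]."""
    ez = [0.0] * size
    hy = [0.0] * size
    for t in range(steps):
        # Update H from the spatial difference of E (every cell independent).
        for i in range(size - 1):
            hy[i] += S * (ez[i + 1] - ez[i])
        # Update E from the spatial difference of H; ez[0] stays 0 (conducting wall).
        for i in range(1, size):
            ez[i] += S * (hy[i] - hy[i - 1])
        # Soft Gaussian source added to the E field at one cell.
        ez[src] += math.exp(-((t - 30) / 10) ** 2)
    return ez, hy
```

Within each half-step every cell reads only its neighbours from the previous half-step, so each loop maps directly onto one GPU thread (or fragment) per cell.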

Nov, 5

Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study

A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx […]

OpenGL

Nov, 5

Exploiting frame-to-frame coherence for accelerating high-quality volume raycasting on graphics hardware

GPU-based raycasting offers an interesting alternative to conventional slice-based volume rendering due to its inherent flexibility and the high quality of the generated images. Recent advances in graphics hardware allow ray traversal and volume sampling to be executed per fragment entirely on the GPU, leading to interactive frame rates. In this work […]

OpenGL

Nov, 5

GPUTeraSort: high performance graphics co-processor sorting for large database management

We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces […]
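External sorting generally proceeds in two phases: sort memory-sized runs, then merge them. A bitonic sorting network is a common GPU-friendly choice for the run-sorting step, because every compare-exchange within a stage is independent of the others. The Python sketch below assumes numeric keys, a power-of-two run size, and a CPU-side heap merge. It illustrates the generic two-phase pattern and is not a reconstruction of GPUTeraSort's actual pipeline.

```python
import heapq
import itertools
import math


def bitonic_sort(a):
    """Iterative bitonic sorting network; len(a) must be a power of two."""
    n = len(a)
    assert n & (n - 1) == 0, "bitonic network needs a power-of-two length"
    a = list(a)
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # compare-exchange distance within this stage
            for i in range(n):   # every i in a stage is independent (one GPU thread each)
                l = i ^ j
                if l > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[l]) == ascending:
                        a[i], a[l] = a[l], a[i]
            j //= 2
        k *= 2
    return a


def external_sort(records, run_size):
    # Phase 1: sort fixed-size runs; pad the tail run with +inf so it fits the network.
    runs = []
    for start in range(0, len(records), run_size):
        run = list(records[start:start + run_size])
        run += [math.inf] * (run_size - len(run))
        runs.append(bitonic_sort(run))
    # Phase 2: k-way merge of the sorted runs, then drop the padding (it sorts last).
    merged = heapq.merge(*runs)
    return list(itertools.islice(merged, len(records)))
```

In a real external sort the runs would be written to and streamed back from disk; here they simply stay in memory.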

Nov, 4

2D/3D image registration on the GPU

We present a method that performs rigid 2D/3D image registration efficiently on the Graphics Processing Unit (GPU). As one main contribution of this paper, we propose an efficient method for generating realistic digitally reconstructed radiographs (DRRs) that are visually similar to x-ray images. To this end, we model some of the electronic post-processing of current x-ray C-arm systems. As another […]

OpenGL
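A DRR simulates an x-ray image by integrating attenuation along rays through a CT volume and applying the Beer-Lambert law, I = I0·exp(−Σ μ·Δs). The Python sketch below assumes a parallel-beam geometry along the volume's z axis and a uniform step Δs. A real C-arm has a perspective (cone-beam) geometry, and the paper's modelling of electronic post-processing is not represented. The function name and the unit source intensity `i0` are hypothetical.

```python
import math


def drr_parallel(volume, step, i0=1.0):
    """volume[z][y][x] holds attenuation coefficients mu; returns image[y][x]."""
    depth = len(volume)
    rows, cols = len(volume[0]), len(volume[0][0])
    image = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):  # on a GPU: one thread (or fragment) per detector pixel
            line_integral = sum(volume[z][y][x] for z in range(depth)) * step
            image[y][x] = i0 * math.exp(-line_integral)  # Beer-Lambert attenuation
    return image
```

Registration then repeatedly regenerates such DRRs for candidate poses and compares them with the acquired x-ray image, which is why fast DRR generation dominates the overall cost.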


