high performance computing on graphics processing units: hgpu.org

Posts

Dec, 4

Solving Rigid Multibody Physics Dynamics Using Proximal Point Functions on the GPU

Physical simulation is important for a wide range of problems, particularly in robotics. There is a growing need for faster simulation that can generate larger amounts of data. Computing is trending toward more cores rather than faster cores, and the graphics processing unit, or GPU, shows great promise […]

CUDA

Dec, 3

Programming hybrid systems with implicit memory based synchronization

In recent years, gains in CPU performance have come with increased software development complexity. One of the next big changes in CPU architecture may be so-called hybrid multicore chips, which combine multicore and manycore technologies on the same chip. Unfortunately, this gain in performance may again lead to greater development complexity. […]

CUDA

Dec, 3

Computation of Large Covariance Matrices by SAMMY on Graphical Processing Units and Multicore CPUs

The computational power of graphics processing units and multicore CPUs was harnessed by the nuclear data evaluation code SAMMY to speed up computations of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Subprograms (BLAS) to compute the most time-consuming step. […]

CUDA

Dec, 3

A Multi-View Stereo Implementation on Massively Parallel Hardware

In recent years, we have seen several approaches to implementing hardware-accelerated multi-view stereo (MVS) algorithms that employ the graphics processing unit (GPU) for fast, parallel computation. To our knowledge, all of them resort to various rendering passes to perform their computations. In contrast, modern GPU compute frameworks give access to the massively parallel compute capability […]

CUDA

Dec, 3

Impact of Floating-Point Precision on Boundary Layer Instabilities Modeled on Fermi GPU

We have implemented two-dimensional and three-dimensional Rayleigh-Benard convection for infinite Prandtl number, appropriate for the Earth’s mantle, on a single Fermi GPU using a second-order finite-difference method. The code was written in C for CUDA and made heavy use of optimized CUBLAS routines. These implementations achieved performance on the order of 535 GFLOP/s and 100 GFLOP/s in […]
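A second-order finite-difference method approximates derivatives with stencils whose truncation error shrinks as h² for grid spacing h. The sketch below shows the standard five-point central-difference Laplacian on a uniform 2-D grid. It is a generic illustration, not the authors' C for CUDA code; the function name `laplacian_2d` and the choice to leave boundary points at zero are assumptions made for this example.

```python
def laplacian_2d(u, h):
    """Five-point, second-order central-difference Laplacian.

    u is a list of rows (u[i][j]) sampled on a uniform grid with spacing h.
    Only interior points are computed; boundary entries are left at 0.0
    (an assumption for this sketch, real solvers apply boundary conditions).
    """
    ny, nx = len(u), len(u[0])
    lap = [[0.0] * nx for _ in range(ny)]
    inv_h2 = 1.0 / (h * h)
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            # (u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4 u_{i,j}) / h^2
            lap[i][j] = (u[i + 1][j] + u[i - 1][j]
                         + u[i][j + 1] + u[i][j - 1]
                         - 4.0 * u[i][j]) * inv_h2
    return lap
```

For example, sampling u = x² + y² with h = 0.5 gives 4.0 at every interior point, because central differences are exact for quadratics. On a GPU, each interior point is independent of the others, which is why stencils like this map naturally onto one thread per grid point.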

CUDA

Dec, 3

Implementations of a Parallel Algorithm for Computing Euclidean Distance Map in Multicore Processors and GPUs

Given a 2-D binary image of size n×n, the Euclidean Distance Map (EDM) is a 2-D array of the same size in which each element stores the Euclidean distance to the nearest black pixel. It is known that a sequential algorithm can compute the EDM in O(n²) time, which is optimal because every one of the n² pixels must be examined. Also, work-time […]
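The abstract is truncated before it describes the authors' parallel algorithm, so the sketch below does not reproduce it. To illustrate an O(n²) sequential EDM, it swaps in a different, well-known technique: the separable squared distance transform of Felzenszwalb and Huttenlocher, which runs a linear-time 1-D lower-envelope pass along every row and then along every column. The convention that `1` marks a black pixel, the function names `dt1d` and `euclidean_distance_map`, and the `big` sentinel are assumptions made for this example.

```python
import math

INF = float("inf")


def dt1d(f):
    """1-D squared distance transform: d[q] = min_p ((q - p)^2 + f[p]).

    Computes the lower envelope of parabolas rooted at each p in O(n) time.
    """
    n = len(f)
    d = [0] * n
    v = [0] * n            # positions of parabolas in the lower envelope
    z = [0.0] * (n + 1)    # boundaries between consecutive envelope parabolas
    k = 0
    z[0], z[1] = -INF, INF
    for q in range(1, n):
        while True:
            # Intersection of the parabola at q with the rightmost one kept.
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            if s <= z[k]:
                k -= 1     # parabola v[k] is hidden; discard it
            else:
                break
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = INF
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d


def euclidean_distance_map(image):
    """EDM of a binary image given as a list of rows; 1 marks a black pixel."""
    rows, cols = len(image), len(image[0])
    big = (rows + cols) ** 2   # larger than any real squared distance here
    # Pass 1: squared distance to the nearest black pixel within each row.
    g = [dt1d([0 if px else big for px in row]) for row in image]
    # Pass 2: combine row results along each column.
    out = [[0.0] * cols for _ in range(rows)]
    for c in range(cols):
        col = dt1d([g[r][c] for r in range(rows)])
        for r in range(rows):
            # No black pixel anywhere in the image yields infinity.
            out[r][c] = math.sqrt(col[r]) if col[r] < big else math.inf
    return out
```

Each row pass is independent of the other rows, and each column pass of the other columns, so the two phases could plausibly be spread across CPU cores or GPU threads; how the paper itself parallelizes the computation is not stated in this excerpt.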

CUDA

Dec, 3

Design and Optimization of Hybrid MD5-Blowfish Encryption on GPUs

Data now plays an indispensable role in almost all industrial areas. Data integrity and security over the Internet and other media and applications have become major concerns in the computing world. If confidential or sensitive data is forged, tampered with, or wiretapped by an attacker, financial losses may occur. Encryption is one of the […]

CUDA

Dec, 3

Mapping the Arnold web with a graphic processing unit

The Arnold diffusion is a dynamical phenomenon that may occur in the phase space of a non-integrable Hamiltonian system whenever the number of the system's degrees of freedom is M ≥ 3. The diffusion is mediated by a web-like structure of resonance channels that penetrates the phase space and allows the system to explore the whole energy […]

CUDA

Dec, 3

Ultrasound Image Simulation with GPU-based Ray Tracing

Medical simulators are gaining importance because the experience and skills needed to perform many medical procedures are difficult to obtain owing to patient safety and ethical issues. With the development of graphics cards and stereographic and haptic devices, more VR-based simulators are being created. We are developing an interactive ultrasound image simulation that includes […]

Dec, 3

Comparison of CPML Implementations for the GPU-Accelerated FDTD Solver

Three distinctly different implementations of the convolutional perfectly matched layer (CPML) for the FDTD method on CUDA-enabled graphics processing units are presented. All implementations store additional variables only inside the convolutional perfectly matched layers, and the computational speeds scale with the thickness of these layers. The merits of the different approaches are discussed, and a […]

CUDA

Dec, 3

EM+TV for Reconstruction of Cone-beam CT with Curved Detectors using GPU

Computerized tomography (CT) plays a critical role in the practice of modern medicine. However, the radiation associated with CT is significant. Methods that can enable CT imaging at reduced radiation exposure without sacrificing image quality are therefore extremely important. This paper introduces a novel method for enabling improved reconstruction at lower radiation exposure levels. The […]

CUDA

Dec, 2

On the design of architecture-aware algorithms for emerging applications

This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from […]

CUDA
