high performance computing on graphics processing units: hgpu.org

Posts

Dec, 15

Speed sign detection and recognition by convolutional neural networks

Motivated by the desire to update maximum road speed data for navigation devices, a speed sign detection and recognition system is proposed. This system should prevent accidental speeding on roads where the map data is incorrect, for example due to construction work. Multiple road sign classification systems already exist, but none uses a […]

CUDA

Dec, 15

On the numerical solution of chaotic dynamical systems using extend precision floating point arithmetic and very high order numerical methods

Multiple results in the literature indicate that all computed solutions to chaotic dynamical systems are time-step dependent. That is, solutions computed with small but different time steps will decouple from each other after a certain (small) finite amount of simulation time. When using double-precision floating-point arithmetic, time-step-independent solutions have been […]

CUDA

Dec, 14

Graph Generation on GPUs using Dynamic Memory Allocation

Complex networks are often studied using statistical measurements over many independently generated samples. Irregular data structures such as graphs that involve dynamical memory management and "pointer chasing" are an important class of application and have attracted recent interest in the form of the Graph500 benchmark formulation. The generation of simulated sample network graphs and measurement […]

CUDA

Dec, 14

A Novel Multi-GPU Neural Simulator

Between the biophysical and behavioral studies of the brain lies computational neuroscience. Its goal, among other things, is to help bridge the gap in our knowledge and provide alternative or complementary theories to other neurological studies. As more information is provided and more complex theories are developed, the size and computational cost of […]

CUDA

Dec, 14

Fast thermal simulation of 2D/3D integrated circuits exploiting neural networks and GPUs

Heat removal is one of the major challenges faced in developing the new generation of high-density integrated circuits. Future design technologies strongly depend on the availability of efficient means for thermal modeling and analysis. These thermal models must also be accurate and provide the most efficient level of abstraction, enabling fast execution. We propose […]

CUDA

Dec, 14

Power consumption of mixed precision in the iterative solution of sparse linear systems

This paper presents a detailed analysis of a mixed precision iterative refinement solver applied to a linear system obtained from the 2D discretization of a fluid flow problem. The total execution time and energy requirements of different software and hardware implementations are measured and compared with those of a plain GMRES-based solver in double precision. […]

CUDA

Dec, 14

Rethinking Runtime Verification on Hundreds of Cores: Challenges and Opportunities

We propose a novel approach for runtime monitoring and verification on computers with a large number of computation cores. The goal of the approach is to minimize the impact of runtime verification on the performance of the application being monitored. We distinguish between two kinds of computational overhead: (i) overhead caused by instrumentation and/or logging, […]

CUDA

Dec, 14

Fast Neural Network Training on General Purpose Computers

Neural networks allow the implementation of complicated applications such as stock market predictions on low-end PCs. However, the training of neural networks can take many hours on a PC. In this paper we propose a technique for training complicated neural networks on a commodity GPU (available in a low-end PC) that completes 6 times faster […]

CUDA

Dec, 14

Toward Accelerating the Matrix Inversion Computation of Symmetric Positive-Definite Matrices on Heterogeneous GPU-Based Systems

The goal of this paper is to implement an efficient matrix inversion of symmetric positive-definite matrices on heterogeneous GPU-based systems. The matrix inversion procedure can be split into three stages: computing the Cholesky factorization, inverting the Cholesky factor and calculating the product of the inverted Cholesky factor with its transpose to get the final inverted […]

CUDA
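The three stages named in the excerpt above (Cholesky factorization, inversion of the Cholesky factor, product of the inverted factor with its transpose) can be illustrated with a minimal CPU-only sketch. This is not the paper's heterogeneous GPU implementation; the function names `cholesky`, `invert_lower` and `spd_inverse` are hypothetical, and the code assumes a small, dense, symmetric positive-definite input stored as a list of lists.

```python
import math


def cholesky(a):
    """Stage 1: factor a symmetric positive-definite matrix as A = L * L^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def invert_lower(low):
    """Stage 2: invert the lower-triangular factor by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / low[j][j]
        for i in range(j + 1, n):
            s = sum(low[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -s / low[i][i]
    return inv


def spd_inverse(a):
    """Stage 3: A^{-1} = (L^{-1})^T * L^{-1}."""
    n = len(a)
    li = invert_lower(cholesky(a))
    # li[k][i] is zero for k < i, so the sum can start at max(i, j).
    return [
        [sum(li[k][i] * li[k][j] for k in range(max(i, j), n)) for j in range(n)]
        for i in range(n)
    ]
```

For example, `spd_inverse([[4.0, 2.0], [2.0, 3.0]])` can be checked against the closed-form 2x2 inverse. Each stage is a separate function here only to mirror the split described in the excerpt.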

Dec, 14

Accelerating Live Graph-Cut-Based Object Tracking Using CUDA

Graph cuts have found many applications that address the problem of energy minimization, which occurs frequently in computer vision and image processing. One of the most common applications is binary image segmentation, or silhouette extraction. Image segmentation is the process of applying a labeling to each pixel in an image to determine a list of […]

CUDA

Dec, 14

Voxelized Minkowski sum computation on the GPU with robust culling

We present a new approach for computing the voxelized Minkowski sum (excluding any enclosed voids) of two polyhedral objects using programmable Graphics Processing Units (GPUs). We first cull out surface primitives that will not contribute to the final boundary of the Minkowski sum, analyzing and adaptively bounding the rounding errors of the culling algorithm to […]

CUDA

Dec, 14

Water Surface Animation using Damped Wave Equation and CUDA Acceleration

The damped wave equation is used for simulating water waves. The differential equation is approximated by finite differences. Explicit integration produces water height fields in real time. The CUDA framework is used to perform parallel computations on the GPU. It is shown that the GPU provides considerable speedup in comparison to the CPU.

CUDA
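The approach summarized in the excerpt above (damped wave equation, finite differences, explicit integration) can be sketched on the CPU as follows. This is an illustrative sketch, not the paper's CUDA code: it assumes the equation has the form u_tt + k u_t = c^2 (u_xx + u_yy), uses central differences in time and space, and holds the boundary at zero height. The function name `step` and the parameters `c`, `k`, `dt` and `h` are hypothetical. On a GPU, each interior grid point would typically be updated by its own thread.

```python
def step(u_prev, u_curr, c=1.0, k=0.1, dt=0.1, h=1.0):
    """Advance the height field by one explicit time step.

    u_prev and u_curr are the height fields at the previous and current
    time levels (lists of lists of floats). Boundary cells stay at zero.
    """
    rows, cols = len(u_curr), len(u_curr[0])
    r = (c * dt / h) ** 2   # squared Courant number
    a = 0.5 * k * dt        # damping term from the centered u_t difference
    u_next = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Five-point Laplacian stencil (scaled by h^2).
            lap = (u_curr[i - 1][j] + u_curr[i + 1][j]
                   + u_curr[i][j - 1] + u_curr[i][j + 1]
                   - 4.0 * u_curr[i][j])
            u_next[i][j] = (2.0 * u_curr[i][j]
                            - (1.0 - a) * u_prev[i][j]
                            + r * lap) / (1.0 + a)
    return u_next


def simulate(u0, steps, **params):
    """Run the scheme from rest (zero initial velocity) for a number of steps."""
    u_prev = [row[:] for row in u0]
    u_curr = [row[:] for row in u0]
    for _ in range(steps):
        u_prev, u_curr = u_curr, step(u_prev, u_curr, **params)
    return u_curr
```

The explicit scheme is only stable when the time step is small relative to the grid spacing; in 2D the usual condition is c * dt / h <= 1 / sqrt(2), which the default values above satisfy.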
