high performance computing on graphics processing units: hgpu.org

Posts

May, 5

Non-separable 2D, 3D and 4D filtering with CUDA

We have presented solutions for fast non-separable floating point convolution in 2, 3 and 4 dimensions, using the CUDA programming language. We believe that these implementations will serve as a complement to the NPP library, which currently only supports 2D filters and images stored as integers. The shared memory implementation with loop unrolling is approximately […]

CUDA

May, 5

Accelerating Mixed-Abstraction SystemC Models on Multi-Core CPUs and GPUs

Functional verification is a critical part of the hardware design cycle, and it accounts for nearly two-thirds of the overall development time. With increasing complexity of hardware designs and shrinking time-to-market constraints, the time and resources spent on functional verification have increased considerably. To mitigate the increasing cost of functional verification, research and academia […]

CUDA

May, 5

Assessing the Performance-Energy Balance of Graphics Processors for Spectral Unmixing

Remotely sensed hyperspectral imaging missions are often limited by onboard power restrictions while simultaneously requiring high computing power to address applications with relevant constraints on processing time. In recent years, graphics processing units (GPUs) have emerged as a commodity computing platform suitable for meeting real-time processing requirements in hyperspectral image processing. […]

CUDA

May, 5

GPU-based Parallel Computing for Nonlinear Finite Element Deformation Analysis

Computer-based surgical simulation and non-rigid medical image registration in image-guided interventions are examples of applications that would benefit from real-time deformation simulation of soft tissues. The physics of deformation for biological soft tissue is best described by nonlinear continuum mechanics-based models, which can then be discretized by the Finite Element Method (FEM) for a numerical solution. […]

CUDA

May, 3

Refresh Rate Modulation for Perceptually Optimized Computer Graphics

The application of human visual perception models to remove imperceptible components in a graphics system has proven effective in achieving significant computational speedup. Previous implementations of such techniques have focused on spatial level-of-detail reduction, which typically results in noticeable degradation of image quality. We introduce Refresh Rate Modulation (RRM), a novel perceptual […]

OpenCL • OpenGL

May, 3

GPU-accelerated ray-tracing for real-time treatment planning

Dose calculation methods in radiotherapy treatment planning require the radiological depth information of the voxels that represent the patient volume to correct for tissue inhomogeneities. This information is acquired by time-consuming ray-tracing-based calculations. For treatment planning scenarios with changing geometries and real-time constraints, this is a severe bottleneck. We implemented an algorithm for the […]

CUDA

May, 3

Implementation of a PIC simulation using WebGL

This project’s aim is to find a WebGL-based alternative to the Java implementation of OpenPixi, a Particle-in-Cell (PIC) simulation software package, and to add a third dimension. For this purpose, an existing JavaScript library, three.js, was chosen. A handful of approaches are explored and the resulting prototypes are then compared in terms of speed, […]

OpenGL

May, 3

Coalition Structure Generation with the Graphics Processing Unit

Coalition Structure Generation (the problem of finding the optimal division of agents into coalitions) has received considerable attention in recent AI literature. The fastest exact algorithm to solve this problem is IDP-IP* [17], which is a hybrid of two previous algorithms, namely IDP and IP. Given this, it is desirable to speed up IDP as this will, […]

CUDA

May, 3

A Performance Optimization Support Framework for GPU-based Traffic Simulations with Negotiating Agents

Realizing a simulation that can handle hundreds of thousands of negotiating agents while preserving their detailed behaviors requires a massive amount of computational power. Good programmability of the agents’ code is also essential for implementing complex behaviors. When deploying such negotiating agents in an agent simulation, it is important to be able […]

OpenCL

May, 2

Real-time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera

In this paper we present a novel real-time algorithm for simultaneous pose and shape estimation for articulated objects, such as human beings and animals. The key to our pose estimation component is to embed the articulated deformation model with exponential-maps-based parametrization into a Gaussian Mixture Model. Benefiting from the probabilistic measurement model, our algorithm requires […]

CUDA

May, 2

3D FFT on a Single FPGA

The 3D FFT is critical in many physical simulations and image processing applications. On FPGAs, however, the 3D FFT was thought to be inefficient relative to other methods such as convolution-based implementations of multigrid. We find the opposite: a simple design, operating at a conservative frequency, takes 4 ms for 16^3, 21 ms for 32^3, and 215 ms […]

CUDA

May, 2

Analysis of SuperLU Solvers on Intel MIC Architecture

Intel Xeon Phi is a coprocessor with sixty-one cores on a single chip. The chip has a more powerful FPU that contains 512-bit SIMD registers. The Intel Xeon Phi chip can benefit from algorithms that operate on large vectors. In this work, sequential, multithreaded and distributed versions of SuperLU solvers are tested on the […]



Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
