high performance computing on graphics processing units: hgpu.org

Posts

Jan, 18

The ‘Chimera’: an off-the-shelf CPU/GPGPU/FPGA hybrid computing platform

The nature of modern astronomy means that a number of interesting problems are substantially compute-bound, and this situation is gradually worsening. Scientists, increasingly fighting for valuable resources on conventional high performance computing (HPC) facilities (often with limited scope for customizing the user environment), are looking to hardware acceleration solutions. We describe here a heterogeneous CPU/GPGPU/FPGA desktop […]

CUDA

Jan, 18

Investigation of Parallel Computation – MPI, CUDA and Parallel Visualization

In this manuscript, parallel computation is investigated, including a review of different programming APIs and architectures. Two specific parallel APIs, MPI and CUDA C, are analyzed in depth. Two sorting algorithms and a visual mathematical problem are implemented with MPI, along with a performance analysis. A stable fluid dynamics simulation is carried out with CUDA. We also present a […]

CUDA • OpenGL

Jan, 18

Heterogeneous Computing for Vertebra Detection and Segmentation in X-Ray Images

This work concerns vertebra segmentation. The method we propose is based on the active shape model (ASM). An original approach taking advantage of the edge polygonal approximation was developed to locate the vertebra positions in an X-ray image. Although the segmentation results show good efficiency, the time […]

CUDA

Jan, 18

LNA: Fast Protein Classification Using A Laplacian Characterization of Tertiary Structure

In the last two decades, a large number of protein 3D shapes have been discovered, characterized and made available thanks to the Protein Data Bank (PDB), which is growing very quickly. New scalable methods are therefore urgently needed to search the PDB efficiently. We present in this paper an approach entitled LNA (Laplacian Norm […]

OpenCL

Jan, 18

Evaluation and enhancement of memory efficiency targeting general-purpose computations on scalable data-parallel GPU architectures

This thesis addresses the memory efficiency of general-purpose applications running on massively multi-threaded, data-parallel GPU architectures. Although scalable, data-parallel GPU architectures and their associated general-purpose programming models offer impressive computational capability and attractive power budgets, the pace of migrating general-purpose applications to this emerging class of architectures is significantly hindered by the efficiency of memory […]

OpenCL

Jan, 18

RNS-Based Elliptic Curve Point Multiplication for Massive Parallel Architectures

Acceleration of cryptographic applications on massively parallel computing platforms, such as Graphics Processing Units (GPUs), poses real challenges for practical implementations. In this paper, we propose a parallel algorithm for Elliptic Curve (EC) point multiplication in order to compute EC cryptography on these platforms. The proposed approach relies on the usage of the Residue […]

OpenCL
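The abstract breaks off at "the Residue […]"; the title indicates a residue number system (RNS). As background only, and not the authors' algorithm, here is a minimal Python sketch of RNS arithmetic: an integer is stored as its residues modulo pairwise-coprime moduli, addition and multiplication act on each residue independently, and the Chinese Remainder Theorem recovers the integer. The moduli and function names are hypothetical choices for illustration.

```python
from math import prod

# Hypothetical toy moduli (pairwise coprime); practical ECC implementations
# use many word-sized moduli instead.
MODULI = (13, 17, 19, 23)
M = prod(MODULI)  # dynamic range: every integer in [0, M) has a unique representation

def to_rns(x):
    """Represent x by its residue modulo each channel's modulus."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Add channel by channel; no carry propagates between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Multiply channel by channel, again with no cross-channel carries."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(r):
    """Reconstruct the integer in [0, M) with the Chinese Remainder Theorem."""
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is the modular inverse (Python 3.8+)
    return x % M
```

For example, `from_rns(rns_mul(to_rns(123), to_rns(456)))` yields 56088, because the true product is below M = 96577. Each channel is independent, so channels map naturally onto parallel threads; the trade-off is that comparison, division, and reduction modulo a number outside the RNS base are comparatively expensive.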

Jan, 17

CPU/GPU computing for long-wave radiation physics on large GPU clusters

Geoscience simulations rely heavily on high performance computing (HPC) systems. To date, many CPU/GPU heterogeneous HPC systems have been established on which many geoscience simulations have been performed. For most of these simulations on GPU clusters, it can be observed that only the GPU’s computational capacity has been exploited to accomplish the arithmetic operations while […]

CUDA

Jan, 17

An Adaptive Step Size GPU ODE Solver for Simulating the Electric Cardiac Activity

Simulation of electric cardiac activity requires solving a very large system of ordinary differential equations, which takes long computing times. Modern Graphics Processing Units (GPUs) are powerful computing devices that have been used to simulate electric cardiac activity. However, the numerical techniques applied were based on a fixed time step. In this paper we […]

CUDA

Jan, 17

GPU-based implementation of a cerebellar spiking network model for realtime robot control

We implemented a large-scale cerebellar cortical model composed of more than 100,000 spiking neuron units on a Graphics Processing Unit (GPU) and carried out computer simulations of the model in real time. We applied the model to online learning of timing for a humanoid robot.

CUDA

Jan, 17

GPU Prefilter for Accurate Cubic B-spline Interpolation

Achieving accurate interpolation is an important requirement for many signal-processing applications. While nearest-neighbor and linear interpolation are popular because GPUs support them natively, they unfortunately produce severe artifacts. Better interpolation methods are known but lack native GPU support. Yet, a particularly attractive one is prefiltered cubic-spline interpolation. The signal it […]

CUDA
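Prefiltering here means converting the samples s[k] into cubic B-spline coefficients c[k] so that the interpolant passes through the samples, i.e. (c[k-1] + 4c[k] + c[k+1]) / 6 = s[k]. The following sequential Python sketch uses the standard recursive formulation (a causal pass followed by an anticausal pass with pole z = sqrt(3) - 2, mirror boundaries). It illustrates the kind of prefilter the title refers to, not the authors' GPU implementation; the function name and tolerance are assumptions.

```python
import math

POLE = math.sqrt(3.0) - 2.0  # pole of the cubic B-spline prefilter (about -0.268)

def cubic_bspline_prefilter(samples, tol=1e-12):
    """Return coefficients c with (c[k-1] + 4*c[k] + c[k+1]) / 6 == samples[k],
    assuming mirror-symmetric boundary conditions."""
    n = len(samples)
    if n < 2:
        return list(samples)
    z = POLE
    # Overall gain (1 - z) * (1 - 1/z), which equals 6 for the cubic B-spline.
    c = [6.0 * s for s in samples]
    # Causal initialization: truncated sum of z^k * c[k]. The truncation is
    # exact to within tol only when n exceeds the horizon (21 for tol=1e-12).
    horizon = min(n, math.ceil(math.log(tol) / math.log(abs(z))))
    acc, zk = 0.0, 1.0
    for k in range(horizon):
        acc += zk * c[k]
        zk *= z
    c[0] = acc
    # Causal recursion: c+[k] = c[k] + z * c+[k-1]
    for k in range(1, n):
        c[k] += z * c[k - 1]
    # Anticausal initialization (mirror boundary) and recursion.
    c[n - 1] = (z / (z * z - 1.0)) * (z * c[n - 2] + c[n - 1])
    for k in range(n - 2, -1, -1):
        c[k] = z * (c[k + 1] - c[k])
    return c
```

Once the coefficients exist, evaluating the interpolant at a non-integer position needs only the four nearest coefficients weighted by the cubic B-spline, which is the part that can then reuse the GPU's native linear filtering.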

Jan, 17

Data registration module – a component of semantic simulation engine

This paper presents the data registration module, a component of a semantic simulation engine. An improved implementation of the ICP (Iterative Closest Point) algorithm based on GPGPU (general-purpose computing on graphics processing units) is proposed. The main achievement is online alignment of two data sets composed of up to 262144 3D points, therefore it […]

CUDA
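For readers unfamiliar with ICP, each iteration pairs every source point with its closest target point and then solves for the rigid transform that best aligns those pairs. The following Python sketch is a deliberately reduced version (2D rather than 3D, brute-force nearest-neighbour search, point-to-point least squares); it is not the improved GPU implementation the paper proposes, and all names are hypothetical.

```python
import math

def nearest(p, pts):
    """Brute-force nearest neighbour of p among pts."""
    return min(pts, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # For centred pairs (a, b), the optimal angle maximizes sum(b . R(theta) a),
    # which is attained at atan2(sum of cross terms, sum of dot products).
    s_cos = s_sin = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def icp_2d(src, dst, iterations=20):
    """Alternate nearest-neighbour matching and rigid realignment."""
    cur = list(src)
    for _ in range(iterations):
        matched = [nearest(p, dst) for p in cur]
        theta, tx, ty = best_rigid_2d(cur, matched)
        c, s = math.cos(theta), math.sin(theta)
        cur = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in cur]
    return cur
```

In this sketch the nearest-neighbour search costs time proportional to the product of the two point counts, so at the scale quoted in the abstract it is typically the step most worth parallelizing.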

Jan, 17

Closing the Ninja Performance Gap through Traditional Programming and Compiler Technology

Current processor trends of integrating more cores with wider SIMD units, along with deeper and more complex memory hierarchies, have made it increasingly challenging to extract performance from applications. Some believe that traditional approaches to programming do not apply to these modern processors and hence radical new languages must be discovered. […]

CUDA



Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
