high performance computing on graphics processing units: hgpu.org

Posts

Nov, 13

Benchmarking and Implementation of Probability-Based Simulations on Programmable Graphics Cards

The latest Graphics Processing Units (GPUs) are reported to reach up to 200 billion floating-point operations per second (200 Gflops) and to have a price performance of 0.1 cents per Mflop. These facts raise great interest in the plausibility of extending the GPUs’ use to non-graphics applications, in particular numerical simulations on structured grids […]

Nov, 13

The Graphics Card as a Streaming Computer

Massive data sets have radically changed our understanding of how to design efficient algorithms; the streaming paradigm, whether in terms of the number of passes of an external memory algorithm or the single pass and limited memory of a stream algorithm, appears to be the dominant method for coping with large data. A very different […]

Nov, 13

Cg in Two Pages

Cg is a language for programming GPUs. This paper describes Cg briefly.

Nov, 13

Solving Kinetic Equations on GPUs I: Model Kinetic Equations

We present an algorithm specifically tailored for solving kinetic equations on GPUs. The efficiency of the algorithm is demonstrated by solving the one-dimensional shock wave structure problem and a two-dimensional low Mach number driven cavity flow. Computational results show that it is possible to cut down the computing time of the sequential codes of two […]

CUDA

Nov, 13

Density Functional Theory calculation on many-cores hybrid CPU-GPU architectures

The implementation of a full electronic structure calculation code on a hybrid parallel architecture with Graphics Processing Units (GPUs) is presented. The code on which our implementation is based is a GNU-GPL code based on Daubechies wavelets. It shows very good performance, systematic convergence properties and excellent efficiency on parallel computers. Our […]

CUDA

Nov, 13

Accelerator-Oriented Algorithm Transformation for Temporal Data Mining

Temporal data mining algorithms are becoming increasingly important in many application domains, including computational neuroscience, especially the analysis of spike train data. While application scientists have been able to readily gather multi-neuronal datasets, analysis capabilities have lagged behind, due to both a lack of powerful algorithms and the inaccessibility of powerful hardware platforms. The advent of GPU […]

CUDA

Nov, 13

Application of Graphics Processing Units to Search Pipeline for Gravitational Waves from Coalescing Binaries of Compact Objects

We report a novel application of graphics processing units (GPUs) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A 16-fold speed-up has been achieved compared with a single central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in […]

CUDA

Nov, 13

The Living Application: a Self-Organising System for Complex Grid Tasks

We present the living application, a method to autonomously manage applications on the grid. During its execution on the grid, the living application makes choices on the resources to use in order to complete its tasks. These choices can be based on the internal state, or on autonomously acquired knowledge from external sensors. By giving […]

Nov, 13

Efficient magnetohydrodynamic simulations on graphics processing units with CUDA

Magnetohydrodynamic (MHD) simulations based on the ideal MHD equations have become a powerful tool for modeling phenomena in a wide range of applications, including laboratory, astrophysical, and space plasmas. In general, high-resolution methods for solving the ideal MHD equations are computationally expensive, and Beowulf clusters or even supercomputers are often used to run the codes […]

CUDA

Nov, 13

GPU-based ultra fast dose calculation using a finite pencil beam model

Online adaptive radiation therapy (ART) is an attractive concept that promises the ability to deliver an optimal treatment in response to the inter-fraction variability in patient anatomy. However, it has yet to be realized due to technical limitations. Fast dose deposit coefficient calculation is a critical component of the online planning process that is required […]

CUDA

Nov, 13

Supercomputing and stellar dynamics

In this paper I will outline some of the aspects and problems of modern celestial mechanics and stellar dynamics in the context of quickly growing computing facilities. I will draw attention to the great advantages of using modern, fast and cheap Graphics Processing Units (GPUs), acting as true supercomputers, for astrophysical simulations. […]

Nov, 13

Implementation and evaluation of various demons deformable image registration algorithms on GPU

Online adaptive radiation therapy (ART) promises the ability to deliver an optimal treatment in response to daily patient anatomic variation. A major technical barrier for the clinical implementation of online ART is the requirement of rapid image segmentation. Deformable image registration (DIR) has been used as an automated segmentation method to transfer tumor/organ contours from […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
