high performance computing on graphics processing units: hgpu.org

Posts

Nov, 30

GPGPU Accelerated Cardiac Arrhythmia Simulations

Computational modeling of cardiac electrophysiology is a powerful tool for studying arrhythmia mechanisms. In particular, cardiac models are useful for gaining insights into experimental studies, and in the foreseeable future they will be used by clinicians to improve therapy for the patients suffering from complex arrhythmias. Such models are highly intricate, both in their geometric […]

CUDA

Nov, 30

Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization case study

Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets moving astronomy into the Petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such […]

CUDA

Nov, 30

GPU-Accelerated SPH Model for Water Waves and Other Free Surface Flows

This paper discusses the meshless numerical method Smoothed Particle Hydrodynamics and its application to water waves and nearshore circulation. In particularly we focus on an implementation of the model on the graphics processing unit (GPU) of computers, which permits low-cost supercomputing capabilities for certain types of computational problems. The implementation here runs on Nvidia graphics […]

CUDA

•

OpenGL

Nov, 29

Parallel preconditioning for spherical harmonics expansions of the Boltzmann transport equation

While the Monte Carlo method for the Boltzmann transport equation for semiconductors has already been parallelized, this is much more difficult to accomplish for the deterministic spherical harmonics expansion method which requires the solution of a linear system of equations. For the typically employed iterative solvers, preconditioners are required to obtain good convergence rates. These […]

OpenCL

Nov, 29

Multiresolution Flow Simulations on Multi/many-core Architectures

One of the key challenges in Computational Science is closing the gap between the available computer power and its effective utilization for the simulation of complex physical systems and engineering applications. In order to achieve this goal we must minimize the time-to-solution and the related energy requirements of simulations by developing scalable software and methods […]

CUDA

Nov, 29

Applications Performance on GPGPUs with the Fermi Architecture

The latest GPU architecture released by Nvidia, code-named "Fermi", is the most advanced computing GPU architecture ever built. Radical changes took place on the GPU computing architecture compared to Fermi’s predecessors such as the GT200 series and the G80s. In this dissertation the Fermi architecture is analysed, addressing the most prominent upgrades, by running extensive […]

CUDA

Nov, 29

Dynamic Task Parallelism with a GPU Work-Stealing Runtime System

NVIDIA’s Compute Unified Device Architecture (CUDA) and its attached C/C++ based API went a long way towards making GPUs more accessible to mainstream programming. So far, the use of GPUs for high performance computing has been primarily restricted to data parallel applications, and with good reason. The high number of computational cores and high memory […]

CUDA

Nov, 29

Directives Based Programming of GPU Accelerated Systems

Graphics Processing Units (GPUs) are commodity chips primarily used as coprocessors for processing high definition graphics on a computer system. It possess faster processing power and efficiency in handling accurate single and double floating point numbers with less power consumption compared to CPUs. Realising its potential in general purpose computing manufacturers of these chips have […]

CUDA

Nov, 29

Enabling Efficient Online Profiling of Homogeneous and Heterogeneous Multicore Systems

Using profiling tools is a common way to understand computer systems and software and to achieve the best performance. Profiling becomes more important as computing technology advances and makes it more difficult to intuitively reason about system characteristics. However, the recent shift in computing technology to multicore systems and heterogeneous systems requires new profiling methods […]

CUDA

•

OpenCL

Nov, 29

GPGPU Volume Classification using SimpleOpenCL

In volume visualization, the definition of the regions of interest is inherently an iterative trialand-error process finding out the best parameters to classify and render the final image. In this work, we present a general framework for training multi-class classifiers using Error-Correcting Output Codes. Moreover, we propose a GPGPU parallelization system using SimpleOpenCL, an OpenSource […]

OpenCL

Nov, 29

Solving Dense Generalized Eigenproblems on Multi-threaded Architectures

We compare two approaches to compute a portion of the spectrum of dense symmetric definite generalized eigenproblems: one is based on the reduction to tridiagonal form, and the other on the Krylov-subspace iteration. Two large-scale applications, arising in molecular dynamics and material science, are employed to investigate the contributions of the application, architecture, and parallelism […]

CUDA

Nov, 29

Electric polarizability of hadrons with overlap fermions on multi-GPUs

Electric polarizability is an important parameter for the internal structure of hadrons. Previous studies of polarizabilities have been done at relatively heavy pion masses, leaving the chiral region largely unexplored. In this report, we use overlap fermions which are known to be computationally demanding to properly capture the chiral dynamics. We present an implementation strategy […]

high performance computing on graphics processing units: hgpu.org

Posts

GPGPU Accelerated Cardiac Arrhythmia Simulations

Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization case study

GPU-Accelerated SPH Model for Water Waves and Other Free Surface Flows

Parallel preconditioning for spherical harmonics expansions of the Boltzmann transport equation

Multiresolution Flow Simulations on Multi/many-core Architectures

Applications Performance on GPGPUs with the Fermi Architecture

Dynamic Task Parallelism with a GPU Work-Stealing Runtime System

Directives Based Programming of GPU Accelerated Systems

Enabling Efficient Online Profiling of Homogeneous and Heterogeneous Multicore Systems

GPGPU Volume Classification using SimpleOpenCL

Solving Dense Generalized Eigenproblems on Multi-threaded Architectures

Electric polarizability of hadrons with overlap fermions on multi-GPUs

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)