high performance computing on graphics processing units: hgpu.org

Posts

Jan, 7

Interactive Refactoring for GPU Parallelization of Affine Loops

Considerable recent attention has been given to the problem of porting existing code to heterogeneous computing architectures, such as GPUs. In this paper, we describe a novel, interactive refactoring tool that allows for quick and easy transformation of affine loops to execute on GPUs. Compared to previous approaches, our refactoring approach interactively combines the user’s […]

CUDA

Jan, 6

High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms

Analysis of large pathology image datasets offers significant opportunities for biomedical researchers to investigate the morphology of disease, but the resource requirements of image analyses limit the scale of those studies. Motivated by such a study, we propose and evaluate a parallel image analysis application pipeline for high throughput computation of large datasets of high […]

CUDA

Jan, 5

Approximate Subdivision Surface Evaluation in the Language of Linear Algebra

We present an interpretation of approximate subdivision surface evaluation in the language of linear algebra. Specifically, vertices in the refined mesh can be computed by left-multiplying the vector of control vertices by a sparse matrix we call the subdivision operator. This interpretation is rather general: it applies to any level of subdivision, it holds for […]

CUDA

Jan, 4

Hybrid GATE: A GPU/CPU implementation for imaging and therapy applications

Monte Carlo simulations (MCS) play a key role in medical applications. In this context GATE is an MCS platform dedicated to medical imaging and particle therapy. Yet MCS are very computationally demanding, which limits their applicability in clinical practice. Recently, graphics processing units (GPU) became, in many domains, a cost-effective solution to access high power […]

CUDA

Jan, 4

Graphic-Processing-Units Based Adaptive Parameter Estimation of a Visual Psychophysical Model

The applicability and effectiveness of adaptive design optimization (ADO) in selecting optimal stimuli or designs for experimental trials has been well demonstrated in several content areas of cognitive psychology (Myung & Pitt, 2009; Cavagnaro et al., 2010). On the other hand, when applying ADO to real-time, online experiments such as psychophysical experiments with human subjects, […]

CUDA

Jan, 4

Parallel one-versus-rest SVM training on the GPU

Linear SVMs are a popular choice of binary classifier. It is often necessary to train many different classifiers on a multiclass dataset in a one-versus-rest fashion, and this for several values of the regularization constant. We propose to harness GPU parallelism by training as many classifiers as possible at the same time. We optimize the […]

CUDA

Jan, 4

Fast Global Illumination for Interactive Volume Visualization

High-quality global illumination can enhance the visual perception of depth cues and local thickness of volumetric data, but it is seldom used in scientific visualization because of its high computational cost. This paper presents a novel grid-based illumination technique which is specially designed and optimized for volume visualization. It supports common light sources […]

CUDA

Jan, 4

Long Timestep Molecular Dynamics on the Graphical Processing Unit

Molecular dynamics (MD) simulations now play a key role in many areas of theoretical chemistry, biology, physics, and materials science. In many cases, such calculations are significantly limited by the massive amount of computer time needed to perform calculations of interest. Herein, we present Long Timestep Molecular Dynamics (LTMD), a method to significantly speed MD […]

CUDA • OpenCL

Jan, 4

Automatic Code Generation for Stencil Computations on GPU Architectures

The development of parallel architectures is now nearly ubiquitous in not only the high-performance computing field, but also the commodity electronics market. Even embedded processors found in cell phones and tablet computers are starting to incorporate parallel architectures. These architectures are exploiting both SIMD (Single-Instruction Multiple-Data) and SIMT (Single-Instruction Multiple-Thread) parallelism to achieve higher […]

CUDA • OpenCL

Jan, 3

Fast Poisson Solvers for Graphics Processing Units

Two block cyclic reduction linear system solvers are considered and implemented using the OpenCL framework. The topics of interest include a simplified scalar cyclic reduction tridiagonal system solver and the impact of increasing the radix-number of the algorithm. Both implementations are tested for the Poisson problem in two and three dimensions, using an Nvidia GTX […]

OpenCL

Jan, 3

uBench: Performance Impact of CUDA Block Geometry

Nowadays, there is a lack of performance models for the execution of programs implemented using the CUDA model for GPU (Graphics Processing Units) devices. We have designed and implemented a suite of micro-benchmarks, called uBench. The purpose of uBench is to identify the effects on performance derived from the combination of: (1) the hardware details […]

CUDA

Jan, 3

High performance bioinformatics and computational biology on general-purpose graphics processing units

Bioinformatics and Computational Biology (BCB) is a relatively new multidisciplinary field which brings together many aspects of the fields of biology, computer science, statistics, and engineering. Bioinformatics extracts useful information from biological data and makes these more intuitive and understandable by applying principles of information sciences, while computational biology harnesses computational approaches and technologies to […]

CUDA

