high performance computing on graphics processing units: hgpu.org

Posts

Nov, 14

Real-Time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs

Marching Cubes (MC) is an algorithm that extracts surfaces from volumetric scalar data. It is used extensively in visualization and analysis of medical data from modalities like CT and MR, usually after a 3D segmentation of the structures of interest has been performed. Implementations of MC on CPUs are slow, taking several seconds (even minutes) […]

OpenCL

Nov, 11

A parallel method for tuning Fuzzy TSK Systems with CUDA

This paper studies an option for offloading some types of AI processing to the Graphics Processing Unit (GPU), by proposing the parallelization of the Batch Least Squares (BLS) method for tuning consequent parameters and the gradient method for tuning input fuzzy sets in a Takagi-Sugeno-Kang Fuzzy Inference System using the Compute Unified Device Architecture (CUDA). […]

CUDA

Nov, 11

Analysis of periodic anisotropic media by means of split-field FDTD method and GPU computing

The implementation of the Split-Field Finite Difference Time-Domain (SP-FDTD) method on Graphics Processing Units is described in this work. This formalism is applied to light wave propagation through periodic media with arbitrary anisotropy. The anisotropic medium is modeled by means of a permittivity tensor with non-diagonal elements, and absorbing boundary conditions are also considered. […]

CUDA

Nov, 11

Using Graphic Processor Units for the Study of Electric Propagation in Realistic Heart Models

The multi-scale nature of the electrophysiology problem requires the use of fine temporal and spatial resolutions, leading to models with millions of degrees of freedom that need to be solved for a thousand time steps. Solving this problem requires algorithms with a higher level of parallelism on multi-core platforms. The newer programmable […]

CUDA

Nov, 11

Numerical Solutions of Heat and Mass Transfer with the Third Kind Boundary and Initial Conditions in Capillary Porous Media Using Programmable Graphics Hardware

Nowadays, heat and mass transfer simulation plays an important role in various engineering and industrial fields. To analyze the physical behavior of a thermal environment, we have to simulate heat and mass transfer phenomena. However, obtaining numerical solutions to heat and mass transfer equations is very time-consuming. In this paper, therefore, one of acceleration […]

CUDA

Nov, 11

Fast GPU-Based Interpolation for SAR Backprojection

We introduce and discuss a parallel SAR backprojection algorithm using a Non-Uniform FFT (NUFFT) routine implemented on a GPU in CUDA. The details of a convenient GPU implementation of the NUFFT-based SAR backprojection algorithm, amenable to further generalization to a multi-GPU architecture, are also given. The performance of the approach is analyzed in terms […]

CUDA

Nov, 10

GPU Acceleration of Pyrosequencing Noise Removal

Amplicon Noise [1], an updated version of Pyronoise [2], is a tool for removing noise from metagenomic data recorded by a 454 pyrosequencer. Amplicon Noise has been shown to be effective in reducing overestimation of operational taxonomic units (OTUs) and in chimera detection. Amplicon Noise's noise removal method relies on clustering a large set of short sequences read […]

CUDA

Nov, 10

Sigma*: Symbolic Learning of Input-Output Specifications

We present Sigma*, a novel technique for learning symbolic models of software behavior. Sigma* addresses the challenge of synthesizing models of software by using symbolic conjectures and abstraction. By combining dynamic symbolic execution to discover symbolic input-output steps of the programs and counterexample guided abstraction refinement to over-approximate program behavior, Sigma* transforms arbitrary source representation […]

CUDA

Nov, 10

Efficient Dynamic Derived Field Generation on Many-Core Architectures Using Python

Derived field generation is a critical aspect of many visualization and analysis systems. This capability is frequently implemented by providing users with a language to create new fields and then translating their "programs" into a pipeline of filters that are combined in sequential fashion. Although this design is highly extensible and practical for development, the […]

OpenCL

Nov, 10

Parallel execution of a parameter sweep for molecular dynamics simulations in a hybrid GPU/CPU environment

Molecular Dynamics (MD) simulations can help to understand an immense number of phenomena at the nano- and microscale. They often require the exploration of a large parameter space, and one possible parallelization strategy consists of sending different parameter sets to different processors. Here we present such an approach using a hybrid environment of Graphic Processing Units (GPUs) […]

CUDA

Nov, 10

Studies of quantum dots: Ab initio coupled-cluster analysis using OpenCL and GPU programming

Quantum dots, that is, strongly confined electrons, show a variety of interesting properties. Of relevance in both experiments and various technical components is the possibility to fine-tune their electrical and optical properties. Quantum dots can be manufactured by a number of different techniques in practice, but in this thesis we have employed computer simulations […]

OpenCL

Nov, 8

Architectural Considerations for Compiler-guided Unroll-and-Jam of CUDA Kernels

Hundreds of cores per chip and support for fine-grain multithreading have made GPUs a central player in today's HPC world. Much of the responsibility for achieving high performance on these complex systems lies with software such as the compiler. This paper describes a compiler-based strategy for automatic and profitable application of the unroll-and-jam transformation to CUDA […]

CUDA

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

* * *

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools
