high performance computing on graphics processing units: hgpu.org

Posts

Nov, 19

Evaluation of parallel particle swarm optimization algorithms within the CUDA architecture

Particle swarm optimization (PSO), like other population-based meta-heuristics, is intrinsically parallel and can be effectively implemented on Graphics Processing Units (GPUs), which are, in fact, massively parallel processing architectures. In this paper we discuss possible approaches to parallelizing PSO on graphics hardware within the Compute Unified Device Architecture (CUDA), a GPU programming environment by nVIDIA […]

CUDA

Nov, 19

Parallel Implementation on GPUs of ADI Finite Difference Methods for Parabolic PDEs with Applications in Finance

We study the parallel implementation on a Graphics Processing Unit (GPU) of Alternating Direction Implicit (ADI) time-discretization methods for solving time-dependent parabolic Partial Differential Equations (PDEs) in three spatial dimensions with mixed spatial derivatives in a variety of applications in computational finance. Finite differences on uniform grids are used for the spatial discretization of the […]

CUDA

Nov, 19

High-Performance Iterative Electron Tomography Reconstruction with Long-Object Compensation using Graphics Processing Units (GPUs)

Iterative reconstruction algorithms pose tremendous computational challenges for 3D Electron Tomography (ET). Similar to X-ray Computed Tomography (CT), graphics processing units (GPUs) offer an affordable platform to meet these demands. In this paper, we outline a CT reconstruction approach for ET that is optimized for the special demands and application setting of ET. It exploits […]

CUDA

Nov, 19

Rodinia: A benchmark suite for heterogeneous computing

This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley’s dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a […]

CUDA

Nov, 19

Ultra-fast FFT protein docking on graphics processors

MOTIVATION: Modelling protein-protein interactions (PPIs) is an increasingly important aspect of structural bioinformatics. However, predicting PPIs using in silico docking techniques is computationally very expensive. Developing very fast protein docking tools will be useful for studying large-scale PPI networks, and could contribute to the rational design of new drugs. RESULTS: The Hex spherical polar Fourier […]

CUDA

Nov, 19

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multi-core systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and […]

CUDA

Nov, 19

Design and Performance Evaluation of Image Processing Algorithms on GPUs

In this paper, we construe key factors in design and evaluation of image processing algorithms on the massive parallel GPU (graphics processing units) using the CUDA (compute unified device architecture) programming model. A set of metrics, customized for image processing, are proposed to quantitatively evaluate algorithm characteristics. In addition, we show that a range of […]

CUDA

Nov, 19

Multi-dimensional characterization of temporal data mining on graphics processors

Through the algorithmic design patterns of data parallelism and task parallelism, the graphics processing unit (GPU) offers the potential to vastly accelerate discovery and innovation across a multitude of disciplines. For example, the exponential growth in data volume now presents an obstacle for high-throughput data mining in fields such as neuroscience and bioinformatics. As such, […]

CUDA

Nov, 19

A two-level real-time vision machine combining coarse- and fine-grained parallelism

In this paper, we describe a real-time vision machine having a stereo camera as input generating visual information on two different levels of abstraction. The system provides visual low-level and mid-level information in terms of dense stereo and optical flow, egomotion, indicating areas with independently moving objects as well as a condensed geometric description of […]

CUDA

Nov, 18

From Rendering to Tracking Point-based 3D Models

This paper adds to the abundant visual tracking literature with two main contributions. First we illustrate the interest of using Graphic Processing Units (GPU) to support efficient implementations of computer vision algorithms and, secondly, we introduce the use of point-based 3D models as a shape prior for real-time 3D tracking with a monocular camera. The […]

Nov, 18

Acceleration of the Smith-Waterman Algorithm using Single and Multiple Graphics Processors

Finding regions of similarity between two very long data streams is a computationally intensive problem referred to as sequence alignment. Alignment algorithms must allow for imperfect sequence matching with different starting locations and some gaps and errors between the two data sequences. Perhaps the most well known application of sequence matching is the testing of […]

CUDA

Nov, 18

Parallel implementation of Artificial Neural Network training for speech recognition

In this paper we describe the implementation of a complete ANN training procedure using the block mode back-propagation learning algorithm for sequential patterns – such as the observation feature vectors of a speech recognition system – exploiting the high performance SIMD architecture of GPU using CUDA and its C-like language interface. We also compare the […]

CUDA

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

Most viewed papers (last 30 days)