high performance computing on graphics processing units: hgpu.org

Posts

Aug, 5

Modeling of Heat Diffusion Through Isotropic Media Using Graphical Processing Units

With accurate simulation of ever-more complex phenomena desired, numerical models are getting increasingly sophisticated and thus take a considerable amount of time to run on normal computers. Parallel computing has emerged as an important paradigm in response, allowing engineers to run programs faster. In recent years, graphics processing units (GPUs) are among the massively parallel devices […]

CUDA

Aug, 5

Parallel perfusion imaging processing using GPGPU

BACKGROUND and PURPOSE: The objective of brain perfusion quantification is to generate parametric maps of relevant hemodynamic quantities such as cerebral blood flow (CBF), cerebral blood volume (CBV) and mean transit time (MTT) that can be used in diagnosis of acute stroke. These calculations involve deconvolution operations that can be very computationally expensive when using […]

CUDA

Aug, 4

Parallel GPU Implementation of Iterated Local Search for the Travelling Salesman Problem

The purpose of this paper is to propose effective parallelization strategies for the Iterated Local Search (ILS) metaheuristic on Graphics Processing Units (GPU). We consider the decomposition of the 3-opt Local Search procedure on the GPU processing hardware and memory structure. Two resulting algorithms are evaluated and compared on both speedup and solution quality on […]

CUDA

Aug, 4

Analysis and performance estimation of the conjugate gradient method on multiple GPUs

The Conjugate Gradient (CG) method is a widely used iterative method for solving linear systems described by a (sparse) matrix. The method requires a large number of Sparse-Matrix Vector (SpMV) multiplications, vector reductions and other vector operations to be performed. We present a number of mappings for the SpMV operation on modern programmable GPUs using the […]

CUDA

Aug, 4

Parallelization of KMP String Matching Algorithm on Different SIMD architectures: Multi-Core and GPGPU’s

String matching is a classical problem in computer science. After the study of the Naive string search, Brute Force and the KMP algorithm, several advantages and disadvantages of the algorithms have been analyzed. Considering KMP in particular, the concept of parallelization has been introduced to improve the performance of the KMP algorithm. The algorithm is designed […]

OpenCL

Aug, 4

Parallelization Design of Irregular Algorithms of Video Processing on GPUs

In this paper, we present the parallelization design considerations for irregular algorithms of video processing on GPUs. Rich parallelism can be exploited by scheduling the processing order or making a tradeoff between performance and parallelism for irregular algorithms (such as CAVLC and deblocking filter). We implement a component-oriented CAVLC encoder and a direction-oriented deblocking filter […]

CUDA

Aug, 4

Porting marine ecosystem model spin-up using transport matrices to GPUs

We have ported an implementation of the spin-up for marine ecosystem models based on the "Transport Matrix Method" to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the PETSc library that is based on the "Message Passing Interface (MPI)" standard. The spin-up computes a steady seasonal cycle of the […]

CUDA

Aug, 3

Solving very large instances of the scheduling of independent tasks problem on the GPU

In this paper, we present two new parallel algorithms to solve large instances of the scheduling of independent tasks problem. First, we describe a parallel version of the Min-min heuristic. Second, we present GraphCell, an advanced parallel cellular genetic algorithm (CGA) for the GPU. Two new generic recombination operators that take advantage of the massive […]

CUDA

Aug, 3

Power Management for GPU-CPU Heterogeneous Systems

In recent years, GPU-CPU heterogeneous architectures have been increasingly adopted in high performance computing, because of their capabilities of providing high computational throughput. However, current research focuses mainly on the performance aspects of GPU-CPU architectures, while improving the energy efficiency of such systems receives much less attention. There are few existing efforts that try to […]

CUDA

Aug, 3

GPU-to-GPU and Host-to-Host multipattern string matching on a GPU

We develop GPU adaptations of the Aho-Corasick and multipattern Boyer-Moore string matching algorithms for the two cases GPU-to-GPU (input is initially in GPU memory and the output is left in GPU memory) and host-to-host (input and output are in the memory of the host CPU). For the GPU-to-GPU case, we consider several refinements to a […]

CUDA

Aug, 3

Parallel Statistical Analysis of Analog Circuits by GPU-accelerated Graph-based Approach

In this paper, we propose a new parallel statistical analysis method for large analog circuits using a determinant decision diagram (DDD) based graph technique on GPU platforms. The DDD-based symbolic analysis technique enables exact symbolic analysis of very large analog circuits. But we show that DDD-based graph analysis is very amenable to massively threaded parallel […]

CUDA

Aug, 3

Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

With the rising number of application accelerators, developers are looking for ways to evaluate new and competing platforms quickly, fairly, and early in the development cycle. As high-performance computing (HPC) applications increase their demands on application acceleration platforms, graphics processing units (GPUs) provide a potential solution for many developers looking for increased performance. Device performance […]

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
