high performance computing on graphics processing units: hgpu.org

Posts

Sep, 28

NAS Parallel Benchmarks for GPGPUs using a Directive-based Programming Model

The broad adoption of accelerators boosts the interest in accelerator programming. Accelerators such as GPGPUs are optimized for throughput and offer high GFLOPS and memory bandwidth. CUDA has been adopted quite rapidly but it is proprietary and only applicable to GPUs, and the difficulty in writing efficient CUDA code has kindled the necessity to create […]

CUDA

Sep, 28

Accelerating Phylogenetic Inference on GPUs: an OpenACC and CUDA comparison

Phylogenetic inference is used to derive a "tree of life" for a collection of species whose DNA sequences are known. While several software packages have already been developed to take advantage of GPUs to accelerate phylogenetic inference, they typically require significant changes to the original code, constraining code maintenance. Recently, the OpenACC API was proposed […]

CUDA

Sep, 25

An open source finite-difference time-domain solver for room acoustics using graphics processing units

Wave-based simulation methods have been utilized to numerically estimate wave propagation in domains where low-frequency wave effects dominate the response. Finite-difference time-domain (FDTD) methods are increasingly useful for such problems, but they require massive spatial oversampling to increase the bandwidth of the simulation, which leads to significant computational expense. The advantage of explicit time-stepping […]

CUDA • OpenGL

Sep, 25

Study on semi-global matching algorithm extended for multi baseline matching and parallel processing method based on GPU

This paper extends the semi-global matching algorithm to multi-baseline matching to improve matching reliability. In particular, it studies kernel function optimization strategies and the GPU thread execution scheme for computing and aggregating the matching cost cube, and realizes fine-grained parallel processing on the GPU. The experiment results using three UCD aerial images based on Tesla C2050 GPU […]

CUDA

Sep, 25

Performance Evaluation of Edge Detection Techniques on GPU Using OpenCL

GPUs (graphics processing units) enhance computing performance thanks to their hundreds of cores operating in parallel. CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are programming models for GPUs. The advantage of these two programming models is that developers don’t have to understand any […]

OpenCL

Sep, 25

MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs

Cross-validation is a commonly used method for evaluating the effectiveness of Support Vector Machines (SVMs). However, existing SVM cross-validation algorithms are not scalable to large datasets because they have to (i) hold the whole dataset in memory and/or (ii) perform a very large number of kernel value computations. In this paper, we propose a scheme […]

CUDA

Sep, 25

Scalability Analysis of Parallel Algorithms on GPU Clusters

Scalability is an important concept in the domain of parallel computing. Since Graphics Processing Unit (GPU) clusters are and will be widely utilized in high performance computing platforms, we investigate the factors influencing the scalability for combinations of parallel algorithms (PA) and GPU clusters (GC). We present a scalability model for the PA-GC combination and then validate […]

CUDA

Sep, 24

Calculation of Force Field Grids for Molecular Docking Using Graphics Processing Unit

The vast majority of problems faced by bioinformatics are very complex and time-consuming. They require the use of modern high-performance computational systems and the development of algorithms for such systems. Heterogeneous computing systems that include graphics processing units (GPUs) occupy a separate niche. Such systems can significantly accelerate the solution of some tasks. The […]

CUDA

Sep, 23

Advanced Optimizations of An Implicit Navier-Stokes Solver on GPGPU

General-purpose computing on graphics processing units (GPGPU) is a massive fine-grain parallel computation platform, which is particularly attractive for CFD tasks due to its potential of one or two orders of magnitude of performance improvement with relatively low capital investment. Many successful attempts have been reported in recent years (see, for example [1, 2, 3, 4, […]

CUDA

Sep, 23

Explicit Integration with GPU Acceleration for Large Kinetic Networks

We demonstrate the first implementation of recently-developed fast explicit kinetic integration algorithms on modern graphics processing unit (GPU) accelerators. Taking as a generic test case a Type Ia supernova explosion with an extremely stiff thermonuclear network having 150 isotopic species and 1604 reactions coupled to hydrodynamics using operator splitting, we demonstrate the capability to solve […]

CUDA

Sep, 23

Computational Gravitational Dynamics with Modern Numerical Accelerators

We review the recent optimizations of gravitational N-body kernels for running them on graphics processing units (GPUs), on single hosts and massive parallel platforms. For each of the two main N-body techniques, direct summation and tree-codes, we discuss the optimization strategy, which is different for each algorithm. Because both the accuracy as well as the […]

CUDA

Sep, 23

An optimized GPU implementation of a 2D free surface simulation model on unstructured meshes

This work concerns the implementation of a finite volume method to solve the 2D Shallow Water Equations on Graphics Processing Units (GPUs). The strategy is fully oriented to work efficiently with unstructured meshes, which are widely used in many fields of engineering. Due to the design of the GPU cards, structured meshes are […]

CUDA
