high performance computing on graphics processing units: hgpu.org

Posts

Nov, 2

Adjoint Algorithmic Differentiation of a GPU Accelerated Application

We consider a GPU accelerated program using Monte Carlo simulation to price a basket call option on 10 FX rates driven by a 10 factor local volatility model. We develop an adjoint version of this program using algorithmic differentiation. The code uses mixed precision. For our test problem of 10,000 sample paths with 360 Euler […]

CUDA

Nov, 2

An MPI-CUDA Implementation for the Compression of DEM

A high performance terrain data compression method is proposed based on discrete wavelet transform (DWT) and parallel run-length code. But the implementation of the schemes to solve these models in realistic scenarios imposes huge demands on computing power. Graphics Processing Units (GPUs) programmed with the Compute Unified Device Architecture (CUDA) are rapidly becoming a major choice in […]

CUDA
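As a rough illustration of the run-length coding step named in the abstract above, here is a minimal sequential sketch in Python. It is not the paper's parallel implementation; the function names `rle_encode` and `rle_decode` are hypothetical, and the input is assumed to be a flat list of already quantized values. Run-length coding pays off when the data contains long runs of equal values, which is typically the case for quantized wavelet coefficients of smooth terrain.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs.

    Hypothetical helper for illustration; sequential, not parallel.
    """
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


if __name__ == "__main__":
    coeffs = [0, 0, 0, 5, 5, 0]       # made-up sample data
    packed = rle_encode(coeffs)
    print(packed)
    print(rle_decode(packed) == coeffs)
```

The round trip through `rle_decode` returns the original list, so the coding step itself is lossless; any loss in a scheme like this would come from quantization before it.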

Nov, 2

Communication Optimization for Multi GPU Implementation of Smith-Waterman Algorithm

GPU parallelism can achieve enormous performance gains for real applications. CPU-GPU communication is one of the major bottlenecks that limit these gains. Among several libraries developed so far to optimize this communication, DyManD (Dynamically Managed Data) provides better communication optimization strategies and achieves better performance on a single GPU. Smith-Waterman is a well known […]

CUDA
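For readers unfamiliar with the Smith-Waterman algorithm mentioned in the abstract above, the following is a minimal single-threaded Python sketch of its scoring recurrence with a linear gap penalty. It is not the multi-GPU implementation the paper discusses and does not use DyManD; the function name and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    Only two rows of the dynamic programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                    # restart: local alignment never goes negative
                prev[j - 1] + sub,    # align a[i-1] with b[j-1]
                prev[j] + gap,        # gap in b
                curr[j - 1] + gap,    # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another; that independence is what parallel implementations usually exploit.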

Oct, 30

GPU Accelerated Blood Flow Computation using the Lattice Boltzmann Method

We propose a numerical implementation based on a Graphics Processing Unit (GPU) to reduce the execution time of the Lattice Boltzmann Method (LBM). The study focuses on the application of the LBM to patient-specific blood flow computations, and hence, to obtain higher accuracy, double precision computations are employed. The LBM specific operations are […]

CUDA

Oct, 30

GPU-Based Image Segmentation Using Level Set Method With Scaling Approach

In recent years, with the development of graphics processors, graphics cards have been widely used to perform general-purpose calculations. Especially since the release of the CUDA C programming language in 2007, most researchers have used CUDA C for processes that need high performance computing. In this paper, a scaling approach for […]

CUDA

Oct, 30

An Evolutionary Approach to Parallel Computing Using GPU

Within a few years, the programmable graphics processor unit has evolved into a high performance computing device. Simple data-parallel constructs enable the use of the GPU as a streaming coprocessor, and a compiler and run time system abstracts and virtualizes many aspects of graphics hardware. Commodity graphics hardware has rapidly evolved from being a fixed-function pipeline […]

CUDA

Oct, 30

The Plasma Simulation Code: A modern particle-in-cell code with load-balancing and GPU support

Recent increases in supercomputing power, driven by the multi-core revolution and accelerators such as the IBM Cell processor, graphics processing units (GPUs) and Intel’s Many Integrated Core (MIC) technology have enabled kinetic simulations of plasmas at unprecedented resolutions, but changing HPC architectures also come with challenges for writing efficient numerical codes. This paper describes the […]

CUDA

Oct, 30

Analysis of Parallel Sorting Algorithms on Heterogeneous Processors with OpenCL

A heterogeneous computing platform with tremendous raw capacity can easily be constructed from multi-core processors, high-capacity FPGAs and GPUs, and can include any number of these computing units. However, the challenge faced until now was the lack of a standardized framework under which the computational tasks and data of applications could […]

OpenCL

Oct, 29

Morphological Proximity Priors: Spatial Relationships for Semantic Segmentation

The introduction of prior knowledge into image analysis algorithms is a central challenge in computer vision. In this paper, we introduce the concept of proximity priors into semantic segmentation methods in order to penalize the proximity of certain object classes. Proximity priors are a generalization of purely global and purely local co-occurrence priors which have […]

CUDA

Oct, 29

A Comparison of Two Methods for Geometric Milling Simulation Accelerated by GPU

For detecting potential problems of a cutter path, cutting force simulation in the NC milling process is necessary prior to actual machining. A milling operation is geometrically equivalent to a Boolean subtraction of the swept volume of a cutter moving along a path from a solid model representing the stock shape. In order to precisely […]

CUDA

Oct, 29

GPU-Mapping: Robotic Map Building with Graphical Multiprocessors

This paper provides a wide perspective of the potential applicability of Graphical Processing Units (GPUs) computing power in robotics, specifically in the well known problem of 2D robotic mapping. There are three possible ways of exploiting these massively parallel devices: I) parallelizing existing algorithms, II) integrating already existing parallelized general purpose software, and III) making […]

CUDA

Oct, 29

First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA’s Tesla Graphics Processing […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
