high performance computing on graphics processing units: hgpu.org

Posts

Apr, 7

GPU-Accelerated Face Detection Algorithm

This work is an overview of a preliminary experience in developing high-performance face detection accelerated by GPU co-processors. The objective is to illustrate the advantages and difficulties encountered while utilizing the GPU technology to perform face detection. Moreover the introduced implementation is a much faster than currently existing techniques. Previous techniques for speeding up face […]

OpenCL

Apr, 7

A Study on Efficient Application Mapping on Parallel Computing Accelerators

Since the invention of electronic computers, their performance has been constantly advanced. The recent progress of micro processors in performance has been mainly achieved by increasing the number of cores on a device, instead of increasing working frequency. In addition, because of increasing of density of semiconductors, not only computational performance but also density of […]

CUDA

Apr, 7

Parallel processing for SAR image generation in CUDA – GPGPU platform

High resolution imagery from synthetic aperture radar (SAR) video data requires numerical computations of the order of gigaflops (GFLOP). The computational burden increases with the image size and the amount of input raw video signals. General purpose graphic processor units (GPGPU) can play a pivotal role in parallel processing the raw video data to generate […]

CUDA

Apr, 7

Acceleration of a Full-scale Industrial CFD Application with OP2

Hydra is a full-scale industrial CFD application used for the design of turbomachinery at Rolls Royce plc. It consists of over 300 parallel loops with a code base exceeding 50K lines and is capable of performing complex simulations over highly detailed unstructured mesh geometries. Unlike simpler structured-mesh applications, which feature high speed-ups when accelerated by […]

CUDA

Apr, 7

Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-sigma formats on NVIDIA GPUs

Numerical methods in sparse linear algebra typically rely on a fast and efficient matrix vector product, as this usually is the backbone of iterative algorithms for solving eigenvalue problems or linear systems. Against the background of a large diversity in the characteristics of high performance computer architectures, it is a challenge to derive a cross-platform […]

CUDA

Apr, 6

Optimizing Krylov Subspace Solvers on Graphics Processing Units

Krylov subspace solvers are often the method of choice when solving sparse linear systems iteratively. At the same time, hardware accelerators such as graphics processing units (GPUs) continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a […]

CUDA

Apr, 6

A Low-Power Hybrid CPU-GPU Sort

This thesis analyses the energy efficiency of a low-power CPU-GPU hybrid architecture. We evaluate the NVIDIA Ion architecture, which couples an Intel Atom low power processor with an integrated GPU that has an order of magnitude fewer processors compared to traditional discrete GPUs. We attempt to create a system that balances computation and I/O capabilities […]

CUDA

Apr, 6

High Performance Computing for Large Graphs of Internet Applications using GPU

The high speed CPU based routers currently in use could not handle the massive data required for real-time multimedia communication. Graphics processing units (GPUs) offer an appreciable alternative due to high computation power which results from their parallel execution units. This paper presents the implementation of the Dijkstra’s link state IP routing algorithm using GPU. […]

CUDA

Apr, 6

GPU Accelerated Fractal Image Compression for Medical Imaging in Parallel Computing Platform

In this paper, we implemented both sequential and parallel version of fractal image compression algorithms using CUDA (Compute Unified Device Architecture) programming model for parallelizing the program in Graphics Processing Unit for medical images, as they are highly similar within the image itself. There are several improvement in the implementation of the algorithm as well. […]

CUDA

Apr, 6

Parallel Support Vector Machines in Practice

In this paper, we evaluate the performance of various parallel optimization methods for Kernel Support Vector Machines on multicore CPUs and GPUs. In particular, we provide the first comparison of algorithms with explicit and implicit parallelization. Most existing parallel implementations for multi-core or GPU architectures are based on explicit parallelization of Sequential Minimal Optimization (SMO) […]

CUDA

Apr, 4

CUDA programs for GPU computing of Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models

We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported […]

CUDA

Apr, 4

An efficient GPU acceptance-rejection algorithm for the selection of the next reaction to occur for Stochastic Simulation Algorithms

MOTIVATION: The Stochastic Simulation Algorithm (SSA) has largely diffused in the field of systems biology. This approach needs many realizations for establishing statistical results on the system under study. It is very computationnally demanding, and with the advent of large models this burden is increasing. Hence parallel implementation of SSA are needed to address these […]

CUDA

high performance computing on graphics processing units: hgpu.org

Posts

GPU-Accelerated Face Detection Algorithm

A Study on Efficient Application Mapping on Parallel Computing Accelerators

Parallel processing for SAR image generation in CUDA – GPGPU platform

Acceleration of a Full-scale Industrial CFD Application with OP2

Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-sigma formats on NVIDIA GPUs

Optimizing Krylov Subspace Solvers on Graphics Processing Units

A Low-Power Hybrid CPU-GPU Sort

High Performance Computing for Large Graphs of Internet Applications using GPU

GPU Accelerated Fractal Image Compression for Medical Imaging in Parallel Computing Platform

Parallel Support Vector Machines in Practice

CUDA programs for GPU computing of Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models

An efficient GPU acceptance-rejection algorithm for the selection of the next reaction to occur for Stochastic Simulation Algorithms

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)