high performance computing on graphics processing units: hgpu.org

Posts

May, 19

Comparison and Analysis of GPGPU and Parallel Computing on Multi-Core CPU

There are two ways to improve the performance of algorithm computation: general-purpose computation on GPUs (GPGPU) and parallel computation on multi-core CPUs. By comparing and analyzing the main differences between them, we conclude that the GPU is suited to processing large-scale, data-parallel workloads with high computational density but relatively simple branching logic, […]

CUDA

May, 19

Applying Object Oriented Design Patterns to CUDA based Pyramidal Image Blending – An Experience

In this paper, we present a pyramidal image blending algorithm based on the Compute Unified Device Architecture (CUDA) and designed using object-oriented design patterns. This algorithm is an essential part of the image stitching process for producing a seamless panoramic mosaic. CUDA is a novel GPU programming framework from NVIDIA. We introduce an object-oriented framework […]

CUDA, OpenGL

May, 19

The Linear Direct Sparse Solver on GPU for Bundle Adjustment Method

Implementation of a direct solver for symmetric positive definite sparse matrices of general structure that exploits the parallelism of the graphics card (GPU), and of a direct solver based on the Schur complement, tailored to the sparse systems that arise in bundle adjustment.

CUDA
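The Schur complement approach mentioned in the bundle-adjustment abstract above can be illustrated with a small dense sketch. It assumes the normal equations are partitioned as [[U, W], [W^T, V]], with camera unknowns first and point unknowns second, and, as a simplification, that V is diagonal (in real bundle adjustment V is block-diagonal with small per-point blocks). The reduced camera system is then solved with a Cholesky factorization, one common direct solver for SPD matrices. The function names and pure-Python dense storage are hypothetical illustrations, not the paper's GPU implementation.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be SPD)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve L L^T x = b by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def schur_solve(U, W, v_diag, e_c, e_p):
    """Solve [[U, W], [W^T, diag(v_diag)]] [x_c; x_p] = [e_c; e_p].

    Hypothetical simplification: V is diagonal, so V^{-1} is an
    element-wise reciprocal. In bundle adjustment V is block-diagonal.
    """
    nc, npts = len(U), len(v_diag)
    # Reduced camera system: S = U - W V^{-1} W^T
    S = [[U[i][j] - sum(W[i][k] * W[j][k] / v_diag[k] for k in range(npts))
          for j in range(nc)] for i in range(nc)]
    # Reduced right-hand side: e_c - W V^{-1} e_p
    rhs = [e_c[i] - sum(W[i][k] * e_p[k] / v_diag[k] for k in range(npts))
           for i in range(nc)]
    x_c = cholesky_solve(cholesky(S), rhs)
    # Back-substitute the point unknowns: x_p = V^{-1} (e_p - W^T x_c)
    x_p = [(e_p[k] - sum(W[i][k] * x_c[i] for i in range(nc))) / v_diag[k]
           for k in range(npts)]
    return x_c, x_p
```

Because V is (block-)diagonal, inverting it is cheap, so only the much smaller camera system S needs a general direct factorization.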

May, 19

Accurate CUDA Performance Modeling for Sparse Matrix-Vector Multiplication

This paper presents an integrated analytical and profile-based CUDA performance modeling approach that accurately predicts the kernel execution times of the CSR, ELL, COO, and HYB sparse matrix-vector multiplication (SpMV) CUDA kernels. Based on our experiments on a collection of 8 widely used test matrices on an NVIDIA Tesla C2050, the execution times predicted by our […]

CUDA
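The storage formats named in the SpMV abstract above differ mainly in how the nonzeros are laid out. Below is a minimal sequential sketch of two of them, CSR and ELL, assuming plain Python lists. In a typical "scalar" CUDA kernel, each iteration of the outer row loop would be handled by one thread. The helper names are hypothetical, and this sketch does not reproduce the paper's performance model.

```python
def dense_to_csr(A):
    """Convert a dense row-major matrix to CSR arrays (row_ptr, col_idx, values)."""
    row_ptr, col_idx, values = [0], [], []
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                col_idx.append(j)
                values.append(a)
        row_ptr.append(len(values))
    return row_ptr, col_idx, values


def csr_spmv(row_ptr, col_idx, values, x):
    """y = A x for A in CSR; rows are independent (one GPU thread per row)."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def dense_to_ell(A):
    """Convert to ELL: every row padded to the longest row; column -1 marks padding."""
    rows = [[(j, a) for j, a in enumerate(r) if a != 0] for r in A]
    width = max((len(r) for r in rows), default=0)
    cols = [[j for j, _ in r] + [-1] * (width - len(r)) for r in rows]
    vals = [[a for _, a in r] + [0.0] * (width - len(r)) for r in rows]
    return cols, vals


def ell_spmv(cols, vals, x):
    """y = A x for A in ELL. Every row has the same width; GPU versions usually
    store these padded arrays column-major so neighboring threads read
    neighboring addresses."""
    y = []
    for c_row, v_row in zip(cols, vals):
        acc = 0.0
        for j, a in zip(c_row, v_row):
            if j >= 0:
                acc += a * x[j]
        y.append(acc)
    return y
```

For example, with A = [[10, 0, 0, 2], [0, 3, 0, 0], [0, 0, 0, 0], [1, 0, 4, 5]] and x = [1, 2, 3, 4], both `csr_spmv` and `ell_spmv` return [18.0, 6.0, 0.0, 33.0]. ELL trades padding (wasted work on short rows) for regular access, which is why hybrid formats such as HYB combine it with COO for the overflow.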

May, 19

Use of FPGA or GPU-based architectures for remotely sensed hyperspectral image processing

Hyperspectral imaging is a growing area in remote sensing in which an imaging spectrometer collects hundreds of images (at different wavelength channels) for the same area on the surface of the Earth. Hyperspectral images are extremely high-dimensional, and require advanced on-board processing algorithms able to satisfy near real-time constraints in applications such as wildland fire […]

CUDA

May, 16

An Introduction to the OpenCL Programming Model

This paper presents an overview of the OpenCL 1.1 standard [Khronos 2012]. We first motivate the need for GPGPU computing and then discuss the various concepts and technological background necessary to understand the programming model. We use concurrent matrix multiplication as a framework for explaining various performance characteristics of compiling and running OpenCL code, and […]

OpenCL

May, 16

GPU accelerated Nonlinear Soft Tissue Deformation

There are two types of structures in the human body: solid organs and hollow, membrane-like organs. The brain, the liver, and other soft tissues such as tendons, muscles, and cartilage are examples of solid organs. The colon and blood vessels are examples of hollow organs. The two types differ greatly in structure and mechanical behavior. Deformation of these types of […]

CUDA

May, 16

Accelerated Network Coding with Dynamic Stream Decomposition on Graphics Processing Unit

Network coding, a well-known technique for optimizing data flow in wired and wireless network systems, has attracted considerable attention in various fields. However, the decoding complexity of network coding becomes a major performance bottleneck in practical network systems; thus, several studies have sought to improve its decoding performance. Nevertheless, previously proposed […]

CUDA

May, 16

b-Bit Minwise Hashing in Practice: Large-Scale Batch and Online Learning and Using GPUs for Fast Preprocessing with Simple Hash Functions

In this paper, we study several critical issues that must be tackled before b-bit minwise hashing can be applied to the volumes of data often used in industrial applications, especially in the context of search. 1. (b-bit) Minwise hashing requires an expensive preprocessing step that computes k (e.g., 500) minimal values after applying the corresponding permutations […]

CUDA
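The abstract above refers to computing k minimal values for b-bit minwise hashing and, in its title, to replacing permutations with simple hash functions. A sketch of that idea follows. It assumes 2-universal hash functions of the form h(x) = (a*x + c) mod p with p = 2^31 - 1, and integer set elements. Keeping only the lowest b bits of each minimum yields the b-bit signature. The resemblance estimate uses the simplified correction (P - 2^-b) / (1 - 2^-b), which assumes both sets are tiny relative to the hash range. The paper's exact estimator and parameters may differ, and all names here are hypothetical.

```python
import random

P = (1 << 31) - 1  # Mersenne prime used as the hash modulus (assumption)


def make_hashes(k, seed=0):
    """k hypothetical 2-universal hash functions h(x) = (a*x + c) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]


def b_bit_signature(items, hashes, b):
    """For each hash function keep the minimum hashed value, then its lowest b bits."""
    mask = (1 << b) - 1
    return [min((a * x + c) % P for x in items) & mask for a, c in hashes]


def estimate_resemblance(sig1, sig2, b):
    """Estimate |S1 & S2| / |S1 | S2| from two b-bit signatures.

    Matching positions are inflated by accidental b-bit collisions
    (probability about 2**-b for sparse sets), so the raw match rate is
    corrected. This simplified form assumes the sets are small relative
    to the hash range.
    """
    matches = sum(s1 == s2 for s1, s2 in zip(sig1, sig2)) / len(sig1)
    chance = 2.0 ** -b
    return (matches - chance) / (1.0 - chance)
```

The preprocessing cost is the k passes over every set inside `b_bit_signature`. Each (hash function, set) pair is independent, which is the kind of work that maps naturally onto GPU threads.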

May, 16

A Heterogeneous Accelerated Matrix Multiplication: OpenCL + APU + GPU+ Fast Matrix Multiply

As users and developers, we are witnessing the opening of a new computing scenario: the introduction of hybrid processors into a single die, such as an accelerated processing unit (APU) processor, and the plug-and-play of additional graphics processing units (GPUs) onto a single motherboard. These APU processors provide multiple symmetric cores with their memory hierarchies […]

OpenCL

May, 15

The BiConjugate gradient method on GPUs

In a wide variety of applications from different scientific and engineering fields, the solution of complex and/or nonsymmetric linear systems of equations is required. The BiConjugate Gradient method (BCG) is especially relevant for solving this kind of linear system. Nevertheless, BCG has an enormous computational cost. GPU computing is useful for accelerating this kind of […]

CUDA
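The BiConjugate Gradient iteration referred to above can be written compactly. The sketch below is the textbook unpreconditioned BCG for a small dense, real-valued matrix. It assumes the shadow residual is initialized to the initial residual, and the complex case would need conjugated dot products. It is sequential Python, but the products with A and A^T and the dot products are the operations a GPU implementation would typically offload. Function names and tolerances are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def matvec_t(A, x):
    """Compute A^T x without forming the transpose."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def bicg(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned BiConjugate Gradient for a real, possibly nonsymmetric A."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                 # r0 = b - A x0 with x0 = 0
    r_hat = list(r)             # shadow residual (common choice: r_hat0 = r0)
    p, p_hat = list(r), list(r_hat)
    rho = dot(r_hat, r)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        q, q_hat = matvec(A, p), matvec_t(A, p_hat)
        denom = dot(p_hat, q)
        if rho == 0.0 or denom == 0.0:
            raise ZeroDivisionError("BiCG breakdown")
        alpha = rho / denom
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        r_hat = [ri - alpha * qi for ri, qi in zip(r_hat, q_hat)]
        rho_new = dot(r_hat, r)
        beta = rho_new / rho
        rho = rho_new
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        p_hat = [ri + beta * pi for ri, pi in zip(r_hat, p_hat)]
    if dot(r, r) ** 0.5 <= tol * b_norm:
        return x, max_iter
    raise RuntimeError("BiCG did not converge")
```

Each iteration needs one product with A and one with A^T. That is the main difference from Conjugate Gradient, and it roughly doubles the matrix-vector work per step.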

May, 15

Batch Records Insertion into Multidimensional Linear Dynamic Hashing Table on GPU

Many parallel indexing solutions for multidimensional data have been proposed on the graphics processing unit (GPU) platform, but none of them has considered dynamic updates of the data. This paper presents a new solution for inserting batch records into a multidimensional linear dynamic hashing (MLDH) table, which implements lock-free batch insertion and update […]

CUDA
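The MLDH table in the abstract above builds on linear hashing, in which the table grows one bucket at a time by splitting the bucket at a moving split pointer. The sketch below shows classic one-dimensional, single-threaded linear hashing with a load-factor split trigger. It assumes hashable keys and Python's built-in `hash`, and it does not reproduce the paper's multidimensional addressing or lock-free GPU batch insertion. Class and parameter names are hypothetical.

```python
class LinearHashTable:
    """Classic linear hashing: buckets split one at a time in round-robin order."""

    def __init__(self, initial_buckets=2, bucket_capacity=4, max_load=0.75):
        self.n0 = initial_buckets      # number of buckets at level 0
        self.level = 0                 # current doubling round
        self.split = 0                 # next bucket to split
        self.capacity = bucket_capacity
        self.max_load = max_load
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _address(self, key):
        h = hash(key)
        addr = h % (self.n0 * 2 ** self.level)
        if addr < self.split:          # bucket already split this round
            addr = h % (self.n0 * 2 ** (self.level + 1))
        return addr

    def insert(self, key, value):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:               # update an existing key in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))    # Python lists stand in for overflow chains
        self.count += 1
        if self.count > self.max_load * self.capacity * len(self.buckets):
            self._split()

    def get(self, key, default=None):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return default

    def _split(self):
        modulus = self.n0 * 2 ** (self.level + 1)
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])        # new bucket index: split + n0 * 2**level
        for k, v in old:               # rehash with the next-level function
            self.buckets[hash(k) % modulus].append((k, v))
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:
            self.level += 1            # round complete: table size has doubled
            self.split = 0
```

Because only the bucket under the split pointer is rehashed, each split touches a small, predictable part of the table. Concurrent batch-insertion schemes have to coordinate around exactly this step.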

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

* * *

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools
