high performance computing on graphics processing units: hgpu.org

Posts

Aug, 27

Algorithms for Solving Non-Stationary Heat Conduction Problem for Design of a Technical Device

A model of a multilayer device with non-trivial geometrical and material structure and its working process is suggested. The thermal behavior of the device, as one principal characteristic, is simulated. The algorithm for solving the non-stationary heat conduction problem with a time-dependent periodical heating source is suggested. The algorithm is based on finite difference explicit–implicit […]

OpenCL

Aug, 26

HSApriori: High Speed Association Rule Mining using Apriori Based Algorithm for GPU

Apriori-based algorithms are widely used for association rule mining. However, these algorithms cannot exploit the parallel processing power of a modern GPU (Graphics Processing Unit). To make an algorithm compatible with a GPU, its data representation, parallel processing, and support counting need to be changed. In this paper we propose an […]

OpenCL

Aug, 26

Bandwidth Requirements of GPU Architectures

A new trend in chip multiprocessor (CMP) design is to incorporate graphics processing unit (GPU) cores, making them heterogeneous. GPU cores have a higher bandwidth requirement than CPU cores, as they tend to generate many more memory requests. In order to achieve good performance, there must be sufficient bandwidth between the GPU shader cores and […]

CUDA

Aug, 26

An Investigation of Unified Memory Access Performance in CUDA

Managing memory between the CPU and GPU is a major challenge in GPU computing. A programming model, Unified Memory Access (UMA), has been recently introduced by Nvidia to simplify the complexities of memory management while claiming good overall performance. In this paper, we investigate this programming model and evaluate its performance and programming model simplifications […]

CUDA

Aug, 26

Acceleration of Various Direct/Iterative Solvers for MoM by GPU and Its Computational Cost

Various guidelines for acceleration of MoM by GPU computing are summarized. Acceleration of direct/iterative solver for MoM by using GPU is realized. Quantitative study of computing time shows the performance of each guideline.

CUDA

Aug, 26

Speedup of Type-1 Fuzzy Logic Systems on Graphics Processing Units Using CUDA

Parallel computing is one of the significant components of High Performance Computing (HPC) and is used to solve problems that are large and complex in nature. A Fuzzy Logic System (FLS) becomes computationally intensive as the number of inputs and/or fuzzy rules increases. Running an FLS is highly parallel in nature, therefore, […]

CUDA

Aug, 23

Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicore with GPU Accelerators

We present a block structured orthogonal factorization (BSOF) algorithm and its parallelization for computing the inversion of block p-cyclic matrices. We aim at high performance on multicores with GPU accelerators. We provide a quantitative performance model for optimal host-device load balance, and validate the model through numerical tests. Benchmarking results show that the parallel BSOF […]

CUDA

Aug, 23

GPU Virtualization for High Performance General Purpose Computing on the ESX Hypervisor

Graphics Processing Units (GPU) have become important components in high performance computing (HPC) systems for their massively parallel computing capability and energy efficiency. Virtualization technologies are increasingly applied to HPC to reduce administration costs and improve system utilization. However, virtualizing the GPU to support general purpose computing presents many challenges because of the complexity of […]

CUDA

Aug, 23

Illustrative Rendering of Particle Systems

Sets of particles are a frequently used tool for the exploration of time-varying flow fields due to their ease of use and conceptual simplicity. Understanding temporal changes in such particle systems can be difficult with traditional visualization methods such as isosurface rendering and particle splatting. These types of methods only show the current shape of […]

OpenCL

Aug, 23

Estimating GPU Speedups for Programs Without Writing a Single Line of GPU Code

Heterogeneous processing using GPUs is here to stay and today spans mobile devices, laptops, and supercomputers. Although modern software development frameworks like OpenCL and CUDA serve as a high productivity environment, software development for GPUs is time consuming. First, much work needs to be done to restructure software and data organization to match the GPU’s […]

CUDA, OpenCL

Aug, 23

Encrypting video and image streams using OpenCL code on-demand

The amount of multimedia information transmitted through the web is very high and increasing. Generally, this kind of data is not correctly protected, since users do not appreciate the amount of information that images and videos may contain. In this work, we present an architecture for safely managing multimedia transmission channels. The idea is to encrypt […]

OpenCL

Aug, 23

High Level Programming for Heterogeneous Architectures

This work presents an effort to bridge the gap between abstract high level programming and OpenCL by extending an existing high level Java programming framework (APARAPI), based on OpenCL, so that it can be used to program FPGAs at a high level of abstraction and increased ease of programmability. We run several real world algorithms […]

OpenCL
