high performance computing on graphics processing units: hgpu.org

Posts

May, 15

Parallelization of Shape Diameter Function Computation using OpenCL

Shape Diameter Function (SDF) is a scalar function that expresses a measure of the diameter of the object’s volume in the neighborhood of each point on the surface of an input mesh. It is fundamental to many applications in computer graphics and is used for consistent mesh partitioning and skeletonization. The algorithm sends several rays inside a […]

OpenCL

May, 15

Performance Optimization of GPU ELF-Codes

GPUs (Graphics Processing Units) are of interest for their favorable GFLOP/s-to-price ratio. Compared with their beginnings in the early 1980s, today's GPU architectures are more similar to general-purpose architectures but with (much) larger numbers of cores: the GF100 architecture released by NVIDIA in 2009-2010, for example, has a true hardware cache hierarchy, a […]

CUDA

May, 15

Optimized Composition: Generating Efficient Code for Heterogeneous Systems from Multi-Variant Components, Skeletons and Containers

In this survey paper, we review recent work on frameworks for the high-level, portable programming of heterogeneous multi-/manycore systems (especially, GPU-based systems) using high-level constructs such as annotated user-level software components, skeletons (i.e., predefined generic components) and containers, and discuss the optimization problems that need to be considered in selecting among multiple implementation variants, generating […]

CUDA • OpenCL

May, 14

Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture-GeForce GTX 680

Kepler is the newest GPU architecture from NVIDIA, and the GTX 680 is the first commercially available graphics card based on that architecture. Matrix multiplication is a canonical computational kernel, and often the main target of initial optimization efforts for a new chip. This article presents preliminary results of automatically tuning matrix multiplication kernels for […]

CUDA

May, 14

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

In this paper, the adaptability of the neutron diffusion numerical algorithm on GPUs was studied, and a GPU-accelerated multi-group 3D neutron diffusion code based on the finite difference method was developed. The IAEA 3D PWR benchmark problem was calculated in the numerical test. The results demonstrate both high efficiency and adequate accuracy of the GPU implementation […]

CUDA

May, 14

Evaluating the Power of GPU Acceleration for IDW Interpolation Algorithm

We first present two GPU implementations of the standard Inverse Distance Weighting (IDW) interpolation algorithm: the tiled version, which takes advantage of shared memory, and the CDP version, which is implemented using CUDA Dynamic Parallelism (CDP). Then we evaluate the power of GPU acceleration for the IDW interpolation algorithm by comparing the performance of CPU implementation […]
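For readers unfamiliar with IDW, the following is a minimal serial sketch of the textbook formula, in which each interpolated value is a weighted average of known values with weights proportional to the inverse of the distance raised to a power. It is not the paper's tiled or CDP implementation; the function name `idw_interpolate`, the 2D point layout, and the default `power=2.0` are assumptions made for illustration.

```python
import math


def idw_interpolate(known_points, query, power=2.0):
    """Estimate the value at `query` from scattered samples.

    known_points: list of (x, y, value) tuples (hypothetical layout).
    query: (x, y) location to interpolate.
    power: distance exponent; 2.0 is a common choice, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known_points:
        dist = math.hypot(query[0] - x, query[1] - y)
        if dist == 0.0:
            # The query coincides with a sample: return it directly
            # to avoid dividing by zero.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    samples = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
    # The query is equidistant from both samples, so the weights are equal.
    print(idw_interpolate(samples, (1.0, 0.0)))
```

Each query point is independent of the others, which is what makes the algorithm a natural fit for one-thread-per-query GPU parallelization.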

CUDA

May, 14

Build and Travel KD-Tree with CUDA

Ray tracing is an important and widely used tool in computer graphics. The entertainment and game industries have already benefited greatly from ray tracing. However, designers and end users have long been forced to use offline ray tracing tools because of the high computational load. In ray tracing, most of the computation is concentrated […]

CUDA

May, 14

Efficient Energy Minimization in Finite-Difference Micromagnetics: Speeding up Hysteresis Computations

We implement an efficient energy-minimization algorithm for finite-difference micromagnetics that proves especially useful for the computation of hysteresis loops. Compared to results obtained by time integration of the Landau-Lifshitz-Gilbert equation, a speedup of up to two orders of magnitude is gained. The method is implemented in a finite-difference code running on CPUs as well as […]

CUDA

May, 13

Cluster-Level Tuning of a Shallow Water Equation Solver on the Intel MIC Architecture

The paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (shallow water equation solver) on a cluster enabled with Intel Xeon Phi coprocessors. The discussion includes: – Controlling the number and affinity of OpenMP threads to optimize access to memory bandwidth; – Tuning the inter-operation of OpenMP and […]

May, 13

Fast Finite Solar Radiation Pressure Model Integration Using OpenGL

Using OpenGL, a common approach to vector graphics, high-fidelity solar radiation pressure (SRP) effects are calculated easily and quickly with the power of graphics processing units (GPUs). For some missions, SRP is a significant perturbation for which a simplified plate model does not suffice. OpenGL is a set of commands that interact with […]

OpenGL

May, 13

Impact of Modern OpenGL on FPS

In our work, we choose several old and modern features of OpenGL that applications use to render scenes and compare their impact on rendering speed. We focus our comparison not only on these features but also on the type of hardware used for the measurements. We run our tests on a professional graphics card […]

OpenGL

May, 13

Deriving Shape Grammars on the GPU

Due to growing demand for computer generated graphical content, procedural modeling has become an important topic in the gaming and movie industry. Creating vast amounts of content by hand requires excessive amounts of manual labor. Using a procedural rule set, entire worlds can be generated by a computer. However, the traditional CPU-based derivation of a […]

CUDA • OpenGL
