high performance computing on graphics processing units: hgpu.org

Posts

May, 14

Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture-GeForce GTX 680

Kepler is the newest GPU architecture from NVIDIA, and the GTX 680 is the first commercially available graphics card based on that architecture. Matrix multiplication is a canonical computational kernel, and often the main target of initial optimization efforts for a new chip. This article presents preliminary results of automatically tuning matrix multiplication kernels for […]

CUDA

May, 14

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

In this paper, the adaptability of the neutron diffusion numerical algorithm to GPUs was studied, and a GPU-accelerated multi-group 3D neutron diffusion code based on the finite difference method was developed. The IAEA 3D PWR benchmark problem was calculated in the numerical test. The results demonstrate both high efficiency and adequate accuracy of the GPU implementation […]

CUDA

May, 14

Evaluating the Power of GPU Acceleration for IDW Interpolation Algorithm

We first present two GPU implementations of the standard Inverse Distance Weighting (IDW) interpolation algorithm: a tiled version that takes advantage of shared memory and a CDP version implemented using CUDA Dynamic Parallelism (CDP). We then evaluate the power of GPU acceleration for the IDW interpolation algorithm by comparing the performance of CPU implementation […]

CUDA
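For readers unfamiliar with the IDW interpolation named in the post above, here is a minimal CPU reference sketch of the standard formula: each known sample contributes with weight 1/d^p, where d is its distance to the query point. It is not taken from the paper; the function name `idw_interpolate`, the 2D point layout, and the default power of 2 are assumptions made for illustration.

```python
import math


def idw_interpolate(known_points, query, power=2.0):
    """Estimate the value at `query` from (x, y, value) samples.

    Hypothetical helper: plain serial IDW, not the tiled or CDP
    GPU variants described in the post.
    """
    qx, qy = query
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known_points:
        dist = math.hypot(qx - x, qy - y)
        if dist == 0.0:
            # The query coincides with a sample, so return it directly
            # and avoid dividing by zero.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two samples at equal distance from the query contribute equally.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_value = idw_interpolate(samples, (1.0, 0.0))
```

Every query point is independent of the others, which is presumably what makes the algorithm a natural fit for one-thread-per-query GPU kernels.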

May, 14

Build and Travel KD-Tree with CUDA

Ray tracing is an important and widely used tool in computer graphics. The entertainment and game industries have already benefited greatly from ray tracing. However, designers and end users have long been forced to use offline ray tracing tools because of the high computational load. In ray tracing, most of the computation is concentrated […]

CUDA

May, 14

Efficient Energy Minimization in Finite-Difference Micromagnetics: Speeding up Hysteresis Computations

We implement an efficient energy-minimization algorithm for finite-difference micromagnetics that proves especially useful for the computation of hysteresis loops. Compared to results obtained by time integration of the Landau-Lifshitz-Gilbert equation, a speedup of up to two orders of magnitude is gained. The method is implemented in a finite-difference code running on CPUs as well as […]

CUDA

May, 13

Cluster-Level Tuning of a Shallow Water Equation Solver on the Intel MIC Architecture

The paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (shallow water equation solver) on a cluster enabled with Intel Xeon Phi coprocessors. The discussion includes: – Controlling the number and affinity of OpenMP threads to optimize access to memory bandwidth; – Tuning the inter-operation of OpenMP and […]

May, 13

Fast Finite Solar Radiation Pressure Model Integration Using OpenGL

Using OpenGL, a common approach to vector graphics, high-fidelity solar radiation pressure (SRP) effects are calculated easily and quickly with the power of graphics processing units (GPUs). For some missions SRP is a significant perturbation, and a simplified plate model does not suffice. OpenGL is a set of commands that interact with […]

OpenGL

May, 13

Impact of Modern OpenGL on FPS

In our work we choose several old and modern features of OpenGL that applications use to render scenes and compare their impact on rendering speed. Our comparison considers not only these features but also the type of hardware used for the measurements. We run our tests on a professional graphics card […]

OpenGL

May, 13

Deriving Shape Grammars on the GPU

Due to growing demand for computer generated graphical content, procedural modeling has become an important topic in the gaming and movie industry. Creating vast amounts of content by hand requires excessive amounts of manual labor. Using a procedural rule set, entire worlds can be generated by a computer. However, the traditional CPU-based derivation of a […]

CUDA • OpenGL

May, 13

K-Means on GPU: A Review

K-Means is the most popular clustering algorithm in data mining. The size of data sets has increased tremendously. Owing to recent developments in inexpensive shared-memory architectures such as Graphics Processing Units (GPUs), general-purpose applications are implemented on GPUs using Compute Unified Device Architecture (CUDA). Cost effectiveness of […]

CUDA
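As background for the K-Means review above, the following is a minimal serial sketch of the textbook Lloyd iteration: assign each point to its nearest centroid, then move each centroid to the mean of its members. It is not drawn from the reviewed paper; the function name `kmeans`, the fixed iteration count, and the caller-supplied initial centroids are illustrative assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Run a fixed number of Lloyd iterations on tuples of coordinates.

    Hypothetical helper: a plain CPU reference, not a GPU implementation.
    """
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum(
                    (a - b) ** 2 for a, b in zip(point, centroids[i])
                ),
            )
            clusters[nearest].append(point)

        # Update step: move each centroid to the mean of its members.
        updated = []
        for centroid, members in zip(centroids, clusters):
            if members:
                dims = len(centroid)
                updated.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dims)
                ))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(centroid)
        centroids = updated
    return centroids


data = [(0, 0), (0, 2), (10, 0), (10, 2)]
final_centroids = kmeans(data, [(0, 0), (10, 0)])
```

The assignment step does independent work per point, which is the part GPU implementations typically parallelize; the update step is a reduction over each cluster.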

May, 13

Performance Analysis of Sobel Edge Filter on Heterogeneous System Using OpenCL

Edge detection is a fundamental task in image and video processing applications such as video surveillance and medical imaging. Any of the available filters can be used to detect edges. In this paper, the Sobel edge filter is used to compare performance on CPUs and GPUs, and from this study it is found […]

OpenCL
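To make the Sobel filter in the post above concrete, here is a minimal CPU sketch of the standard operator: two 3x3 kernels estimate the horizontal and vertical gradients, and the edge strength is their Euclidean magnitude. It is not the paper's OpenCL kernel; the function name `sobel_magnitude`, the list-of-lists image layout, and leaving border pixels at zero are assumptions made for illustration.

```python
import math

# Standard Sobel kernels for the x and y gradients.
GX_KERNEL = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY_KERNEL = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude of a 2D grayscale image.

    Hypothetical helper: border pixels are left at 0.0 because the
    3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = 0
            gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += GX_KERNEL[dr][dc] * pixel
                    gy += GY_KERNEL[dr][dc] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# A vertical step edge between columns 1 and 2.
step_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edges = sobel_magnitude(step_image)
```

Each output pixel depends only on its own 3x3 neighborhood, so the two outer loops map directly onto a 2D grid of work-items in a GPU version.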

May, 12

Geometric Algebra Enhanced Precompiler for C++, OpenCL and Mathematica’s OpenCLLink

The focus of this work is a simplified integration of algorithms expressed in Geometric Algebra (GA) into modern high level computer languages, namely C++, OpenCL and CUDA. A high runtime performance in terms of GA is achieved using symbolic simplification and code generation by a precompiler that is directly integrated into CMake-based build toolchains. Finally, […]

CUDA • OpenCL


