high performance computing on graphics processing units: hgpu.org

Posts

Sep, 20

Comprehensive Evaluations of Cone-beam CT dose in Image-guided Radiation Therapy via GPU-based Monte Carlo simulations

Cone-beam CT (CBCT) has been widely used for patient setup in image-guided radiation therapy (IGRT). Radiation dose from CBCT scans has become a clinical concern. The purposes of this study are 1) to commission a GPU-based Monte Carlo (MC) dose calculation package, gCTD, for the Varian On-Board Imaging (OBI) system and test the calculation […]

CUDA

Sep, 20

The Security of Key Derivation Functions in WINRAR

In various versions of WINRAR, file security is protected mainly by user authentication and file encryption. A password-based key derivation function (PBKDF) is the core of the WINRAR security mechanism. In this paper, the security of the PBKDF algorithm and of encrypted files in WINRAR is analyzed using the Game-Playing approach. We show the upper […]

OpenCL
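The excerpt does not say which key derivation function or parameters WINRAR actually uses, so the sketch below assumes standard PBKDF2 (as exposed by Python's `hashlib.pbkdf2_hmac`) purely to illustrate how a PBKDF turns a password and salt into a key. The helper name `derive_key` and its defaults are hypothetical. The iteration count is the parameter that makes every password guess, including guesses made by a GPU cracker, proportionally more expensive.

```python
import hashlib


def derive_key(password: str, salt: bytes, iterations: int,
               length: int = 32, hash_name: str = "sha256") -> bytes:
    """Hypothetical helper: derive a `length`-byte key with PBKDF2-HMAC.

    This is generic PBKDF2, not WINRAR's own scheme (an assumption for
    illustration). The salt makes identical passwords yield different keys,
    and `iterations` scales the cost of each guess linearly.
    """
    return hashlib.pbkdf2_hmac(hash_name, password.encode("utf-8"),
                               salt, iterations, dklen=length)


key = derive_key("correct horse", b"per-file-salt", 100_000)
print(len(key), key.hex()[:16])
```

Doubling `iterations` roughly doubles the work for both the legitimate user and an attacker, which is why the iteration count is central to any security analysis of a PBKDF.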

Sep, 20

Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone

Feature detection and extraction are essential in computer vision applications such as image matching and object recognition. The Scale-Invariant Feature Transform (SIFT) algorithm is one of the most robust approaches to detect and extract distinctive invariant features from images. However, high computational complexity makes it difficult to apply the SIFT algorithm to mobile applications. Recent […]

OpenCL

Sep, 20

Rethinking the Union of Computed Tomography Reconstruction and GPGPU Computing

This work presents the use of the massively multi-threaded environment of graphics processors (GPUs) to reduce the computation time needed to reconstruct large computed tomography (CT) datasets, along with the challenges that arise in system implementation. Intelligent algorithm design for massively multi-threaded graphics processors differs greatly from traditional CPU algorithm design. Although a brute force port […]

CUDA

Sep, 20

Performance analysis of multi-core CPUs and GPU computing on SF-FDTD scheme for third order nonlinear materials and periodic media

The Split-Field Finite-Difference Time-Domain (SF-FDTD) scheme is an optimal formulation for modeling periodic optical media by means of a single unit period. The split-field components and the Periodic Boundary Condition (PBC) at the periodic boundaries make it possible to obtain successful results even at oblique angles of incidence. In this situation, the standard FDTD scheme requires multiple […]

CUDA

Sep, 20

Exponential Integrators on Graphics Processing Units

In this paper we revisit stencil methods on GPUs in the context of exponential integrators. We further discuss boundary conditions, in the same context, and show that simple boundary conditions (for example, homogeneous Dirichlet or homogeneous Neumann boundary conditions) do not affect the performance if implemented directly into the CUDA kernel. In addition, we show […]

CUDA
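Exponential integrators repeatedly apply the discretized spatial operator to a vector, so a matrix-free stencil is the building block. The sketch below is a hypothetical pure-Python stand-in for such a kernel (not the paper's CUDA code): a 1D second-difference stencil in which homogeneous Dirichlet or homogeneous Neumann boundary conditions are handled inside the per-point loop body, the way they could be folded into a kernel instead of being applied in a separate pass. The Neumann variant assumes a zero-gradient ghost value equal to the boundary value.

```python
def laplacian_1d(u, h, bc="dirichlet"):
    """Apply (u[i-1] - 2*u[i] + u[i+1]) / h**2 to every point of u (len >= 2).

    Boundary handling lives inside the per-point body:
      "dirichlet": homogeneous Dirichlet, the missing neighbor is 0
      "neumann":   homogeneous Neumann, the missing neighbor equals u[i]
    """
    n = len(u)
    out = [0.0] * n
    for i in range(n):  # on a GPU, each i would be handled by one thread
        if i > 0:
            left = u[i - 1]
        else:
            left = 0.0 if bc == "dirichlet" else u[i]
        if i < n - 1:
            right = u[i + 1]
        else:
            right = 0.0 if bc == "dirichlet" else u[i]
        out[i] = (left - 2.0 * u[i] + right) / (h * h)
    return out


print(laplacian_1d([1, 1, 1, 1], 1.0))             # [-1.0, 0.0, 0.0, -1.0]
print(laplacian_1d([1, 1, 1, 1], 1.0, "neumann"))  # [0.0, 0.0, 0.0, 0.0]
```

The boundary branches touch only the first and last points, which is consistent with the claim that simple boundary conditions need not cost extra passes over the data.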

Sep, 18

Adjustable GPU Acceleration for Hermitian Eigensystems

This paper explores an early implementation of high-performance routines for the solution of multiple large Hermitian eigenvector and eigenvalue systems on a Graphics Processing Unit (GPU). We report a performance increase of up to two orders of magnitude over the original EISPACK routines with an NVIDIA Tesla C2050 GPU, potentially allowing an order of magnitude […]

CUDA

Sep, 18

Sparse Matrix Algorithms Using GPGPU

The purpose of this thesis was to benchmark and compare different representations of sparse matrices and algorithms for multiplying them with a vector, and to measure the performance differences between running the algorithms on a CPU and on GPU(s). Four different storage formats were tested: full matrix storage, coordinate storage (COO), ELLPACK (ELL), compressed sparse […]

OpenCL
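Two of the storage formats named in the sparse-matrix thesis above, COO and ELLPACK, differ mainly in how work maps to threads. The sketch below is a minimal, hypothetical pure-Python illustration (plain lists stand in for device buffers; it is not the thesis code): COO stores one (row, column, value) triplet per nonzero, while ELL pads every row to the same width so each row can be processed uniformly.

```python
def dense_to_coo(a):
    """Return (rows, cols, vals) for every nonzero entry of a dense matrix."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def coo_spmv(n_rows, rows, cols, vals, x):
    """y = A @ x from COO triplets (one triplet per thread on a GPU, with atomic adds)."""
    y = [0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def dense_to_ell(a):
    """Pad every row to the longest row's nonzero count (ELLPACK layout)."""
    width = max(sum(1 for v in row if v != 0) for row in a)
    ell_cols, ell_vals = [], []
    for row in a:
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        nz += [(0, 0)] * (width - len(nz))  # padding entries contribute 0
        ell_cols.append([j for j, _ in nz])
        ell_vals.append([v for _, v in nz])
    return ell_cols, ell_vals


def ell_spmv(ell_cols, ell_vals, x):
    """y = A @ x from ELL arrays (one row per thread on a GPU)."""
    return [sum(v * x[c] for c, v in zip(cr, vr))
            for cr, vr in zip(ell_cols, ell_vals)]


A = [[4, 0, 0, 1],
     [0, 3, 0, 0],
     [2, 0, 5, 0],
     [0, 0, 0, 6]]
x = [1, 2, 3, 4]
rows, cols, vals = dense_to_coo(A)
print(coo_spmv(len(A), rows, cols, vals, x))  # [8, 6, 17, 24]
print(ell_spmv(*dense_to_ell(A), x))          # [8, 6, 17, 24]
```

ELL wastes space and work on padding when row lengths vary widely, while COO avoids padding but needs atomic or segmented accumulation on a GPU; that trade-off is what such benchmarks measure.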

Sep, 18

Acceleration of recovery simulation on big model using GPU

Software that calculates different field-development scenarios plays an important role in the petroleum industry. Increasing the number of cells in the simulation grid significantly slows down the calculations. Obtaining accurate results requires either spending a lot of time on simulations (days or weeks) or using expensive high-performance systems or supercomputers. […]

CUDA

Sep, 18

A GPU-based Parallel Procedure for Nonlinear Analysis of Complex Structures Using a Coupled FEM/DEM Approach

This study reports the GPU parallelization of complex three-dimensional software for nonlinear analysis of concrete structures. It focuses on coupled thermo-mechanical analysis of complex structures. A coupled FEM/DEM approach (CDEM) is presented from a fundamental theoretical viewpoint. As the modeling of a large structure by means of FEM/DEM may lead to prohibitive computation times, a […]

CUDA

Sep, 18

A distributed computing approach to improve the performance of the Parallel Ocean Program (v2.1)

The Parallel Ocean Program (POP) is used in many strongly eddying ocean circulation simulations. Ideally, one would like to run thousand-year-long simulations, but the current performance of POP prohibits simulations of this kind. In this work, using a new distributed computing approach, two innovations to improve the performance of POP are presented. The first […]

CUDA

Sep, 17

Parallel Motion Estimation Implementation for Different Block Matching Algorithms onto GPGPU

This work presents an efficient method to map Motion Estimation (ME) algorithms onto General-Purpose Graphics Processing Unit (GPGPU) architectures using the CUDA programming model. Our method jointly exploits the massive parallelism available in current GPGPU devices and the parallelization potential of two ME algorithms: Full Search (FS) and Diamond Search (DS). Our main goal is to […]

CUDA
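Full Search, one of the two ME algorithms named above, is the simplest to parallelize because every candidate displacement is scored independently. The sketch below is a hypothetical pure-Python illustration (not the paper's CUDA mapping) that scores candidates with the sum of absolute differences (SAD), a common block-matching cost assumed here for illustration. It exhaustively tests every displacement within a search radius that keeps the candidate block inside the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(bs)
        for i in range(bs)
    )


def full_search(cur, ref, bx, by, bs, radius):
    """Exhaustive block matching over [-radius, radius]^2.

    Returns (best_sad, dx, dy). Each candidate is independent, so on a GPU
    every displacement (or every block) could be assigned to a thread.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + bs > width or y0 + bs > height:
                continue  # candidate block would fall outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic frames: the block at (2, 2) in `cur` matches `ref` at offset (2, 1).
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
       for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, bs=4, radius=2))  # (0, 2, 1)
```

Diamond Search trades this exhaustive, fully parallel scan for a sequence of small, data-dependent steps, which is why the two algorithms expose different amounts of parallelism.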


