high performance computing on graphics processing units: hgpu.org

Posts

Feb, 22

Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model

The Finite-Difference Time-Domain (FDTD) method is commonly used for electromagnetic field simulations. Recently, successful hardware-accelerations using Graphics Processing Unit (GPU) have been reported for the large-scale FDTD simulations. In this paper, we present a performance analysis of the three-dimensional (3D) FDTD on GPU using the roofline model. We find that theoretical predictions on maximum performance […]

CUDA

Feb, 22

Data Structures and Transformations for Physically Based Simulation on a GPU

As general purpose computing on Graphics Processing Units (GPGPU) matures, more complicated scientific applications are being targeted to utilize the data-level parallelism available on a GPU. Implementing physically-based simulation on data-parallel hardware requires preprocessing overhead which affects application performance. We discuss our implementation of physics-based data structures that provide significant performance improvements when used on […]

CUDA

Feb, 22

Parallel power flow solutions using a biconjugate gradient algorithm and a Newton method: A GPU-based approach

A new approach to solve the power flow problem based on graphic processing units is presented in this paper. A Newton method is implemented to solve the set of nonlinear equations of the power flow formulation. A parallel kernel for the biconjugate gradient method allows solving the voltage corrections on a graphic processing card. While […]

Feb, 22

GPU Acceleration of Runge-Kutta Integrators

We consider the use of commodity graphics processing units (GPUs) for the common task of numerically integrating ordinary differential equations (ODEs), achieving speed-ups of up to 115-fold over comparable serial CPU implementations, and 15-fold over multithreaded CPU code with SIMD intrinsics. Using Lorenz ’96 models as a case study, single and double precision benchmarks are […]

CUDA

Feb, 21

Software-Based ECC for GPUs

Commodity off-the-shelf GPUs lack error checking mechanisms for graphics memory, whereas conventional HPC platforms have used hardware-based ECC for DRAMs. To alleviate this reliability concern, we propose a software-based ECC for GPGPU applications. We add small program codes to normal CUDA programs that compute ECCs for data residing in graphics memory so that transient bit-flips […]

CUDA

Feb, 21

Accelerated Root Finding for Computational Finance

A parallel implementation of root finding on an SIMD application accelerator is reported. These are roots of stochastic differential equations in the computational finance domain which require a stochastic simulation to be performed for each evaluation of the pricing function. Experiments show that a speedup of 15X can be achieved over using a stand-alone CPU […]

Feb, 21

An Automated Approach for SIMD Kernel Generation for GPU based Software Acceleration

Graphics Processing Units (GPUs) are highly parallel Single Instruction Multiple Data (SIMD) engines, with extremely high degrees of available hardware parallelism. The task of implementing a software routine on a GPU currently requires significant manual design, iteration and experimentation. This paper presents an automated approach to partition a software application into kernels (which are executed […]

CUDA

Feb, 21

Assembling large mosaics of electron microscope images using GPU

Understanding the neural circuitry of the retina requires us to map the connectivity of individual neurons in large neuronal tissue sections and analyze signal communication across processes from the electron microscopy images. One of the major bottlenecks in the critical path is the image mosaicing process where 2D slices are assembled from scanned microscopy image […]

CUDA

Feb, 21

Accelerating the Stochastic Simulation Algorithm

In order for scientists to learn more about molecular biology, it is imperative that they have the ability to construct accurate models that predict the reactions of species of molecules. Generating these models using deterministic approaches is not feasible as these models may violate some of the assumptions underlying classical differential equations models (e.g., small […]

CUDA

Feb, 21

Acceleration of Binomial Options Pricing via Parallelizing along time-axis on a GPU

Since the introduction of organized trading of options for commodities and equities, computing fair prices for options has been an important problem in financial engineering. A variety of numerical methods, including Monte Carlo methods, binomial trees, and numerical solution of stochastic differential equations, are used to compute fair prices. Traders and brokerage firms constantly strive […]

CUDA

Feb, 21

A massively parallel framework using P systems and GPUs

Since CUDA programing model appeared on the general purpose computations, the developers can extract all the power contained in GPUs (Graphics Processing Unit) across many computational domains. Among these domains, P systems or membrane systems provide a high level computational modeling framework that allows, in theory, to obtain polynomial time solutions to NP-complete problems by […]

CUDA

Feb, 21

GPU Acceleration of Equations Assembly in Finite Elements Method – Preliminary Results

The finite element method (FEM) is widely used for numerical solution of partial differential equations. Two computationally expensive tasks have to be performed in FEM – equations assembly and solution of the system of equations. We present mapping of the equations assembly problem for StVenant-Kirchhoff material to GPU computation model and show results of its […]

CUDA

high performance computing on graphics processing units: hgpu.org

Posts

Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model

Data Structures and Transformations for Physically Based Simulation on a GPU

Parallel power flow solutions using a biconjugate gradient algorithm and a Newton method: A GPU-based approach

GPU Acceleration of Runge-Kutta Integrators

Software-Based ECC for GPUs

Accelerated Root Finding for Computational Finance

An Automated Approach for SIMD Kernel Generation for GPU based Software Acceleration

Assembling large mosaics of electron microscope images using GPU

Accelerating the Stochastic Simulation Algorithm

Acceleration of Binomial Options Pricing via Parallelizing along time-axis on a GPU

A massively parallel framework using P systems and GPUs

GPU Acceleration of Equations Assembly in Finite Elements Method – Preliminary Results

Recent source codes

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

LC Framework

pplx-garden: Perplexity open source garden for inference technology

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

OpScanner

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Most viewed papers (last 30 days)