
Posts

May, 25

Effective Sparse Matrix Representation for the GPU Architectures

General purpose computation on graphics processing units (GPUs) is prominent in today's high performance computing era. Porting or accelerating data parallel applications onto the GPU yields a baseline performance improvement because of the increased number of computational units. Better performance can be achieved if application-specific fine tuning is done with respect to the […]
May, 25

Accelerating In-Memory Graph Database traversal using GPGPUS

The paper aims to provide a comparative analysis of the performance of in-memory databases as opposed to a customised graph database written from the ground up, whose joins (searches) are performed on a GPGPU. This is done primarily to serve as a proof of concept of how databases that are represented as graphs can benefit by fostering […]
May, 25

Parallel simulation of mixed-abstraction SystemC models on GPUs and multicore CPUs

This work presents a methodology that parallelizes the simulation of mixed-abstraction level SystemC models across multicore CPUs and graphics processing units (GPUs) for improved simulation performance. Given a SystemC model, we partition it into processes suitable for GPU execution and CPU execution. We convert the processes identified for GPU execution into GPU kernels with additional […]
May, 24

Java on CUDA architecture

A traditional CPU is able to run only a few complex threads concurrently. On the other hand, a GPU allows concurrent execution of hundreds or thousands of simpler threads. The GPU was originally designed for computer graphics, but nowadays it is being used for general-purpose calculations using GPGPU technology. CUDA, one of the […]
May, 24

Sparse direct solvers with accelerators over DAG runtimes

The current trend in high performance computing shows a dramatic increase in the number of cores on shared memory compute nodes. Algorithms, especially those related to linear algebra, need to be adapted to these new computer architectures in order to be efficient. PASTIX is a sparse parallel direct solver that incorporates a dynamic […]
May, 24

Tuning a Finite Difference Computation for Parallel Vector Processors

Current CPU and GPU architectures make heavy use of data and instruction parallelism at different levels. Floating point operations are organised in vector instructions of increasing vector length. For performance reasons it is mandatory to use the vector instructions efficiently. Several ways of tuning a model-problem finite difference stencil computation are discussed. The combination of […]
May, 24

Compiler optimizations for directive-based programming for accelerators

Parallel programming is difficult. For regular computation on central processing units, application programming interfaces such as OpenMP, which augment normal sequential programs with preprocessor directives to achieve parallelism, have proven easy for programmers and provide good multithreaded performance. OpenACC is a fork of the OpenMP project that aims to provide a similar […]
May, 24

Fine-Grained Resource Sharing for Concurrent GPGPU Kernels

General purpose GPU (GPGPU) programming frameworks such as OpenCL and CUDA allow running individual computation kernels sequentially on a device. However, in some cases it is possible to utilize device resources more efficiently by running kernels concurrently. This raises questions about load balancing and resource allocation that have not previously warranted investigation. For example, what […]
May, 23

GMProf: A Low-Overhead, Fine-Grained Profiling Approach for GPU Programs

Driven by cost-effectiveness and power-efficiency, GPUs are being increasingly used to accelerate computations in many domains. However, developing highly efficient GPU implementations requires a lot of expertise and effort. Thus, tool support for tuning GPU programs is urgently needed; more specifically, low-overhead mechanisms for collecting fine-grained runtime information are critically required. Unfortunately, […]
May, 23

Molecular Distance Geometry Optimization Using Geometric Build-up and Evolutionary Techniques on GPU

We present a combination of methods addressing the molecular distance geometry problem, implemented on a graphics processing unit. First, we use geometric build-up and depth-first graph traversal. Next, we refine the solution by simulated annealing. For an exact but sparse distance matrix, the build-up method reconstructs the 3D structures with a root-mean-square error (RMSE) in the […]
May, 23

Medical Image Registration using OpenCL

Medical image registration is a computational task involving the spatial realignment of multiple sets of images of the same or different modalities. A novel method of using the Open Computing Language (OpenCL) framework to accelerate affine image registration across multiple processing architectures is presented. The use of this method on graphics processors results in a […]
May, 23

Investigating Warp Size Impact in GPUs

There are a number of design decisions that impact a GPU's performance. Among them, choosing the right warp size can deeply influence the rest of the design. Small warps reduce the performance penalty associated with branch divergence at the expense of reduced memory coalescing. Large warps enhance memory coalescing significantly but also […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: