high performance computing on graphics processing units: hgpu.org

Posts

Nov, 11

Explorations of the Viability of ARM and Xeon Phi for Physics Processing

We report on our investigations into the viability of the ARM processor and the Intel Xeon Phi co-processor for scientific computing. We describe our experience porting software to these processors and running benchmarks using real physics applications to explore the potential of these processors for production physics processing.

Nov, 11

A High Performance Random Number Generator Using Heterogeneous Computing Platform

The power of high performance computing (HPC) depends heavily on the ability to efficiently exploit huge amounts of parallelism. Random numbers and pseudo-random numbers are essential for the efficient implementation of stochastic algorithms. Multi-core CPUs and many-core Graphics Processing Units (GPUs) are well-suited accelerators for producing the vast quantities of random numbers such algorithms need. Nevertheless, the GPU does […]

CUDA
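The abstract does not say which generator the authors use. As a hypothetical illustration of the usual GPU pattern, the sketch below gives every thread its own independent stream: each simulated "thread" derives a nonzero seed from a global seed and its thread id, then runs Marsaglia's xorshift32 to fill its own slice of the output buffer. The seed-mixing constants and all function names are assumptions for illustration, not the paper's design.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state):
    """One step of Marsaglia's xorshift32; returns (new_state, output)."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state, state


def seed_for_thread(base_seed, thread_id):
    """Hypothetical per-thread seeding: mix the global seed with the thread id."""
    s = ((base_seed ^ (thread_id * 0x9E3779B1)) * 0x85EBCA6B) & MASK32
    return s or 1  # xorshift state must never be zero


def generate(base_seed, num_threads, per_thread):
    """Emulate a GPU launch: each 'thread' fills only its own slice of the buffer."""
    out = [0] * (num_threads * per_thread)
    for tid in range(num_threads):          # on a GPU, one thread per tid
        state = seed_for_thread(base_seed, tid)
        for i in range(per_thread):
            state, value = xorshift32(state)
            out[tid * per_thread + i] = value
    return out


print(xorshift32(1))  # (270369, 270369)
data = generate(42, num_threads=4, per_thread=8)
```

Because each thread touches only its own state and output slice, the outer loop maps onto one GPU thread per `tid` with no synchronization. Production codes usually prefer generators with stronger statistical guarantees, such as cuRAND's XORWOW or Philox.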

Nov, 11

First Steps Towards More Numerical Reproducibility

Questions about whether numerical simulations are reproducible have been raised in several sensitive applications. Failures of numerical reproducibility come mainly from the finite precision of computer arithmetic: the results of floating-point computations depend both on the arithmetic precision and on the order in which operations are performed. Massively parallel HPC, which combines, for instance, many-core CPUs and GPUs, […]

OpenCL
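The dependence on operation order can be seen in a few lines. This is a generic illustration, not the paper's method: the same four numbers are summed left to right (as one CPU thread would), as a pairwise tree (the usual shape of a GPU reduction), and with Python's `math.fsum`, which returns the correctly rounded sum and is therefore independent of order. The function names are hypothetical.

```python
import math


def sequential_sum(values):
    """Left-to-right accumulation, as a single CPU thread would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_tree_sum(values):
    """Pairwise (tree) reduction, the order a typical GPU block reduction uses."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:            # an odd element carries over to the next level
            paired.append(level[-1])
        level = paired
    return level[0]


def reproducible_sum(values):
    """Correctly rounded sum: the result does not depend on summation order."""
    return math.fsum(values)


values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))     # 1.0
print(pairwise_tree_sum(values))  # 0.0
print(reproducible_sum(values))   # 2.0
```

Near 1e16 the spacing between adjacent doubles is 2, so adding 1.0 to 1e16 rounds back to 1e16; which of the 1.0 terms survive depends entirely on when they meet the large values.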

Nov, 10

Optimization of real-time ultrasound PCIe data streaming and OpenCL processing for SAFT imaging

Our goal is to develop a complete ultrasound platform based on real-time SAFT (Synthetic Aperture Focusing Technique) GPU processing. We are planning to integrate all the ultrasound modules and processing resources (GPU) in a single rack enclosure with the PCIe switch fabric backplane. The first developed module (RX64) provides acquisition and streaming of 64 ultrasound […]

OpenCL

Nov, 10

Toward Better Computation Models for Modern Machines

Modern computers are not random access machines (RAMs). They have a memory hierarchy, multiple cores, and virtual memory. We address the computational cost of address translation in virtual memory and the difficulties in designing parallel algorithms for modern many-core machines. The starting point for our work on virtual memory is the observation that […]

CUDA
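To make the cost of address translation concrete, the sketch below assumes an x86-64-style layout (4 KiB pages, four page-table levels of 512 entries each) and a small fully associative LRU TLB, and charges every TLB miss one memory access per level for the page walk. The layout, TLB size, and cost model are illustrative assumptions, not the model proposed in the paper.

```python
from collections import OrderedDict

PAGE_BITS = 12    # 4 KiB pages (assumed x86-64-style layout)
LEVEL_BITS = 9    # 512 entries per page-table level
LEVELS = 4        # four-level page table


def split_virtual_address(va):
    """Return the per-level page-table indices (root first) and the page offset."""
    offset = va & ((1 << PAGE_BITS) - 1)
    vpn = va >> PAGE_BITS
    indices = [
        (vpn >> (LEVEL_BITS * (LEVELS - 1 - level))) & ((1 << LEVEL_BITS) - 1)
        for level in range(LEVELS)
    ]
    return indices, offset


def page_walk_accesses(addresses, tlb_entries):
    """Count extra memory accesses spent on page walks with an LRU TLB."""
    tlb = OrderedDict()               # virtual page number -> cached translation
    accesses = 0
    for va in addresses:
        vpn = va >> PAGE_BITS
        if vpn in tlb:
            tlb.move_to_end(vpn)      # TLB hit: refresh LRU position, no walk
        else:
            accesses += LEVELS        # TLB miss: one access per page-table level
            tlb[vpn] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)   # evict the least recently used entry
    return accesses


sequential = range(0, 64 * 1024, 8)                 # streams through 16 pages
cyclic = [page * 4096 for page in range(32)] * 2    # 32 pages, TLB holds 16
print(page_walk_accesses(sequential, 16))  # 64
print(page_walk_accesses(cyclic, 16))      # 256
```

The sequential scan pays one walk per page, while the cyclic pattern over more pages than the TLB holds misses on every access, so the same number of loads can cost very different amounts of translation work.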

Nov, 10

Performance Analysis of Sobel Edge Detection Filter on GPU using CUDA & OpenGL

CUDA (Compute Unified Device Architecture) is a novel technology for general-purpose computing on the GPU (Graphics Processing Unit) that makes it easy for users to develop general-purpose GPU programs. GPUs are emerging as the platform of choice for parallel high-performance computing. GPUs are good at data-intensive parallel processing, especially with the availability of software development platforms such as CUDA (developed […]

CUDA • OpenGL
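The abstract does not show the authors' kernel. As a hedged sketch of what a Sobel edge detector computes, the Python below evaluates the two standard 3x3 Sobel kernels at each interior pixel and combines them into a gradient magnitude. Each output pixel depends only on its 3x3 neighbourhood, which is why the computation maps naturally onto one CUDA thread per pixel; the zero border handling and the function names are assumptions for illustration.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_pixel(img, r, c):
    """Gradient magnitude at one interior pixel; on a GPU, one thread per (r, c)."""
    gx = gy = 0
    for dr in range(3):
        for dc in range(3):
            v = img[r + dr - 1][c + dc - 1]
            gx += GX[dr][dc] * v
            gy += GY[dr][dc] * v
    return math.hypot(gx, gy)


def sobel(img):
    """Apply the filter to every interior pixel; border pixels are left at 0.0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):        # these two loops become the CUDA grid
        for c in range(1, cols - 1):
            out[r][c] = sobel_pixel(img, r, c)
    return out


edge = [[0, 0, 255, 255] for _ in range(4)]   # vertical step edge
print(sobel(edge)[1])  # [0.0, 1020.0, 1020.0, 0.0]
```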

Nov, 10

Implementation of Spectral Angle Mapper (SAM) Algorithm on a Graphic processing unit (GPU)

The need for hyperspectral images in the exploration of oil and other minerals is massive. The high computational power now available can be tapped to track these subsurface minerals faster. In this paper, we implement the Spectral Angle Mapper (SAM) algorithm on a GPU using the Compute Unified Device Architecture (CUDA) framework. The SAM algorithm […]

CUDA
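SAM, as it is usually defined, compares a pixel spectrum t with a reference spectrum r through the angle theta = arccos(t·r / (|t| |r|)) and assigns the pixel to the reference with the smallest angle. The sketch below is a plain-Python illustration of that definition, not the paper's CUDA code; the optional `max_angle` rejection threshold and the function names are assumptions.

```python
import math


def spectral_angle(pixel, reference):
    """Angle (radians) between a pixel spectrum and a reference spectrum."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norms = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    cos_theta = max(-1.0, min(1.0, dot / norms))  # clamp rounding error before acos
    return math.acos(cos_theta)


def sam_classify(pixels, references, max_angle=None):
    """Label each pixel with the index of the closest reference (-1 if none is close enough)."""
    labels = []
    for pixel in pixels:                   # independent per pixel: one GPU thread each
        angles = [spectral_angle(pixel, ref) for ref in references]
        best = min(range(len(references)), key=angles.__getitem__)
        if max_angle is not None and angles[best] > max_angle:
            best = -1
        labels.append(best)
    return labels


print(sam_classify([[1, 0.1], [0.1, 1]], [[1, 0], [0, 1]]))  # [0, 1]
```

Because the angle depends only on the direction of a spectrum, SAM is insensitive to a uniform scaling of brightness, and every pixel can be classified independently.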

Nov, 10

Towards a Portable and Future-proof Particle-in-Cell Plasma Physics Code

We present the first reported OpenCL implementation of EPOCH3D, an extensible particle-in-cell plasma physics code developed at the University of Warwick. We document the challenges and successes of this porting effort, and compare the performance of our implementation executing on a wide variety of hardware from multiple vendors. The focus of our work is on […]

OpenCL

Nov, 8

Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System

There are many types of heterogeneous computing systems, and the most useful one is the CPU and GPU computing system. On this system, we run QR decomposition, which expresses a real matrix as the product of two matrices: an orthogonal matrix Q and an upper triangular matrix R. For a tiled QR decomposition algorithm, which is a parallelized version of QR […]

CUDA
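For readers unfamiliar with the factorization itself, the sketch below computes A = QR with modified Gram-Schmidt, where Q has orthonormal columns and R is upper triangular. It is the plain, untiled algorithm and is not the paper's implementation: tiled variants split A into blocks so that independent tile updates can run concurrently on the CPU and GPU, and they commonly use Householder reflections rather than Gram-Schmidt. Function names are illustrative.

```python
import math


def qr_mgs(a):
    """QR of an m x n matrix (m >= n, full column rank) by modified Gram-Schmidt.

    `a` is a list of rows. Returns (Q, R): Q is m x n with orthonormal columns,
    R is n x n upper triangular, and Q @ R reproduces `a`.
    """
    m, n = len(a), len(a[0])
    q_cols = []                               # orthonormal columns built so far
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [float(a[i][j]) for i in range(m)]
        for k in range(j):
            # MGS: project against q_k using the partially orthogonalized v
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(m)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]   # columns -> rows
    return q, r


def matmul(x, y):
    """Plain triple-loop matrix product of lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


A = [[12, -51, 4], [6, 167, -68], [-4, 24, -41]]
Q, R = qr_mgs(A)
print([round(R[i][i], 6) for i in range(3)])  # [14.0, 175.0, 35.0]
```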

Nov, 8

GPU-Based Space-Time Adaptive Processing (STAP) for Radar

Space-time adaptive processing (STAP) uses a two-dimensional adaptive filter to detect targets in a radar data set whose speeds are similar to that of the background clutter. While adaptively optimal solutions exist, they are prohibitively computationally intensive. Researchers have therefore developed alternative algorithms with nearly optimal filtering performance and greatly reduced computational intensity. While such alternatives reduce the […]

CUDA
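For context, the fully adaptive ("optimal") STAP filter that the abstract calls prohibitively expensive is usually written as below. This is the textbook formulation, given as background rather than taken from the paper: with $M$ array channels and $N$ pulses, each range cell yields a space-time snapshot $\mathbf{x}_k$ of length $MN$; $\hat{\mathbf{R}}$ is the interference covariance estimated from $K$ training snapshots, $\mathbf{a}(\theta)$ and $\mathbf{b}(f_d)$ are the spatial and temporal (Doppler) steering vectors, and $\otimes$ is the Kronecker product. The weights are normalized for unit gain in the look direction.

```latex
\mathbf{x}_k \in \mathbb{C}^{MN}, \qquad
\hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{x}_k \mathbf{x}_k^{H}, \qquad
\mathbf{v}(f_d, \theta) = \mathbf{b}(f_d) \otimes \mathbf{a}(\theta)

\mathbf{w} = \frac{\hat{\mathbf{R}}^{-1}\mathbf{v}}{\mathbf{v}^{H}\hat{\mathbf{R}}^{-1}\mathbf{v}},
\qquad y = \mathbf{w}^{H}\mathbf{x}
```

Solving with the $MN \times MN$ covariance costs $O((MN)^3)$ operations, and a good estimate needs roughly $K \approx 2MN$ training snapshots (the Reed-Mallett-Brennan rule), which is why reduced-complexity alternatives are attractive.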

Nov, 8

Accelerating a Novel Particle-based Fluid Simulation on the GPU

Stochastic Rotation Dynamics (SRD) is a novel particle-based simulation method that can be used to model complex fluids [1], [2], such as binary and ternary mixtures [3], and polymer solutions [4]-[6], in either two or three dimensions. Although SRD is efficient compared to traditional methods, it is still computationally expensive for large system sizes, e.g. […]

CUDA
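The core of SRD, as it is usually described in the literature, is a collision step: particles are sorted into cells of a grid, and each particle's velocity relative to its cell's mean velocity is rotated by a fixed angle alpha, with the rotation sense chosen at random per cell. The hedged 2D sketch below shows only that step; it omits the streaming step and the random grid shift normally used to restore Galilean invariance, and the function names, cell size, and alpha = 130 degrees are illustrative choices.

```python
import math
import random
from collections import defaultdict


def srd_collide(positions, velocities, cell_size, alpha, rng):
    """One 2D SRD collision step; returns the post-collision velocities."""
    cells = defaultdict(list)                 # (ix, iy) -> particle indices
    for idx, (x, y) in enumerate(positions):
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(idx)

    new_v = list(velocities)
    for members in cells.values():
        n = len(members)
        ux = sum(velocities[i][0] for i in members) / n   # cell mean velocity
        uy = sum(velocities[i][1] for i in members) / n
        angle = alpha if rng.random() < 0.5 else -alpha    # random rotation sense
        c, s = math.cos(angle), math.sin(angle)
        for i in members:
            dx, dy = velocities[i][0] - ux, velocities[i][1] - uy
            new_v[i] = (ux + c * dx - s * dy, uy + s * dx + c * dy)
    return new_v


def momentum(vs):
    return (sum(v[0] for v in vs), sum(v[1] for v in vs))


def kinetic_energy(vs):
    return 0.5 * sum(v[0] ** 2 + v[1] ** 2 for v in vs)


rng = random.Random(1)
pos = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(500)]
vel = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
new_vel = srd_collide(pos, vel, cell_size=1.0, alpha=math.radians(130), rng=rng)
print(momentum(vel), momentum(new_vel))              # equal up to rounding
print(kinetic_energy(vel), kinetic_energy(new_vel))  # equal up to rounding
```

Rotating only the deviations from the cell mean conserves momentum and kinetic energy in every cell, and each cell is processed independently, which is what makes the step a good fit for one GPU thread (or block) per cell.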

Nov, 8

Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

Modern Graphics Processing Units (GPUs) are now considered accelerators for general-purpose computation. Tight interaction between the GPU and the interconnection network is the strategy for exploiting the full capability-computing potential of a multi-GPU system on large HPC clusters; that is why an efficient and scalable interconnect is a key […]
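The abstract names a 3D torus topology without describing it. As background (not APEnet+'s actual routing logic, which the abstract does not give), the sketch below shows the two properties that make the torus attractive: with at least three nodes per axis, every node has exactly six direct links, including wraparound links at the edges, and the shortest path between two nodes is the sum over the three axes of the shorter way around each ring. The names and the 4x4x4 size are illustrative.

```python
from itertools import product


def torus_neighbors(node, dims):
    """The six nearest neighbours of `node` in a 3D torus, with wraparound links."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a, b, dims):
    """Shortest hop count: per axis, take the wraparound link when it is shorter."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


dims = (4, 4, 4)
print(torus_neighbors((0, 0, 0), dims))
print(max(torus_hops((0, 0, 0), n, dims) for n in product(range(4), repeat=3)))  # 6
```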

