high performance computing on graphics processing units: hgpu.org

Posts

Jul, 29

Implicit Methods for Real-Time simulation of Interactive Waves

The project focuses on developing a simulator in which ships and waves interact. The new wave model is the Variational Boussinesq model (VBM). However, this new, realistic model demands much more computational effort. The VBM mainly requires an unsteady-state solver that solves a coupled system of equations at each frame (20 fps). […]

CUDA

Jul, 29

Parallel Worldline Numerics: Implementation and Error Analysis

We give an overview of the worldline numerics technique, and discuss the parallel CUDA implementation of a worldline numerics algorithm. In the worldline numerics technique, we wish to generate an ensemble of representative closed-loop particle trajectories, and use these to compute an approximate average value for Wilson loops. We show how this can be done […]

CUDA

Jul, 29

Mixed-precision orthogonalization scheme and its case studies with CA-GMRES on a GPU

We propose a mixed-precision orthogonalization scheme that takes the input matrix in a standard 32- or 64-bit floating-point precision, but uses higher-precision arithmetic to accumulate its intermediate results. For the 64-bit precision, our scheme uses software emulation for the higher-precision arithmetic, and requires about 20x more computation but about the same amount of communication as […]

CUDA

Jul, 29

Aristotle: A Performance Impact Indicator for the OpenCL Kernels Using Local Memory

Due to the increasing complexity of multi/many-core architectures (with their mix of caches and scratch-pad memories) and applications (with different memory access patterns), the performance of many workloads becomes increasingly variable. In this work, we address one of the main causes for this performance variability: the efficiency of the memory system. Specifically, based on an […]

OpenCL

Jul, 29

Course on Antenna Synthesis (with elements of GPU computing)

I’m pleased to announce the Course on Antenna Synthesis (with elements of GPU computing), organized in the framework of the European School of Antennas. The Course will take place at the Partenope Conference Center of the Università di Napoli Federico II, Napoli, Italy, on October 13-17, 2014. The Course covers three topics corresponding to the […]

Jul, 29

File I/O on Intel Xeon Phi Coprocessors: RAM disks, VirtIO, NFS and Lustre

The key innovation brought about by Intel Xeon Phi coprocessors is the possibility to port most HPC applications to manycore computing accelerators without code modification. One of the reasons why this is possible is support for file input/output (I/O) directly from applications running on coprocessors. These facilities allow seamless usage of manycore accelerators in common […]

Jul, 28

GPU Computing to Improve Game Engine Performance

Although the graphics processing unit (GPU) was originally designed to accelerate the image creation for output to display, today’s general purpose GPU (GPGPU) computing offers unprecedented performance by offloading computing-intensive portions of the application to the GPGPU, while running the remainder of the code on the central processing unit (CPU). The highly parallel structure of […]

CUDA

Jul, 28

Computational investigation of intense short-wavelength laser interaction with rare gas clusters

Clusters of atoms have remarkable optical properties that have been exploited since antiquity. It was only during the late 20th century, though, that their production became better controlled, opening the door to a better understanding of matter. Lasers are the tool of choice to study these nanoscopic objects, so scientists have been blowing clusters […]

OpenCL

Jul, 28

Ship Detection from SAR Imagery Using CUDA and Performance Analysis of the System

A synthetic aperture radar (SAR) ship detection system (SDS) is an important application for maritime security monitoring. It allows monitoring of traffic, fisheries, and naval warfare. Since full-resolution SAR images are heavily affected by the presence of speckle, ship detection algorithms generally employ speckle-reduced SAR images at the expense of a degradation […]

CUDA

Jul, 28

Bayesian model comparison via sequential Monte Carlo

Sequential Monte Carlo (SMC) methods have been widely used in modern scientific computation. Bayesian model comparison has been successfully applied in many fields. Yet there has been little research on the use of SMC for the purpose of Bayesian model comparison. This thesis studies different SMC strategies for Bayesian model computation. In addition, various […]

OpenCL

Jul, 28

OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions

High-performance computing is increasingly based on heterogeneous architectures, and GPGPUs have become one of their main integrated blocks, such as the recently emerged Mali GPU in embedded systems or the NVIDIA GPUs in HPC servers. In both GPGPUs, programming could become a hurdle that can limit their adoption, since the programmer has […]

CUDA

Jul, 28

Understanding the ISA impact on GPU Architecture

The widespread acceptance of GPUs for parallel computation has created demand for general-purpose capabilities in GPUs. In response, industry is rapidly developing better architectures to support general-purpose processing on GPUs. NVIDIA has come up with the Tesla, Fermi and Kepler architectures. General Purpose Graphics Processing Units (GPGPU) are widely being […]

CUDA
