high performance computing on graphics processing units: hgpu.org

Posts

Feb, 1

Computational Fluid Dynamic on GPU

Computational Fluid Dynamics, an important branch of the HPC field, has a history of seeking and requiring higher computational performance. The traditional way to satisfy this quest is to use faster machines or supercomputers. Yet these approaches seem inconvenient and costly to many individual researchers. We investigated the use of GPUs to accelerate CFD codes and […]

CUDA • OpenCL

Feb, 1

GPU as a Parallel Machine: Sorting on the GPU

Sorting is a fundamental algorithmic building block. One of the most studied problems in computer science is ordering a list of items efficiently. Buck and Purcell showed how the parallel bitonic merge sort algorithm could exploit many of the parallel features of the SIMD architecture of the GPU. Efficient sorting has practical importance to optimizing […]

Feb, 1

Introduction to GPU programming for EDA

Advances in GPU technology have propelled GPUs into arenas far afield from the traditional, isolated roles they have previously played. With hundreds of processing units in a single GPU, substantial speedups can be achieved by harnessing their power to augment the performance of the traditional single- or multi-core CPU on certain compute-intensive applications. However, […]

CUDA • OpenCL

Feb, 1

Uncontracted Rys Quadrature Implementation of up to G Functions on Graphical Processing Units

An implementation is presented of an uncontracted Rys quadrature algorithm for electron repulsion integrals, including up to g functions on graphical processing units (GPUs). The general GPU programming model, the challenges associated with implementing the Rys quadrature on these highly parallel emerging architectures, and a new approach to implementing the quadrature are outlined. The performance […]

CUDA

Jan, 31

Towards Automated Learning of Object Detectors

Recognizing arbitrary objects in images or video sequences is a difficult task for a computer vision system. We work towards automated learning of object detectors from video sequences (without user interaction). Our system uses object motion as an important cue to detect independently moving objects in the input sequence. The largest object is always taken […]

OpenGL

Jan, 31

Batched Multi Triangulation

The multi triangulation framework (MT) is a very general approach for managing adaptive resolution in triangle meshes. The key idea is arranging mesh fragments at different resolutions in a directed acyclic graph (DAG) which encodes the dependencies between fragments, thereby encompassing a wide class of multiresolution approaches that use hierarchies or DAGs with predefined topology. […]

Jan, 31

Advanced Multi-Frame Rate Rendering Techniques

Multi-frame rate rendering is a parallel rendering technique that renders interactive parts of the scene on one graphics card while the rest of the scene is rendered asynchronously on a second graphics card. The resulting color and depth images of both render processes are composited and displayed. This paper presents advanced multi-frame rate rendering techniques, […]

OpenGL

Jan, 31

Isosurface Extraction and View-Dependent Filtering from Time-Varying Fields Using Persistent Time-Octree (PTOT)

We develop a new algorithm for isosurface extraction and view-dependent filtering from large time-varying fields, by using a novel persistent time-octree (PTOT) indexing structure. Previously, the persistent octree (POT) was proposed to perform isosurface extraction and view-dependent filtering, which combines the advantages of the interval tree (for optimal searches of active cells) and of the […]

CUDA

Jan, 31

Scientific Computing on Heterogeneous Architectures

The CPU has traditionally been the computational workhorse in scientific computing, but we have seen a tremendous increase in the use of accelerators, such as Graphics Processing Units (GPUs), in the last decade. These architectures are used because they consume less power and offer higher performance than equivalent CPU solutions. They are typically also […]

CUDA

Jan, 31

OpenRCL: Low-Power High-Performance Computing with Reconfigurable Devices

This work presents the Open Reconfigurable Computing Language (OpenRCL) system designed to enable low-power high-performance reconfigurable computing with imperative programming languages such as C/C++. The key idea is to expose the FPGA platform as a compiler target for applications expressed in the OpenCL paradigm. To this end, we present a combination of low-level virtual machine […]

OpenCL

Jan, 31

Simulation and visualization of the Saint-Venant system using GPUs

We consider three high-resolution schemes for computing shallow-water waves as described by the Saint-Venant system and discuss how to develop highly efficient implementations using graphical processing units (GPUs). The schemes are well-balanced for lake-at-rest problems, handle dry states, and support linear friction models. The first two schemes handle dry states by switching variables in the […]

CUDA

Jan, 31

Highly interactive computational steering for coupled 3D flow problems utilizing multiple GPUs

Most computational fluid dynamics (CFD) simulations require massive computational power, which is usually provided by traditional High Performance Computing (HPC) environments. Although interactivity of the simulation process is highly appreciated by scientists and engineers, due to limitations of typical HPC environments, present CFD simulations are usually executed non-interactively. A recent trend is to harness […]

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)