high performance computing on graphics processing units: hgpu.org

Posts

Nov, 14

Extinction-Based Shading and Illumination in GPU Volume Ray-Casting

Direct volume rendering has become a popular method for visualizing volumetric datasets. Even though computers are continually getting faster, it remains a challenge to incorporate sophisticated illumination models into direct volume rendering while maintaining interactive frame rates. In this paper, we present a novel approach for advanced illumination in direct volume rendering based on GPU […]

CUDA

Nov, 13

Comprehensive Performance Monitoring for GPU Cluster Systems

Accelerating applications with GPUs has recently garnered a lot of interest from the scientific computing community. While tools for optimizing individual kernels are readily available, there is a lack of support for the specific needs of the HPC area. Most importantly, integration with existing parallel programming models (MPI and threading) and scalability to the full […]

CUDA

Nov, 13

Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays

A comparative analysis of high-performance implementations of two state-of-the-art index structures, both of particular interest in bioinformatics for accelerating the alignment of DNA sequences, is presented. The two indexes are based on suffix trees and suffix arrays and were implemented on two different platforms: a quad-core CPU […]

CUDA

Nov, 13

Fast Level Set Segmentation of Biomedical Images using Graphics Processing Units

Image segmentation is the task of splitting a digital image into one or more regions of interest. It is a fundamental problem in computer vision and many different methods, each with their own advantages and disadvantages, exist for the task. Image segmentation is a particularly difficult task for several reasons. Firstly, the ambiguous nature of […]

CUDA

Nov, 13

Efficient GPU Implementation for Particle in Cell Algorithm

The particle-in-cell (PIC) algorithm is a widely used method in plasma physics for studying the trajectories of charged particles under electromagnetic fields. The PIC algorithm is computationally intensive, and its run time is proportional to the number of charged particles involved in the simulation. The focus of the paper is to parallelize the PIC […]

CUDA

Nov, 13

Molecular Docking on FPGA and GPU Platforms

Molecular docking is an important problem in bioinformatics that aims to predict the binding poses of molecules. AutoDock is a popular open-source docking program that applies a computationally expensive but parallelizable algorithm. This paper introduces an FPGA-based and a GPU-based implementation of AutoDock and shows how the original algorithm can be effectively accelerated on […]

CUDA

Nov, 12

Efficiently Computing Tensor Eigenvalues on a GPU

The tensor eigenproblem has many important applications, generating both mathematical and application-specific interest in the properties of tensor eigenpairs and methods for computing them. A tensor is an m-way array, generalizing the concept of a matrix (a 2-way array). Kolda and Mayo have recently introduced a generalization of the matrix power method for computing real-valued […]

CUDA
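
As a rough illustration of the matrix power method that the tensor eigenvalue abstract above refers to, here is a minimal pure-Python sketch. It is not the authors' GPU code, and the abstract is truncated, so the tensor part is only an assumption about the general idea: for a 3-way tensor, the matrix-vector product is replaced by a contraction of the tensor with the vector in all but one mode. The function names `power_method` and `tensor_apply_3way` are hypothetical.

```python
import math


def power_method(matrix, iterations=200):
    """Estimate the dominant eigenpair of a square matrix by power iteration."""
    n = len(matrix)
    x = [1.0 / math.sqrt(n)] * n  # arbitrary unit-length starting vector
    for _ in range(iterations):
        # Multiply by the matrix, then rescale to unit length.
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # The Rayleigh quotient x^T A x gives the eigenvalue estimate (x has unit length).
    ax = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(x[i] * ax[i] for i in range(n))
    return eigenvalue, x


def tensor_apply_3way(tensor, x):
    """Assumed tensor analogue of A*x for a 3-way array: y_i = sum_jk T[i][j][k] x_j x_k."""
    n = len(x)
    return [
        sum(tensor[i][j][k] * x[j] * x[k] for j in range(n) for k in range(n))
        for i in range(n)
    ]


if __name__ == "__main__":
    value, vector = power_method([[2.0, 0.0], [0.0, 1.0]])
    print(value, vector)
```

A tensor power iteration would call something like `tensor_apply_3way` in place of the matrix-vector product inside the loop. Whether and how that iteration converges depends on details the truncated abstract does not give, so the sketch stops at the contraction itself.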

Nov, 12

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters

The increasing computing requirements for GPUs (Graphics Processing Units) have favoured the design and marketing of commodity devices that nowadays can also be used to accelerate general purpose computing. Therefore, future high performance clusters intended for HPC (High Performance Computing) will likely include such devices. However, high-end GPU-based accelerators used in HPC feature a considerable […]

CUDA

Nov, 12

A Run-Time Adaptive FPGA Architecture for Monte Carlo Simulations

Field Programmable Gate Arrays (FPGAs) are now considered to be one of the preferred computing platforms for high performance computing applications, such as Monte Carlo simulations, due to their large computational power and low power consumption. Unlike other state-of-the-art computing platforms, such as General Purpose Processors (GPPs) and General Purpose Graphics Processing Units (GPGPUs), FPGAs […]

Nov, 12

Evaluation of an accelerator architecture for Speckle Reducing Anisotropic Diffusion

Increasing chip power density has brought application specific accelerator architectures to the forefront as an energy and area efficient solution. While GPGPU systems take advantage of specialized hardware to perform computationally intensive tasks faster than chip multiprocessor (CMP) systems, accelerators are hardware units that are designed to execute a specific application efficiently. Real-time ultrasound imaging […]

CUDA

Nov, 12

A multi-GPU acceleration for 3D imaging of the prostate

Transrectal Electric Impedance Tomography (TREIT) has been proposed jointly with ultrasound (US) imaging of the prostate to enhance the standard clinical imaging. Reconstructing TREIT images involves a solution of an inverse problem. The reconstruction is based on two steps: solving and updating an estimate of the dielectric property distribution through solution of an inverse problem. […]

CUDA

Nov, 12

Sustainable GPU Computing at Scale

General purpose GPU (GPGPU) computing has produced the fastest running supercomputers in the world. For continued sustainable progress, GPU computing at scale also needs to address two open issues: a) how to increase applications' mean time between failures (MTBF) as we increase supercomputers' component counts, and b) how to minimize unnecessary energy consumption. Since energy consumption […]

CUDA

* * *
