high performance computing on graphics processing units: hgpu.org

Posts

Nov, 13

Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays

A comparative analysis of high-performance implementations of two state-of-the-art index structures is presented. Both are of particular interest in bioinformatics for accelerating the alignment of DNA sequences. The two indexes are based on suffix trees and suffix arrays and were implemented on two different platforms: a quad-core CPU […]

CUDA

Nov, 13

Fast Level Set Segmentation of Biomedical Images using Graphics Processing Units

Image segmentation is the task of splitting a digital image into one or more regions of interest. It is a fundamental problem in computer vision and many different methods, each with their own advantages and disadvantages, exist for the task. Image segmentation is a particularly difficult task for several reasons. Firstly, the ambiguous nature of […]

CUDA

Nov, 13

Efficient GPU Implementation for Particle in Cell Algorithm

The particle-in-cell (PIC) algorithm is a widely used method in plasma physics for studying the trajectories of charged particles under electromagnetic fields. The PIC algorithm is computationally intensive, and its running time is proportional to the number of charged particles involved in the simulation. The focus of the paper is to parallelize the PIC […]

CUDA

Nov, 13

Molecular Docking on FPGA and GPU Platforms

Molecular docking is an important problem in bioinformatics that aims to predict the binding poses of molecules. AutoDock is a popular open-source docking program that applies a computationally expensive but parallelizable algorithm. This paper introduces an FPGA-based and a GPU-based implementation of AutoDock and shows how the original algorithm can be effectively accelerated on […]

CUDA

Nov, 12

Efficiently Computing Tensor Eigenvalues on a GPU

The tensor eigenproblem has many important applications, generating both mathematical and application-specific interest in the properties of tensor eigenpairs and methods for computing them. A tensor is an m-way array, generalizing the concept of a matrix (a 2-way array). Kolda and Mayo have recently introduced a generalization of the matrix power method for computing real-valued […]

CUDA
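
For readers unfamiliar with the starting point mentioned in the abstract above, here is a minimal sketch of the ordinary matrix power method, the 2-way case that the tensor method is said to generalize. It is not the tensor method of Kolda and Mayo, and it is not taken from the paper; the function name `power_method` and its parameters are illustrative assumptions.

```python
import math


def power_method(matrix, iterations=200, tol=1e-12):
    """Estimate the dominant eigenpair of a square matrix by power iteration.

    Illustrative sketch only: plain matrix power method, not the tensor
    generalization discussed in the paper.
    """
    n = len(matrix)
    # Start from a fixed, normalized vector so runs are reproducible.
    x = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply: y = A x
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        # Normalize so the iterate stays on the unit sphere.
        x_next = [v / norm for v in y]
        # Rayleigh quotient x^T A x gives the current eigenvalue estimate.
        ax = [sum(matrix[i][j] * x_next[j] for j in range(n)) for i in range(n)]
        next_eigenvalue = sum(x_next[i] * ax[i] for i in range(n))
        converged = abs(next_eigenvalue - eigenvalue) < tol
        x, eigenvalue = x_next, next_eigenvalue
        if converged:
            break
    return eigenvalue, x


if __name__ == "__main__":
    value, vector = power_method([[2.0, 0.0], [0.0, 1.0]])
    print(value, vector)
```

Each iteration is a matrix-vector product followed by a normalization, which is the kind of data-parallel work that maps naturally onto a GPU.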

Nov, 12

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters

The increasing computing requirements for GPUs (Graphics Processing Units) have favoured the design and marketing of commodity devices that nowadays can also be used to accelerate general purpose computing. Therefore, future high performance clusters intended for HPC (High Performance Computing) will likely include such devices. However, high-end GPU-based accelerators used in HPC feature a considerable […]

CUDA

Nov, 12

A Run-Time Adaptive FPGA Architecture for Monte Carlo Simulations

Field Programmable Gate Arrays (FPGAs) are now considered to be one of the preferred computing platforms for high performance computing applications, such as Monte Carlo simulations, due to their large computational power and low power consumption. Unlike other state-of-the-art computing platforms, such as General Purpose Processors (GPPs) and General Purpose Graphics Processing Units (GPGPUs), FPGAs […]

Nov, 12

Evaluation of an accelerator architecture for Speckle Reducing Anisotropic Diffusion

Increasing chip power density has brought application specific accelerator architectures to the forefront as an energy and area efficient solution. While GPGPU systems take advantage of specialized hardware to perform computationally intensive tasks faster than chip multiprocessor (CMP) systems, accelerators are hardware units that are designed to execute a specific application efficiently. Real-time ultrasound imaging […]

CUDA

Nov, 12

A multi-GPU acceleration for 3D imaging of the prostate

Transrectal Electric Impedance Tomography (TREIT) has been proposed jointly with ultrasound (US) imaging of the prostate to enhance the standard clinical imaging. Reconstructing TREIT images involves a solution of an inverse problem. The reconstruction is based on two steps: solving and updating an estimate of the dielectric property distribution through solution of an inverse problem. […]

CUDA

Nov, 12

Sustainable GPU Computing at Scale

General purpose GPU (GPGPU) computing has produced the fastest running supercomputers in the world. For continued sustainable progress, GPU computing at scale also needs to address two open issues: a) how to increase applications' mean time between failures (MTBF) as we increase supercomputers' component counts, and b) how to minimize unnecessary energy consumption. Since energy consumption […]

CUDA

Nov, 12

Exploiting Heterogeneity for Energy Efficiency in Chip Multiprocessors

Heterogeneous multicores are envisioned to be a promising design paradigm to combat today’s challenges of power, memory, and reliability walls that are impeding chip design using deep submicron technology. Future multicores are expected to integrate multiple different cores, including GPGPUs, custom accelerators and configurable cores. In this paper, we introduce an important dimension, technology, using which heterogeneity […]

Nov, 12

GSNP: A DNA Single-Nucleotide Polymorphism Detection System with GPU Acceleration

We have developed GSNP, a software package with GPU acceleration, for single-nucleotide polymorphism detection on DNA sequences generated from second-generation sequencing equipment. Compared with SOAPsnp, a popular, high-performance CPU-based SNP detection tool, GSNP has several distinguishing features: First, we design a sparse data representation format to reduce memory access as well as branch divergence. Second, […]

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs