high performance computing on graphics processing units: hgpu.org

Posts

Jul, 11

Development of a Restricted Additive Schwarz Preconditioner for Sparse Linear Systems on NVIDIA GPU

In this paper, we develop, study and implement a restricted additive Schwarz (RAS) preconditioner for speedup of the solution of sparse linear systems on NVIDIA Tesla GPU. A novel algorithm for constructing this preconditioner is proposed. This algorithm involves two phases. In the first phase, the construction of the RAS preconditioner is transformed to an […]

CUDA

Jul, 11

Accelerating Preconditioned Iterative Linear Solvers on GPU

Linear systems are required to solve in many scientific applications and the solution of these systems often dominates the total running time. In this paper, we introduce our work on developing parallel linear solvers and preconditioners for solving large sparse linear systems using NVIDIA GPUs. We develop a new sparse matrix-vector multiplication kernel and a […]

CUDA

Jul, 11

A Hybrid Parallel Implementation of the Aho-Corasick and Wu-Manber Algorithms Using NVIDIA CUDA and MPI Evaluated on a Biological Sequence Database

Multiple matching algorithms are used to locate the occurrences of patterns from a finite pattern set in a large input string. Aho-Corasick and Wu-Manber, two of the most well known algorithms for multiple matching require an increased computing power, particularly in cases where large-size datasets must be processed, as is common in computational biology applications. […]

CUDA

Jul, 11

Parallelization of BFS Graph Algorithm using CUDA

Graphs play a very important role in the field of Science and Technology for finding the shortest distance between any two places. This Paper demonstrate the recent technology named as CUDA (Compute Unified Device Architecture) working for BFS Graph Algorithm. There are some Graph algorithms are fundamental to many disciplines and application areas. Large graphs […]

CUDA

Jul, 11

Algorithms and Data Structures for Interactive Ray Tracing on Commodity Hardware

Rendering methods based on ray tracing provide high image realism, but have been historically regarded as offline only. This has changed in the past decade, due to significant advances in the construction and traversal performance of acceleration structures and the efficient use of data-parallel processing. Today, all major graphics companies offer real-time ray tracing solutions. […]

CUDA

Jul, 11

Hybrid Particle Lattice Boltzmann Shallow Water for interactive fluid simulations

We introduce a hybrid approach for the simulation of fluids based in the Lattice Boltzmann Method for Shallow Waters and particle systems. Our modified LBM Shallow Waters can handle arbitrary underlying terrain and arbitrary fluid depth. It also introduces a novel and simplified method of tracking dry-wet regions. Dynamic rigid bodies are also included in […]

CUDA

Jul, 11

Visualization and Correction of Automated Segmentation, Tracking and Lineaging from 5-D Stem Cell Image Sequences

RESULTS: We present an application that enables the quantitative analysis of multichannel 5-D (x, y, z, t, channel) and large montage confocal fluorescence microscopy images. The image sequences show stem cells together with blood vessels, enabling quantification of the dynamic behaviors of stem cells in relation to their vascular niche, with applications in developmental and […]

CUDA

Jul, 11

Visualization of Large Volumetric Multi-Channel Microscopy Data Streams on Standard PCs

BACKGROUND: Visualization of multi-channel microscopy data plays a vital role in biological research. With the ever-increasing resolution of modern microscopes the data set size of the scanned specimen grows steadily. On commodity hardware this size easily exceeds the available main memory and the even more limited GPU memory. Common volume rendering techniques require the entire […]

OpenCL

Jul, 10

Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra

The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate parallel executions with minimal developer intervention has been introduced in recent years to tackle the programmability issue while maintaining, or even improving, performance. In this paper, […]

CUDA

Jul, 10

COFFEE: an Optimizing Compiler for Finite Element Local Assembly

The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific kernel to numerically evaluate an integral for each element in the discretized problem domain. Since the domain size can be […]

Jul, 10

Random Fields Generation on the GPU with the Spectral Turning Bands Method

Random Field (RF) generation algorithms are of paramount importance for many scientific domains, such as astrophysics, geostatistics, computer graphics and many others. Some examples are the generation of initial conditions for cosmological simulations or hydrodynamical turbulence driving. In the latter a new random field is needed every time-step. Current approaches commonly make use of 3D […]

CUDA

•

OpenCL

Jul, 10

Understanding the SIMD Efficiency of Graph Traversal on GPU

Graph is a widely used data structure and graph algorithms, such as breadth-first search (BFS), are regarded as key components in a great number of applications. Recent studies have attempted to accelerate graph algorithms on highly parallel graphics processing unit (GPU). Although many graph algorithms based on large graphs exhibit abundant parallelism, their performance on […]

CUDA

* * *

high performance computing on graphics processing units: hgpu.org

Posts

Development of a Restricted Additive Schwarz Preconditioner for Sparse Linear Systems on NVIDIA GPU

Accelerating Preconditioned Iterative Linear Solvers on GPU

A Hybrid Parallel Implementation of the Aho-Corasick and Wu-Manber Algorithms Using NVIDIA CUDA and MPI Evaluated on a Biological Sequence Database

Parallelization of BFS Graph Algorithm using CUDA

Algorithms and Data Structures for Interactive Ray Tracing on Commodity Hardware

Hybrid Particle Lattice Boltzmann Shallow Water for interactive fluid simulations

Visualization and Correction of Automated Segmentation, Tracking and Lineaging from 5-D Stem Cell Image Sequences

Visualization of Large Volumetric Multi-Channel Microscopy Data Streams on Standard PCs

Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra

COFFEE: an Optimizing Compiler for Finite Element Local Assembly

Random Fields Generation on the GPU with the Spectral Turning Bands Method

Understanding the SIMD Efficiency of Graph Traversal on GPU

Recent source codes

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Kernel Library for LLM Serving

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Genten: Software for Generalized Tensor Decompositions by Sandia National Laboratories

Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR

Pinocchio: PINpointing Orbit Crossing Collapsed Hierarchical Objects

KernelCoder: trained on a curated dataset of reasoning traces and CUDA kernel pairs

Most viewed papers (last 30 days)