high performance computing on graphics processing units: hgpu.org

Posts

Jul, 12

SPH Fluids for Viscous Jet Buckling

We present a novel meshfree technique for animating free surface viscous liquids with jet buckling effects, such as coiling and folding. Our technique is based on Smoothed Particle Hydrodynamics (SPH) fluids and allows more realistic and complex viscous behaviors than the preceding SPH frameworks in computer animation literature. The viscous liquid is modeled by a […]

CUDA

Jul, 12

Collision Detection: Broad Phase Adaptation from Multi-Core to Multi-GPU Architecture

We have presented several contributions on collision detection optimization centered on hardware performance. We focus on the first step (broad phase) and propose three new ways of parallelizing the well-known Sweep and Prune algorithm. We first developed a multi-core model that takes into account the number of available cores. Multi-core architecture enables us to distribute […]

CUDA
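
The collision detection abstract above names Sweep and Prune as its broad-phase algorithm but the teaser does not show it. As a rough, sequential illustration only (not the multi-core or multi-GPU variants the authors propose), the sketch below runs the idea on a single axis. The `Interval` type and the function name `sweep_and_prune` are hypothetical names chosen for this example.

```python
from typing import List, Set, Tuple

# Hypothetical 1D simplification of an axis-aligned bounding box: (min_x, max_x).
Interval = Tuple[float, float]


def sweep_and_prune(intervals: List[Interval]) -> Set[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose intervals overlap on one axis."""
    # Visit boxes in order of their lower bound along the axis.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    active: List[int] = []          # boxes whose extent may still overlap later ones
    pairs: Set[Tuple[int, int]] = set()
    for i in order:
        lo = intervals[i][0]
        # Prune boxes that end before the current box begins.
        active = [j for j in active if intervals[j][1] >= lo]
        # Every box still active overlaps the current one on this axis.
        for j in active:
            pairs.add((min(i, j), max(i, j)))
        active.append(i)
    return pairs


boxes = [(0.0, 2.0), (1.0, 3.0), (4.0, 5.0), (2.5, 4.5)]
candidates = sweep_and_prune(boxes)
```

In a full broad phase the candidate pairs would typically be confirmed on the remaining axes before being handed to the narrow phase; that step is omitted here.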

Jul, 12

Parallelized Hierarchical Expected Matching Probability for Multiple Sequence Alignment

Sequence alignment of two or more biological sequences such as protein, DNA (deoxyribonucleic acid) or RNA (ribonucleic acid) is called MSA (Multiple Sequence Alignment). Sequence homology can be inferred from the resulting MSA. Existing systems use a dynamic programming technique, which suffers from exponential growth in running time as the sequences grow. A Hierarchical Expected […]

CUDA

Jul, 12

Using the GPU for Fast Symmetry-Based Dense Stereo Matching in High Resolution Images

SymStereo is a new algorithm used for stereo estimation. Instead of measuring photo-similarity, it proposes novel cost functions that measure symmetry for evaluating the likelihood of two pixels being a match. In this work we propose a parallel approach of the LogN matching cost variant of SymStereo capable of processing pairs of images in real-time […]

CUDA

Jul, 11

Combining Data Parallelism and Task Parallelism for Efficient Performance on Hybrid CPU and GPU Systems

In earlier times, computer systems had only a single core or processor. In these computers, the number of transistors on-chip (i.e. on the processor) doubled every two years and all applications enjoyed free speedup. Subsequently, with more and more transistors being packed on-chip, power consumption became an issue, frequency scaling reached its limits and industry […]

CUDA

Jul, 11

Programming-Model Centric Debugging for Multicore Embedded Systems

In this thesis, we propose to study interactive debugging of applications running on embedded Multi-Processor System on Chip (MPSoC) systems. A literature study showed that the design and development of these applications rely more and more on programming models and development frameworks. These environments gather established algorithmic and programming good practices, and hence speed up […]

CUDA, OpenCL

Jul, 11

Development of a Restricted Additive Schwarz Preconditioner for Sparse Linear Systems on NVIDIA GPU

In this paper, we develop, study and implement a restricted additive Schwarz (RAS) preconditioner for speedup of the solution of sparse linear systems on NVIDIA Tesla GPU. A novel algorithm for constructing this preconditioner is proposed. This algorithm involves two phases. In the first phase, the construction of the RAS preconditioner is transformed to an […]

CUDA

Jul, 11

Accelerating Preconditioned Iterative Linear Solvers on GPU

Many scientific applications require solving linear systems, and the solution of these systems often dominates the total running time. In this paper, we introduce our work on developing parallel linear solvers and preconditioners for solving large sparse linear systems using NVIDIA GPUs. We develop a new sparse matrix-vector multiplication kernel and a […]

CUDA
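
The linear solver abstract above mentions a sparse matrix-vector multiplication kernel without describing it. For orientation, here is a minimal sequential sketch of that operation, assuming the common CSR (compressed sparse row) layout; the storage format, the function name `csr_spmv`, and its parameter names are assumptions for illustration, not the authors' kernel.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Non-zeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0])
```

Each row is independent of the others, which is why a simple GPU mapping often assigns one thread (or one group of threads) per row.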

Jul, 11

A Hybrid Parallel Implementation of the Aho-Corasick and Wu-Manber Algorithms Using NVIDIA CUDA and MPI Evaluated on a Biological Sequence Database

Multiple matching algorithms are used to locate the occurrences of patterns from a finite pattern set in a large input string. Aho-Corasick and Wu-Manber, two of the best-known algorithms for multiple matching, require increased computing power, particularly in cases where large datasets must be processed, as is common in computational biology applications. […]

CUDA

Jul, 11

Parallelization of BFS Graph Algorithm using CUDA

Graphs play a very important role in the field of Science and Technology for finding the shortest distance between any two places. This paper demonstrates the use of CUDA (Compute Unified Device Architecture) for the BFS graph algorithm. Some graph algorithms are fundamental to many disciplines and application areas. Large graphs […]

CUDA
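
The BFS abstract above does not say how the traversal is organized. One formulation often used as a starting point for parallel BFS is level-synchronous: all vertices in the current frontier are expanded before the next level starts. The sketch below is a sequential version under that assumption; the name `bfs_levels` and the adjacency-dictionary input are hypothetical choices for this example, not the paper's implementation.

```python
from typing import Dict, List


def bfs_levels(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Return the BFS level (edge distance from source) of every reachable vertex."""
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier: List[int] = []
        # On a GPU, each frontier vertex could be handled by its own thread.
        for u in frontier:
            for v in adj.get(u, []):
                if v not in dist:       # first visit fixes the level
                    dist[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return dist


graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
levels = bfs_levels(graph, 0)
```

In unweighted graphs the level of a vertex equals its shortest-path distance from the source, which ties back to the shortest-distance use case the abstract mentions.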

Jul, 11

Algorithms and Data Structures for Interactive Ray Tracing on Commodity Hardware

Rendering methods based on ray tracing provide high image realism, but have been historically regarded as offline only. This has changed in the past decade, due to significant advances in the construction and traversal performance of acceleration structures and the efficient use of data-parallel processing. Today, all major graphics companies offer real-time ray tracing solutions. […]

CUDA

Jul, 11

Hybrid Particle Lattice Boltzmann Shallow Water for interactive fluid simulations

We introduce a hybrid approach for the simulation of fluids based on the Lattice Boltzmann Method for Shallow Waters and particle systems. Our modified LBM Shallow Waters can handle arbitrary underlying terrain and arbitrary fluid depth. It also introduces a novel and simplified method of tracking dry-wet regions. Dynamic rigid bodies are also included in […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
