high performance computing on graphics processing units: hgpu.org

Posts

Aug, 11

A Memory-Efficient Algorithm for Large-Scale Symmetric Tridiagonal Eigenvalue Problem on Multi-GPU Systems

The divide-and-conquer algorithm is a numerically stable and efficient method for computing the eigenvalues and eigenvectors of a symmetric tridiagonal matrix. We often face the situation where the input matrix fits into the main memory but not into the on-chip memory of a GPU device. We present an out-of-core implementation where only part of the input […]

CUDA
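The divide-and-conquer idea in the abstract above can be illustrated with a minimal sketch of one "divide" level (Cuppen's tearing). The matrix is split into two tridiagonal halves plus a rank-one correction. The halves are solved independently, and the combined eigenvalues are recovered from a secular equation. This is an assumption-laden illustration, not the paper's out-of-core multi-GPU method. It computes eigenvalues only, solves the subproblems with `np.linalg.eigh` instead of recursing, and skips deflation, so it assumes distinct subproblem eigenvalues and nonzero coupling weights. The helper names (`tridiag`, `secular_roots`, `dc_eigvals_one_level`) are hypothetical.

```python
import numpy as np

def tridiag(a, b):
    """Dense symmetric tridiagonal matrix from diagonal a and off-diagonal b."""
    return np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

def secular_roots(d, z, rho):
    """Eigenvalues of diag(d) + rho * z z^T for rho > 0, found by bisection on
    f(lam) = 1 + rho * sum(z_i^2 / (d_i - lam)) = 0.
    Assumes d is ascending with distinct entries and z has no zeros (no deflation)."""
    f = lambda lam: 1.0 + rho * np.sum(z ** 2 / (d - lam))
    # Root i lies in (d[i], d[i+1]); the last one lies in (d[-1], d[-1] + rho * ||z||^2).
    upper = np.append(d[1:], d[-1] + rho * np.dot(z, z))
    roots = np.empty(len(d))
    for i in range(len(d)):
        lo, hi = d[i], upper[i]
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:   # interval cannot be split further in floating point
                break
            if f(mid) < 0:               # f increases from -inf to +inf on each interval
                lo = mid
            else:
                hi = mid
        roots[i] = 0.5 * (lo + hi)
    return roots

def dc_eigvals_one_level(a, b):
    """One divide-and-conquer split of T = tridiag(a, b), eigenvalues only.
    Subproblems use np.linalg.eigh; a full solver would recurse instead."""
    k = len(a) // 2
    beta = b[k - 1]                      # coupling element T[k-1, k]
    # Tear: T = diag(T1, T2) + beta * v v^T with v = e_{k-1} + e_k
    a1, a2 = a[:k].copy(), a[k:].copy()
    a1[-1] -= beta
    a2[0] -= beta
    d1, Q1 = np.linalg.eigh(tridiag(a1, b[:k - 1]))
    d2, Q2 = np.linalg.eigh(tridiag(a2, b[k:]))
    # With Q = diag(Q1, Q2): T = Q (D + beta * z z^T) Q^T, where z = Q^T v
    d = np.concatenate([d1, d2])
    z = np.concatenate([Q1[-1, :], Q2[0, :]])
    order = np.argsort(d)
    d, z = d[order], z[order]
    if beta > 0:
        return secular_roots(d, z, beta)
    # eig(D + beta zz^T) = -eig(-D + |beta| zz^T); reversing -d keeps it ascending
    return np.sort(-secular_roots(-d[::-1], z[::-1], -beta))
```

Because the two halves are independent, they can be solved concurrently (or, in an out-of-core setting, one at a time). Only the last row of `Q1` and the first row of `Q2` are needed to form the coupling vector `z`.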

Aug, 11

UAV Path Planning with Parallel Genetic Algorithms on CUDA Architecture

In recent years, Unmanned Aerial Vehicles (UAVs) have emerged as an attractive technology for many military and civil applications and have gained importance in academic research. In these emerging research areas, UAV autonomy plays a major role; it mainly refers to the ability to perform automatic take-off, landing and path planning. In […]

CUDA

Aug, 11

Parallel Breadth First Search on GPU Clusters

Fast, scalable, low-cost, and low-power execution of parallel graph algorithms is important for a wide variety of commercial and public sector applications. Breadth First Search (BFS) imposes an extreme burden on memory bandwidth and network communications and has been proposed as a benchmark that may be used to evaluate current and future parallel computers. Hardware […]

CUDA
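To see why BFS stresses memory bandwidth, consider the level-synchronous, frontier-based formulation that GPU BFS codes commonly build on. Each step reads the adjacency lists of all frontier vertices through irregular, data-dependent indices and does almost no arithmetic. The sketch below is a sequential illustration on a graph in CSR (compressed sparse row) form. It is not the paper's cluster implementation, and the function name `bfs_levels` is hypothetical.

```python
def bfs_levels(row_ptr, col_idx, source):
    """Level-synchronous BFS on a CSR graph; returns the BFS level of each vertex (-1 if unreachable)."""
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:                 # on a GPU: typically one thread or warp per frontier vertex
            for e in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[e]             # irregular, data-dependent memory access
                if level[v] == -1:         # on a GPU this check-and-set would need an atomic
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

For the undirected graph with edges 0-1, 0-2, 1-3, 2-3 and 3-4, the CSR arrays are `row_ptr = [0, 2, 4, 6, 9, 10]` and `col_idx = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]`. Starting from vertex 0, the levels are `[0, 1, 1, 2, 3]`.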

Aug, 10

Optimising Purely Functional GPU Programs (Thesis)

It is well acknowledged that the dominant mechanism for scaling processor performance is now to increase the number of cores on a chip, rather than to improve the performance of a single core. However, harnessing these extra cores to improve single application performance remains an extremely challenging task. A recent trend has been to use commodity […]

CUDA

Aug, 10

Direct Numerical Simulation of Turbulent Flows with Parallel Algorithms for Various Computing Architectures

The purpose of the work is twofold. Firstly, it is devoted to the development of efficient parallel algorithms for large-scale simulations of turbulent flows on different supercomputer architectures. It reports experience with massively parallel accelerators, including AMD and NVIDIA graphics processing units and Intel Xeon Phi coprocessors. Secondly, it introduces a new series of direct numerical […]

OpenCL

Aug, 10

GPGPU Based Aeroacoustic Optimization of a Contra-Rotating Fan

Contra-rotating fans have several advantages over single-stage axial fans. If they are well designed, the exit flow field is almost irrotational. This helps to increase the aerodynamic efficiency by up to 16% compared to single-stage fans. However, since the second stage interacts with the flow disturbances from the first stage, the associated […]

OpenCL

Aug, 10

An Improved Image Segmentation Algorithm Based on GPU Parallel Computing

In image segmentation, the classic Fuzzy C-Means (FCM) algorithm is time-consuming and depends heavily on the initial cluster centers. Based on the Graphics Processing Unit (GPU), this paper proposes a novel FCM algorithm by improving the computational formulas of the membership degree and the update criterion of the cluster centers. Our algorithm can initialize cluster centers purposefully […]

CUDA
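For reference, the classic FCM iteration that the abstract above sets out to improve alternates two updates. The membership is u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), where d_ij is the distance from point i to center j and m > 1 is the fuzzifier. Each center is then recomputed as the mean of all points weighted by u_ij^m. The minimal NumPy sketch below implements these standard formulas, not the paper's modified ones. The function name `fcm`, the fixed iteration count and the small `eps` guard against zero distances are illustrative assumptions. The membership of each pixel depends only on that pixel and the current centers, which is what makes the step data-parallel and a natural fit for a GPU.

```python
import numpy as np

def fcm(X, centers, m=2.0, iters=100, eps=1e-12):
    """Classic fuzzy c-means. X: (N, F) data, centers: (K, F) initial centers, m > 1 fuzzifier."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centers, dtype=float).copy()
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Distances from every point to every center, shape (N, K); eps avoids division by zero
        dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + eps
        # Membership u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)); each row sums to 1
        U = 1.0 / ((dist[:, :, None] / dist[:, None, :]) ** p).sum(axis=2)
        # Center update: mean of the points weighted by u_ij^m
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]
    return C, U
```

For gray-level segmentation, `X` would hold one intensity per pixel, with shape (N, 1). Each pixel is then labeled with the cluster of highest membership, `U.argmax(axis=1)`.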

Aug, 10

Faster sequence alignment through GPU-accelerated restriction of the seed-and-extend search space

MOTIVATION: In computing pairwise alignments of biological sequences, software implementations employ a variety of heuristics that decrease the computational effort involved in computing potential alignments. A key element in achieving high processing throughput is to identify and prioritize potential alignments where high-scoring mappings can be expected. These tasks involve list-processing operations that can be efficiently […]

CUDA
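One common way to "identify and prioritize potential alignments", as the abstract above puts it, is to index the reference by k-mers, look up each k-mer of the read, and count seed hits per diagonal (reference position minus read position). Diagonals supported by many seeds are extended first. The sketch below illustrates that generic seeding heuristic under that assumption. It is not the paper's GPU method, and the names `kmer_index` and `rank_diagonals` are hypothetical.

```python
from collections import Counter, defaultdict

def kmer_index(ref, k):
    """Map every k-mer of the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def rank_diagonals(query, ref, k):
    """Count exact k-mer seed hits per diagonal (ref_pos - query_pos) and return
    (diagonal, hits) pairs, most-supported diagonals first."""
    index = kmer_index(ref, k)
    hits = Counter()
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            hits[i - j] += 1
    return hits.most_common()
```

If the read `"ACGTACGGATCCA"` occurs exactly at offset 5 of the reference `"TTTTT" + read + "GGGGG"`, then with `k = 3` the top-ranked diagonal is 5, supported by all 11 of the read's 3-mers. A few spurious single-hit diagonals appear further down the list.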

Aug, 9

Real-Time Automatic Object Classification and Tracking using Genetic Programming and NVIDIA CUDA

Genetic Programming (GP) is a widely used methodology for solving various computational problems. GP’s problem-solving ability is usually hindered by its long execution times. In this thesis, GP is applied to real-time computer vision. In particular, object classification and tracking using a parallel GP system is discussed. First, a study of suitable GP languages […]

CUDA

Aug, 9

Vivaldi: A Domain-Specific Language for Volume Processing and Visualization on Distributed Heterogeneous Systems

As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and […]

CUDA

Aug, 9

Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks

In semantic scene segmentation, every pixel of an image is assigned a category label. This task can be made easier by incorporating depth information, which structured light sensors provide. Depth, however, has very different properties from RGB image channels. In this paper, we present a novel method to provide depth information to convolutional neural networks. […]

CUDA

Aug, 9

Parallel Distributed Breadth First Search on the Kepler Architecture

We present the results obtained by using an evolution of our CUDA-based solution for the exploration, via a Breadth First Search, of large graphs. This latest version makes the best use of the features of the Kepler architecture and relies on a 2D decomposition of the adjacency matrix to reduce the number of communications among the […]

CUDA
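A 2D decomposition of the adjacency matrix, as mentioned in the abstract above, can be sketched as follows. The processors form an R x C grid. An edge u -> v is treated as the matrix entry A[v, u], and it belongs to the processor whose row block contains v and whose column block contains u. In each BFS level, a processor only needs the frontier vertices from its column block ("expand" within a processor column). Its discovered vertices only need to be combined within its processor row ("fold"), rather than exchanged among all processors. The sequential simulation below illustrates this data layout. It is not the authors' CUDA/MPI code, and the names `partition_2d` and `bfs_2d` are hypothetical.

```python
from collections import defaultdict

def partition_2d(n, edges, R, C):
    """Assign each directed edge (u, v) to processor (i, j) of an R x C grid.
    The edge is entry A[v, u]: row block taken from v, column block from u."""
    blocks = defaultdict(list)
    for u, v in edges:
        blocks[(v * R // n, u * C // n)].append((u, v))
    return blocks

def bfs_2d(n, edges, source, R, C):
    """Level-synchronous BFS simulated over a 2D edge partition; returns BFS levels."""
    blocks = partition_2d(n, edges, R, C)
    level = [-1] * n
    level[source] = 0
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        row_results = defaultdict(set)
        for (i, j), block in blocks.items():
            # Expand: processor (i, j) only sees frontier vertices of column block j
            local = {u for u in frontier if u * C // n == j}
            found = {v for u, v in block if u in local}
            # Fold: partial results are merged within processor row i
            row_results[i] |= found
        frontier = set()
        for found in row_results.values():
            for v in found:
                if level[v] == -1:
                    level[v] = depth
                    frontier.add(v)
    return level
```

For an 8-vertex ring with an extra chord 0-4, with both directions of each undirected edge listed, any grid shape (for example 2 x 2 or 1 x 4) gives the same levels from vertex 0: `[0, 1, 2, 2, 1, 2, 2, 1]`. Only the amount and pattern of simulated communication changes with the grid shape.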
