Posts
Jun, 18
An Out-of-core GPU Approach for Accelerating Geostatistical Interpolation
Geostatistical methods provide a powerful tool for understanding the complexity of data arising from the Earth sciences. Since the mid-1970s, this numerical approach has been widely used to characterize the spatial variation of natural phenomena in domains such as the oil and gas, mining, and environmental industries. Considering the huge amount of data available, standard implementations of […]
Jun, 18
A Case Against Small Data Types on GPGPUs
In this paper, we study application behavior on GPGPUs. We investigate how data type impacts performance in different applications. As expected, we show that some applications can take significant advantage of small data types. Such applications benefit from small data types as a result of increased effective cache capacity and reduced memory pressure, access latency, and memory […]
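As a concrete illustration of that effect, below is a minimal CUDA sketch (not code from the paper; kernel and variable names are hypothetical) of the same streaming kernel instantiated for 32-bit and 16-bit element types. The 16-bit version moves half as many bytes per element, so more elements fit in cache and memory pressure drops.

    // Minimal sketch, not from the paper: one streaming kernel, two element widths.
    // Requires compute capability 5.3+ for __half arithmetic in device code.
    #include <cuda_fp16.h>

    template <typename T>
    __global__ void scale(T* data, T factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = data[i] * factor;   // same work, different memory footprint
    }

    // Hypothetical launches for comparison:
    //   scale<float><<<blocks, 256>>>(d_f32, 2.0f, n);                  // 4 bytes/element
    //   scale<__half><<<blocks, 256>>>(d_f16, __float2half(2.0f), n);   // 2 bytes/element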
Jun, 18
Computing on Knights and Kepler Architectures
A recent trend in scientific computing is the increasingly important role of co-processors, originally built to accelerate graphics rendering and now used for general high-performance computing. The INFN Computing On Knights and Kepler Architectures (COKA) project focuses on assessing the suitability of co-processor boards for scientific computing in a wide range of physics applications, and […]
Jun, 18
Expansion Techniques for Collisionless Stellar Dynamical Simulations
We present GPU implementations of two fast force calculation methods, based on series expansions of the Poisson equation. One is the Self-Consistent Field (SCF) method, which is a Fourier-like expansion of the density field in some basis set; the other is the Multipole Expansion (MEX) method, which is a Taylor-like expansion of the Green’s function. […]
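For readers unfamiliar with the two schemes, a schematic form of the SCF expansion (standard notation, assumed here rather than quoted from the paper) is:

    \[
    \rho(\mathbf{r}) = \sum_{n,l,m} A_{nlm}\, \rho_{nlm}(\mathbf{r}),
    \qquad
    \Phi(\mathbf{r}) = \sum_{n,l,m} A_{nlm}\, \Phi_{nlm}(\mathbf{r}),
    \qquad
    \nabla^{2} \Phi_{nlm} = 4\pi G\, \rho_{nlm},
    \]

where the coefficients A_{nlm} follow from the biorthogonality of the density-potential basis pairs; MEX, by contrast, expands the Green’s function of the Poisson equation in multipoles about a chosen centre.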
Jun, 17
On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures
With the advent of many-core computer architectures such as GPGPUs from NVIDIA and AMD, and more recently Intel’s Xeon Phi, ensuring performance portability of HPC codes is potentially becoming more complex. In this work we have focused on one important application area — structured grid codes — and investigated techniques for ensuring performance portability across […]
Jun, 17
A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures
The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one […]
Jun, 17
An Improved Monte Carlo Ray Tracing for Large-Scale Rendering in Hadoop
Improving the performance of large-scale rendering requires not only a well-designed data structure, but also reduced disk and network access, especially when realistic visual effects are the goal. This paper presents an optimization method for global illumination rendering of large datasets. We improved the previous rendering algorithm based on Monte Carlo ray […]
Jun, 17
A CUDA based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization
The Multidimensional Knapsack Problem (MKP) is a generalization of the basic Knapsack Problem, with two or more constraints. It is an important optimization problem with many real-life applications. To solve this NP-hard problem, we use a metaheuristic algorithm based on ant colony optimization (ACO). Since several steps of the algorithm can be carried out concurrently, […]
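For reference, the MKP can be written as the 0-1 integer program below (textbook notation, not taken from the paper):

    \[
    \max \sum_{j=1}^{n} p_j x_j
    \quad \text{subject to} \quad
    \sum_{j=1}^{n} w_{ij} x_j \le c_i \;\; (i = 1, \dots, m),
    \qquad x_j \in \{0, 1\},
    \]

where item j yields profit p_j and consumes w_{ij} units of resource i with capacity c_i; setting m = 1 recovers the basic Knapsack Problem.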
Jun, 17
HAM – Heterogenous Active Messages for Efficient Offloading on the Intel Xeon Phi
The applicability of accelerators is limited by the attainable speed-up for the offloaded computations and by the offloading overheads. While GPU programming models like CUDA and OpenCL only allow the application code and its speed-up to be optimised, the available low-level APIs for the Intel Xeon Phi also provide an opportunity to address the overheads. This work […]
Jun, 17
GPU Implementation of Bayesian Neural Network Construction for Data-Intensive Applications
We describe a graphics processing unit (GPU) implementation of the Hybrid Markov Chain Monte Carlo (HMC) method for training Bayesian Neural Networks (BNN). Our implementation uses NVIDIA’s parallel computing architecture, CUDA. We briefly review BNNs and the HMC method, describe our implementations, and give preliminary results.
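As a reminder of the method (standard HMC notation, not specific to this implementation), each leapfrog step over the network weights w, with auxiliary momenta p, step size \epsilon, and potential U(w) equal to the negative log posterior, reads:

    \[
    p_{t+\frac{1}{2}} = p_t - \tfrac{\epsilon}{2}\, \nabla U(w_t),
    \qquad
    w_{t+1} = w_t + \epsilon\, p_{t+\frac{1}{2}},
    \qquad
    p_{t+1} = p_{t+\frac{1}{2}} - \tfrac{\epsilon}{2}\, \nabla U(w_{t+1}),
    \]

followed by a Metropolis accept/reject test on the total energy H(w, p) = U(w) + \tfrac{1}{2}\lVert p \rVert^2. The gradient \nabla U summed over all training points is the data-parallel, GPU-friendly part of the computation.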
Jun, 17
Synergia CUDA: GPU-accelerated accelerator modeling package
Synergia is a parallel, 3-dimensional space-charge particle-in-cell accelerator modeling code. We present our work porting the purely MPI-based version of the code to a hybrid of CPU and GPU computing kernels. The hybrid code uses the CUDA platform in the same framework as the pure MPI solution. We have implemented a lock-free collaborative charge-deposition algorithm […]
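To make the data hazard in charge deposition concrete, here is a naive baseline CUDA sketch of 1-D linear-weighting deposition using atomics (kernel and variable names are assumptions for illustration; this is not the lock-free collaborative algorithm described above):

    // Sketch only: naive 1-D cloud-in-cell charge deposition with atomics.
    // It shows the write conflict every deposition scheme must handle; the names
    // and 1-D layout are hypothetical, not Synergia's actual kernels.
    // atomicAdd on double requires compute capability 6.0+.
    __global__ void deposit_charge(const double* x, double q, int n_particles,
                                   double* rho, int n_cells, double dx) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_particles) return;
        double s = x[i] / dx;         // particle position in cell units
        int    c = (int)floor(s);     // index of the left grid point
        double w = s - c;             // fractional distance to that point
        if (c >= 0 && c + 1 < n_cells) {
            // Many particles can target the same cell, hence the atomics.
            atomicAdd(&rho[c],     q * (1.0 - w));
            atomicAdd(&rho[c + 1], q * w);
        }
    }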
Jun, 16
Divide and Conquer G-Buffer Ray Tracing
Many real-time computer graphics applications strive for realism, though they have difficulty achieving reflections that are fast, respond to scene changes, and work on a variety of surfaces. This thesis explores an alternative to existing techniques for real-time reflections. Ray tracing, a slow technique that does well at physically modelling light, is combined […]