high performance computing on graphics processing units: hgpu.org

Posts

Feb, 12

Exploiting Multiple Levels of Parallelism and Online Refinement of Unstructured Meshes in Atmospheric Model Application

Long-range weather forecasting has become increasingly important. Global concern over the consequences of climate change has stimulated research to determine the climate in the coming decades. At the same time, the steps needed to better define the modeling and simulation of climate and weather are still far from the desired accuracy. Upscaling […]

CUDA

Feb, 12

The Implement of Common Beam Forming Using GPU

To study how a GPU can be used in a real-time signal processing system, we implement a common beamforming algorithm on one. On a GTX285 GPU, computation is 170-450 times faster than on an AD TigerSharc201. This shows that GPUs hold good promise for such systems.

CUDA

Feb, 12

A Networked Dataflow Simulation Environment for Signal Processing and Data Mining Applications

In networked signal processing systems, dataflow graphs can be used to describe the processing on individual network nodes. However, to analyze the correctness and performance of these systems, designers must understand the interactions across these individual "node-level" dataflow graphs, as they communicate across the network, in addition to the characteristics of the individual […]

CUDA

Feb, 12

A volume segmentation approach based on GrabCut

The representation of an image as a flow network has attracted growing research interest in 2D and 3D segmentation. One of these segmentation approaches applies a minimum cut algorithm to separate the image into background and foreground. The most remarkable algorithm to segment a 2D image using this approach […]

CUDA

Feb, 12

Parallel Unsmoothed Aggregation Algebraic Multigrid Algorithms on GPUs

We design and implement a parallel algebraic multigrid method for isotropic graph Laplacian problems on multicore Graphical Processing Units (GPUs). The proposed AMG method is based on the aggregation framework. The setup phase of the algorithm uses a parallel maximal independent set algorithm in forming aggregates and the resulting coarse level hierarchy is then used […]

CUDA

Feb, 12

Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks

Deep Neural Networks now excel at image classification, detection and segmentation. When used to scan images by means of a sliding window, however, their high computational complexity can bring even the most powerful hardware to its knees. We show how dynamic programming can speed up the process by orders of magnitude, even when max-pooling layers are […]

CUDA

Feb, 12

Seismic Attributes Extraction Based on GPU

In oil and gas exploration, seismic data can provide information about the earth’s subsurface structure and indicate where oil can be found and recovered. Obtaining a geological model of the earth requires complex iterative processing. So, the need for computing power increases with the oil and gas exploration and […]

CUDA

Feb, 12

Implementing an architecture for efficient network traffic processing on modern graphics hardware

Network traffic processing is necessary in order to develop active components in the infrastructure of the network, such as routers, or passive applications, such as network intrusion detection systems. However, in today’s high-speed network links this has become a very challenging task in terms of computational resources. Custom hardware appliances that can handle high packet […]

CUDA

Feb, 12

Accelerated Wide Baseline Matching using OpenCL

Wide baseline matching is the state of the art for object recognition and image registration problems in computer vision. Robust feature descriptors can give vast improvements in the quality and speed of subsequent steps, but intensive computation is still required. With the release of general purpose parallel computing interfaces, opportunities for increases in performance arise. […]

OpenCL

Feb, 12

Extending the Computational Application of Reaction-Diffusion Chemistry by Modelling Artificial Neural Networks

There is a huge computational potential in unconventional computing paradigms such as reaction-diffusion chemistry. The main problem with unconventional systems is the inherent difficulty in programming them. By extending the computational application of reaction-diffusion systems, this problem may be alleviated, as every new application allows for another method of approaching problems. With the central nervous […]

CUDA

Feb, 9

Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels

Compute-intensive data-analytic (CIDA) applications have become a major component of many different business domains, as well as scientific computing applications. These algorithms stem from domains as diverse as web analysis and social networks, machine learning and data mining, text analysis, bio-informatics, astronomy image analysis, business analytics, large scale graph algorithms, image/video processing and recognition, some […]

CUDA

Feb, 9

Distributed multi-node, multi-GPU, heterogeneous system for 3D image reconstruction in Electrical Capacitance Tomography – network performance and application analysis

3D ECT poses many challenging computational problems, as image reconstruction requires executing many basic linear algebra operations, especially when the solutions are based on the Finite Element Method. To reach real-time reconstruction, a 3D ECT computational subsystem has to be able to transform capacitance data into an image in fractions of […]

Recent source codes

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Data-efficient LLM Fine-tuning for Code Generation

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Large Language Model Powered C-to-CUDA Code Translation: A Novel Auto-Parallelization Framework

GigaAPI: a user-space API that simplifies multi-GPU programming, bridging the gap between the capabilities of parallel GPU systems and the ability of developers to harness their full potential

Coccinelle: a C code transformation engine using SmPL for matches, refactorings, and bug fixing

DuoReduce: MLIR's benchmark

Shamrock: Multi-GPU hydrodynamics for astrophysics

LLMPerf: GPU Performance Modeling meets Large Language Models
