high performance computing on graphics processing units: hgpu.org

Posts

Oct, 27

MPI Parallelization of GPU-based Lattice Boltzmann Simulations

In this thesis, an MPI-parallelized LBM code for a Multi-GPU platform has been designed and implemented. The primary goal of the thesis is research on efficient and scalable Multi-GPU LBM code, which exploits advanced features of the modern GPUs, to adopt optimization techniques like overlapping of work and communication in heterogeneous CPU-GPU clusters. In […]

OpenCL

Oct, 27

Airborne Downward Looking Sparse Linear Array 3-D SAR Heterogeneous Parallel Simulation

The airborne downward looking sparse linear array three dimensional synthetic aperture radar (DLSLA 3-D SAR) operates nadir observation with the along-track synthetic aperture formulated by platform movement and the cross-track synthetic aperture formulated by physical sparse linear array. Considering the lack of DLSLA 3-D SAR data in the current preliminary study stage, it is very […]

CUDA

Oct, 27

Real-Time Stereo Matching using Adaptive Window based Disparity Refinement

In this paper, we propose a real-time stereo matching method based on an adaptive window, aiming at the trade-off between accuracy and efficiency in current local stereo matching. Considering that the Census transform has good adaptability to image amplitude distortion, but may introduce matching ambiguities in regions with noise or similar local structures, we combine the […]

CUDA

Oct, 26

International Conference on Computational Science, ICCS 2014

The International Conference on Computational Science is an annual conference that brings together researchers and scientists from mathematics and computer science as basic computing disciplines, researchers from various application areas who are pioneering computational methods in sciences such as physics, chemistry, life sciences, and engineering, as well as in arts and humanitarian fields, to discuss […]

Oct, 26

22nd ACME Conference on Computational Mechanics

The purpose of this conference is to share state-of-the-art research findings and experience across the full range of Computational Mechanics. The conference organising committee is particularly keen to encourage the participation of young researchers, including PhD students and research assistants. The Conference will emphasize recent developments in the field of Computational Mechanics through a […]

Oct, 26

Scalable Simulation of Tsunamis Generated by Submarine Landslides on GPU clusters

In this work we describe a GPU implementation, using the CUDA framework over structured meshes, of a first order two-layer Savage-Hutter type model introduced by E. D. Fernandez-Nieto et al. in 2008 to simulate tsunamis generated by underwater landslides. We also describe an extension of this implementation which exploits the parallel power of a GPU […]

CUDA

Oct, 26

A New Approach of Performance Analysis of Certain Graph Algorithms

Computer network problems often require searching for one node from another and finding a path between two nodes. To solve these problems we use graph algorithms. Solving them manually takes a lot of time and knowledge. For this purpose graph algorithms were devised and solving these problems became easier but the […]

CUDA

Oct, 26

A Parallel Depth-aided Exemplar-based Inpainting for Real-time View Synthesis on GPU

Synthesizing new images from a given image pair and their corresponding depth maps is an essential function for many 3D video applications. Exemplar-based inpainting methods have been proposed in recent years to restore newly synthesized images by strategically filling the missing pixels which don’t have any references due to occlusion. Due to the […]

CUDA

Oct, 25

A Datalog Engine for GPUs

We present the design and evaluation of a Datalog engine for execution in Graphics Processing Units (GPUs). The engine evaluates recursive and non-recursive Datalog queries using a bottom-up approach based on typical relational operators. It includes a memory management scheme that automatically swaps data between memory in the host platform (a multicore) and memory in […]

CUDA

Oct, 25

Online Performance Projection for Clusters with Heterogeneous GPUs

We present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which […]

OpenCL

Oct, 25

An Empirical Study of Intel Xeon Phi

With at least 50 cores, Intel Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can get a theoretical peak of 1000 GFLOPs and over 240 GB/s. These numbers, as well as its flexibility – it can be used both as a coprocessor […]

Oct, 25

GGAS: Global GPU Address Spaces for Efficient Communication in Heterogeneous Clusters

Modern GPUs are powerful high-core-count processors, which are no longer used solely for graphics applications, but are also employed to accelerate computationally intensive general-purpose tasks. For utmost performance, GPUs are distributed throughout the cluster to process parallel programs. In fact, many recent high-performance systems in the TOP500 list are heterogeneous architectures. Despite being highly effective […]

CUDA

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
