high performance computing on graphics processing units: hgpu.org

Posts

Nov, 20

Dataflow-driven GPU performance projection for multi-kernel transformations

Applications often have a sequence of parallel operations to be offloaded to graphics processors; each operation can become an individual GPU kernel. Developers typically explore a variety of transformations for each kernel. Furthermore, it is well known that efficient data management is critical in achieving high GPU performance and that "fusing" multiple kernels into one […]

CUDA
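The fusion idea in the entry above can be made concrete with a small sketch. It is plain Python rather than CUDA: each function stands in for one hypothetical GPU kernel, and the intermediate list stands in for a buffer in global memory. The function names and the simple read/write count are illustrative assumptions, not the paper's projection model.

```python
def scale_kernel(x, a):
    # Unfused "kernel" 1: reads x and writes an intermediate array
    # (on a GPU this array would round-trip through global memory).
    return [a * v for v in x]


def offset_kernel(y, b):
    # Unfused "kernel" 2: reads the intermediate array back and writes the result.
    return [v + b for v in y]


def fused_kernel(x, a, b):
    # Fused "kernel": one pass; the intermediate a * v never leaves the loop body.
    return [a * v + b for v in x]


def run_unfused(x, a, b):
    return offset_kernel(scale_kernel(x, a), b)


def memory_traffic(n, fused):
    # Element reads + writes to the large arrays (caches ignored).
    # Unfused: read x, write tmp, read tmp, write out -> 4n; fused: read x, write out -> 2n.
    return 2 * n if fused else 4 * n


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(run_unfused(data, 2.0, 1.0))   # [3.0, 5.0, 7.0]
    print(fused_kernel(data, 2.0, 1.0))  # [3.0, 5.0, 7.0]
    print(memory_traffic(1_000_000, fused=False), memory_traffic(1_000_000, fused=True))
```

Both paths compute the same values; what fusion changes in this toy model is the halved traffic to the large arrays, one of the effects a performance projection for fused kernels would need to estimate.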

Nov, 20

Accelerating MapReduce on a coupled CPU-GPU architecture

The work presented here is driven by two observations. First, heterogeneous architectures that integrate a CPU and a GPU on the same chip are emerging, and hold much promise for supporting power-efficient and scalable high performance computing. Second, MapReduce has emerged as a suitable framework for simplified parallel application development for many classes of applications, […]

OpenCL
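For readers new to the programming model named above, a minimal sequential MapReduce word count is sketched below. It only illustrates the map, shuffle, and reduce phases; how the paper schedules these phases across the CPU and GPU of a coupled chip is not shown in this excerpt, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(split):
    # Map: emit (word, 1) for every word in one input split.
    return [(word, 1) for word in split.lower().split()]


def shuffle_phase(pairs):
    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine the values for each key (here, a sum).
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(splits):
    # Each map_phase call is independent of the others, so splits could be
    # processed in parallel on different cores.
    pairs = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(map_reduce(["GPU CPU gpu", "cpu gpu"]))  # {'gpu': 3, 'cpu': 2}
```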

Nov, 20

A scalable, numerically stable, high-performance tridiagonal solver using GPUs

In this paper, we present a scalable, numerically stable, high-performance tridiagonal solver. The solver is based on the SPIKE algorithm for partitioning a large matrix into small independent matrices, which can be solved in parallel. For each small matrix, our solver applies a general 1-by-1 or 2-by-2 diagonal pivoting algorithm, which is also known to […]

CUDA
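The excerpt does not spell out the per-partition solver, so the sketch below shows the classic Thomas algorithm for a single tridiagonal system, as a point of reference. This is a swapped-in technique: it does no pivoting and no SPIKE partitioning, so it is only safe for well-conditioned matrices such as diagonally dominant ones. The function name and argument layout are assumptions.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    a: sub-diagonal, length n (a[0] is unused)
    b: main diagonal, length n
    c: super-diagonal, length n (c[n-1] is unused)
    d: right-hand side, length n
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # 3x3 diagonally dominant system whose exact solution is [1, 2, 3].
    a = [0.0, 1.0, 1.0]
    b = [4.0, 4.0, 4.0]
    c = [1.0, 1.0, 0.0]
    d = [6.0, 12.0, 14.0]
    print(thomas_solve(a, b, c, d))
```

The forward sweep is inherently sequential, which is why partitioning schemes such as SPIKE are used to expose independent subproblems on a GPU.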

Nov, 20

MPC Toolbox with GPU Accelerated Optimization Algorithms

The introduction of Graphics Processing Units (GPUs) in scientific computing has shown great promise in many different fields. While GPUs are capable of very high floating point performance and memory bandwidth, their massively parallel architecture requires algorithms to be reimplemented to suit it. Interior point methods can be used to solve convex optimization […]

CUDA
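The abstract mentions interior point methods without saying which variant the toolbox uses. As a generic illustration only, here is a log-barrier method (one family of interior point methods) on a hypothetical one-variable problem: minimize (x - 2)^2 subject to x <= 1. The parameters t0, mu, and eps and the stopping rules are assumptions following the usual textbook scheme.

```python
import math


def barrier_solve(x0, t0=1.0, mu=10.0, eps=1e-6, newton_tol=1e-9):
    """Toy log-barrier method: minimize (x - 2)**2 subject to x <= 1.

    The optimum lies on the boundary at x = 1; the barrier keeps every
    iterate strictly inside the feasible region (x < 1).
    """
    if x0 >= 1.0:
        raise ValueError("x0 must be strictly feasible (x0 < 1)")

    def phi(x, t):
        # Barrier-augmented objective for a fixed barrier weight t.
        return t * (x - 2.0) ** 2 - math.log(1.0 - x)

    x, t, m = x0, t0, 1  # m = number of inequality constraints
    while m / t >= eps:  # m / t bounds the duality gap on the central path
        for _ in range(100):  # centering: damped Newton on phi(., t)
            grad = 2.0 * t * (x - 2.0) + 1.0 / (1.0 - x)
            hess = 2.0 * t + 1.0 / (1.0 - x) ** 2
            step = -grad / hess
            if grad * grad / hess / 2.0 <= newton_tol:  # half the squared Newton decrement
                break
            s = 1.0
            # Backtrack until the point is feasible and satisfies the Armijo condition.
            while x + s * step >= 1.0 or phi(x + s * step, t) > phi(x, t) + 0.25 * s * grad * step:
                s *= 0.5
            x += s * step
        t *= mu  # tighten the barrier and re-center from the current point
    return x


if __name__ == "__main__":
    print(barrier_solve(0.0))  # close to, but strictly below, 1.0
```

Each centering step solves a small linear system (here a scalar division); in a real MPC problem those are large structured systems, which is where GPU acceleration would enter.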

Nov, 20

Krylov Subspace Accelerated Algebraic Multigrid for Mimetic Finite Differences on GPUs

The topic of this thesis is GPU-accelerated sparse linear algebra for subsurface reservoir modeling. Numerical techniques for reservoir simulations are described and we present the open source reservoir simulation software toolbox MRST. We discuss some of the challenges related to linear algebra and reservoir simulation. Furthermore, we discuss the possibility of GPU-accelerating the linear […]

CUDA
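As a reference point for the Krylov part of the title, the sketch below implements preconditioned conjugate gradient (CG), a Krylov subspace method for symmetric positive definite systems. In a Krylov-accelerated AMG solver the preconditioner slot would hold a multigrid cycle; here a simple Jacobi (diagonal) preconditioner is swapped in as a placeholder, so this is not the thesis's solver.

```python
def matvec(A, x):
    # Dense matrix-vector product; a real solver would use a sparse format.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(A, b, apply_prec, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)        # residual b - A x for the initial guess x = 0
    z = apply_prec(r)  # preconditioned residual (an AMG cycle would go here)
    p = list(z)        # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk - alpha * apk for rk, apk in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zk + beta * pk for zk, pk in zip(z, p)]
        rz = rz_new
    return x


def jacobi_preconditioner(A):
    # Placeholder preconditioner: multiply by the inverse of diag(A).
    inv_diag = [1.0 / A[i][i] for i in range(len(A))]
    return lambda r: [d * rk for d, rk in zip(inv_diag, r)]


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]  # chosen so that the exact solution is [1, 2, 3]
    print(pcg(A, b, jacobi_preconditioner(A)))
```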

Nov, 19

CUDA-enabled Optimisation of Technical Analysis Parameters

The optimisation of Technical Trading parameters is a computationally intensive exercise. Models comprising a modest number of Technical Indicators require many thousands of simulations to be executed over a sample period of data, with the best performing sets of parameters employed to generate future trading signals. The purpose of this research is to investigate the […]

CUDA
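The parameter search described above is embarrassingly parallel: every parameter combination can be backtested independently, which is why such searches map naturally onto many GPU threads. A minimal sequential sketch follows, assuming a hypothetical moving-average crossover rule and synthetic prices; the paper's actual indicators and fitness measure are not given in this excerpt.

```python
import itertools
import math


def sma(prices, window, i):
    # Simple moving average of the `window` prices ending at index i.
    return sum(prices[i - window + 1:i + 1]) / window


def backtest(prices, short_w, long_w):
    """Total return of a hypothetical long-only moving-average crossover rule."""
    growth = 1.0
    for i in range(long_w, len(prices)):
        # The signal uses prices up to day i-1; the position is held from day i-1 to day i.
        if sma(prices, short_w, i - 1) > sma(prices, long_w, i - 1):
            growth *= prices[i] / prices[i - 1]
    return growth - 1.0


def grid_search(prices, short_windows, long_windows):
    # Every (fast, slow) pair is independent: on a GPU, one thread per pair.
    best = None
    for fast, slow in itertools.product(short_windows, long_windows):
        if fast >= slow:
            continue  # the fast average must use the shorter window
        score = backtest(prices, fast, slow)
        if best is None or score > best[0]:
            best = (score, fast, slow)
    return best  # (total return, fast window, slow window), or None


if __name__ == "__main__":
    # Synthetic, deterministic price series (trend plus oscillation).
    prices = [100.0 + 10.0 * math.sin(i / 10.0) + 0.1 * i for i in range(300)]
    print(grid_search(prices, [2, 5, 10], [20, 50]))
```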

Nov, 19

Modern GPGPU Frameworks and their Application to the Physical Core of the ASUCA Weather Prediction Model

One of today’s biggest challenges in the field of high performance computing is the efficient exploitation of the rapidly increasing parallelism at the socket level, especially when both CPU and GPU resources are to be applied, a challenge becoming very real for the physical processes of ASUCA. ASUCA is the Japan Meteorological Agency’s next-generation weather […]

CUDA

Nov, 19

Parallel Search of k-Nearest Neighbors with Synchronous Operations

We present a new study of parallel algorithms for locating k-nearest neighbors (kNN) of each single query in a high dimensional (feature) space on a many-core processor or accelerator that favors synchronous operations, such as on a graphics processing unit. Exploiting the intimate relationships between two primitive operations, select and sort, we introduce a cohort […]

CUDA
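The select-versus-sort distinction in the abstract can be seen in a brute-force kNN sketch: a full sort of all n distances costs O(n log n) per query, while a heap-based partial selection of the k nearest costs O(n log k). This sequential Python version only illustrates the two primitives; it is not the paper's cohort algorithm, and the function names are hypothetical.

```python
import heapq


def sq_dist(p, q):
    # Squared Euclidean distance; the square root is unnecessary for ranking.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_sort(points, query, k):
    # Full sort of all n candidates: O(n log n) per query.
    ranked = sorted(range(len(points)), key=lambda i: (sq_dist(points[i], query), i))
    return ranked[:k]


def knn_select(points, query, k):
    # Partial selection with a bounded heap: O(n log k) per query.
    return heapq.nsmallest(k, range(len(points)), key=lambda i: (sq_dist(points[i], query), i))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (-1.0, -1.0)]
    print(knn_sort(pts, (0.0, 0.0), 3))    # [0, 1, 4]
    print(knn_select(pts, (0.0, 0.0), 3))  # [0, 1, 4]
```

Ties are broken by point index in both versions, so the two functions return identical lists.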

Nov, 19

Criticality of the XY model in complex topologies

The critical behavior of the O(2) model on dilute Lévy graphs built on a 2D square lattice is analyzed. Different qualitative cases are probed by varying the exponent ρ that governs how the connectivity probability depends on distance. The mean-field regime and the long-range and short-range non-mean-field regimes are investigated by means […]

CUDA
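The O(2) (XY) model assigns an angle theta_i to each site, with energy H = -J * sum over edges (i, j) of cos(theta_i - theta_j). The sketch below evaluates that energy and performs Metropolis sweeps on a graph built over an L x L lattice. The connection rule p(r) = r^(-rho) for two sites at Euclidean distance r is an assumption used for illustration; the abstract does not give the paper's exact normalization or dilution scheme.

```python
import math
import random


def build_graph(L, rho, rng):
    """Hypothetical long-range graph on an L x L lattice: sites at Euclidean
    distance r are linked with probability r**(-rho) (r >= 1, so p <= 1)."""
    sites = [(x, y) for x in range(L) for y in range(L)]
    edges = []
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            r = math.dist(sites[i], sites[j])
            if rng.random() < r ** (-rho):
                edges.append((i, j))
    return edges


def neighbor_lists(n_sites, edges):
    nbrs = [[] for _ in range(n_sites)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def xy_energy(theta, edges, J=1.0):
    # H = -J * sum over edges of cos(theta_i - theta_j)
    return -J * sum(math.cos(theta[i] - theta[j]) for i, j in edges)


def metropolis_sweep(theta, nbrs, beta, rng, J=1.0, max_step=1.0):
    """One Metropolis sweep over all sites; returns the number of accepted moves."""
    accepted = 0
    for i in range(len(theta)):
        new = theta[i] + rng.uniform(-max_step, max_step)
        # Only bonds touching site i change, so dE is a local sum.
        dE = -J * sum(math.cos(new - theta[j]) - math.cos(theta[i] - theta[j]) for j in nbrs[i])
        if dE <= 0.0 or rng.random() < math.exp(-beta * dE):
            theta[i] = new
            accepted += 1
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    L = 4
    edges = build_graph(L, rho=3.0, rng=rng)
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(L * L)]
    nbrs = neighbor_lists(L * L, edges)
    for _ in range(100):
        metropolis_sweep(theta, nbrs, beta=1.0, rng=rng)
    print(len(edges), xy_energy(theta, edges))
```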

Nov, 19

Accelerated molecular dynamics force evaluation on graphics processing units for thermal conductivity calculations

In this paper, we develop a highly efficient molecular dynamics code fully implemented on graphics processing units for thermal conductivity calculations using the Green-Kubo formula. We compare two different schemes for force evaluation, a previously used thread-scheme where a single thread is used for one particle and each thread calculates the total force for the […]

CUDA
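The thread-per-particle scheme mentioned in the abstract can be sketched as follows: the body of lj_force_on would be executed by one GPU thread for particle i, summing contributions from all other particles. A Lennard-Jones potential in reduced units is assumed here purely for illustration; the excerpt does not say which potential the paper uses.

```python
def lj_force_on(i, positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Total Lennard-Jones force on particle i (reduced units).

    In a thread-per-particle scheme, one GPU thread would run this loop for
    its own particle i and write only force[i], so no atomics are needed.
    """
    xi, yi, zi = positions[i]
    rc2 = (cutoff * sigma) ** 2
    fx = fy = fz = 0.0
    for j, (xj, yj, zj) in enumerate(positions):
        if j == i:
            continue
        dx, dy, dz = xi - xj, yi - yj, zi - zj
        r2 = dx * dx + dy * dy + dz * dz
        if r2 >= rc2:
            continue  # outside the cutoff radius
        s6 = (sigma * sigma / r2) ** 3
        # F_i = 24 * eps * (2 (sigma/r)^12 - (sigma/r)^6) / r^2 * (r_i - r_j)
        coef = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
        fx += coef * dx
        fy += coef * dy
        fz += coef * dz
    return (fx, fy, fz)


def all_forces(positions):
    # Each iteration is independent, mirroring one thread per particle.
    return [lj_force_on(i, positions) for i in range(len(positions))]


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.2, 0.0)]
    for f in all_forces(pos):
        print(f)
```

The scheme evaluates every pair twice (once from each particle's thread) in exchange for having no write conflicts between threads.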

Nov, 18

Auto-tunable GPU BLAS (thesis)

In this paper, we present our implementation of an auto-tuning system, written in C++, which incorporates OpenCL kernels. We deploy this approach on different GPU architectures and evaluate its performance. Our main focus is to easily generate tuned code that would otherwise require a large amount of empirical testing, […]

OpenCL
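The core loop of an empirical auto-tuner is simple: run each candidate configuration, time it, and keep the fastest. The sketch below applies that loop to the tile size of a blocked matrix multiply in pure Python. The tile candidates, matrix size, and repeat count are assumptions; the thesis tunes OpenCL kernels on GPUs, where the tuning space and timings would be entirely different.

```python
import time


def blocked_matmul(A, B, tile):
    # C = A * B for square matrices, computed tile by tile.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + tile, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + tile, n)):
                            Ci[j] += a * Bk[j]
    return C


def autotune(n, candidates, repeats=3):
    """Time each candidate tile size and return (fastest tile, timings)."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):  # keep the best of several runs to reduce noise
            start = time.perf_counter()
            blocked_matmul(A, B, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best_tile, timings = autotune(64, [4, 8, 16, 32])
    print(best_tile, timings)
```

The winner depends on the machine it runs on, which is exactly why empirical search is used instead of a fixed choice.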

Nov, 18

Facial Recognition Using Neural Networks over GPGPU

This article introduces a parallel neural network approach implemented on Graphics Processing Units (GPUs) to solve a facial recognition problem, which consists of deciding where the face of a person in a certain image is pointing. The proposed method uses the parallel capabilities of GPUs to train and evaluate a neural network used […]

CUDA
