high performance computing on graphics processing units: hgpu.org

Posts

Dec, 22

Legion: Programming Distributed Heterogeneous Architectures with Logical Regions

This thesis covers the design and implementation of Legion, a new programming model and runtime system for targeting distributed heterogeneous machine architectures. Legion introduces logical regions as a new abstraction for describing the structure and usage of program data. We describe how logical regions provide a mechanism for applications to express important properties of program […]

CUDA

Dec, 22

A Domain-specific Language to Facilitate Software Defined Radio Parallel Executable Patterns Deployment on Heterogeneous Architectures

In this paper, we present a domain-specific language, referred to as OptiSDR, that matches high-level digital signal processing (DSP) routines for software defined radio (SDR) to their generic parallel executable patterns targeted to heterogeneous computing architectures (HCAs). These HCAs include a combination of hybrid GPU-CPU and DSP-FPGA architectures that are programmed using different programming […]

CUDA • OpenCL

Dec, 22

Purine: A bi-graph based deep learning framework

In this paper, we introduce a novel deep learning framework, termed Purine. In Purine, a deep network is expressed as a bipartite graph (bi-graph), which is composed of interconnected operators and data tensors. With the bi-graph abstraction, networks are easily solvable with an event-driven task dispatcher. We then demonstrate that different parallelism schemes over GPUs and/or […]

CUDA

Dec, 22

Manycore processing of repeated k-NN queries over massive moving objects observations

The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. In this paper we focus on a specific data-intensive problem concerning the repeated processing of huge amounts of k nearest neighbours (k-NN) queries over massive sets of moving objects, where the spatial extents of queries […]

CUDA

Dec, 22

PyFAI: a Python library for high performance azimuthal integration on GPU

The pyFAI package has been designed to reduce X-ray diffraction images into powder diffraction curves to be further processed by scientists. This contribution describes how to convert an image into a radial profile using the NumPy package, and how the process was accelerated using Cython. The algorithm was parallelised, needing a complete re-design to benefit from […]

OpenCL
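
The pyFAI abstract above mentions converting an image into a radial profile with NumPy. As a rough illustration of that idea only, the sketch below averages pixel intensities in concentric rings around a centre point using histograms. It is not pyFAI's implementation and does not use the pyFAI API; the function name `radial_profile` and its parameters `center` and `n_bins` are hypothetical, and the flat-detector, pixel-unit geometry is a simplifying assumption.

```python
import numpy as np


def radial_profile(image, center, n_bins):
    """Mean intensity per ring around `center`, given as (row, col).

    Hypothetical helper for illustration; assumes a flat detector and
    measures radius in pixel units.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    # Distance of every pixel from the chosen centre.
    radius = np.hypot(rows - center[0], cols - center[1])
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    # Summed intensity per ring, then pixel count per ring.
    sums, _ = np.histogram(radius, bins=edges, weights=image)
    counts, _ = np.histogram(radius, bins=edges)
    # Leave empty rings at zero instead of dividing by zero.
    profile = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    ring_centers = 0.5 * (edges[:-1] + edges[1:])
    return ring_centers, profile


if __name__ == "__main__":
    flat = np.ones((21, 21))
    r, intensity = radial_profile(flat, center=(10, 10), n_bins=5)
    print(r)
    print(intensity)
```

The two-histogram approach (weighted sum divided by count) keeps the whole reduction vectorised, which is the kind of baseline one would then try to speed up with compiled or GPU code.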

Dec, 22

GPU Pro 5: Advanced Rendering Techniques

In GPU Pro 5: Advanced Rendering Techniques, section editors Wolfgang Engel, Christopher Oat, Carsten Dachsbacher, Michal Valient, Wessam Bahnassi, and Marius Bjorge have once again assembled a high-quality collection of cutting-edge techniques for advanced graphics processing unit (GPU) programming. Divided into six sections, the book covers rendering, lighting, effects in image space, mobile devices, 3D engine […]

OpenCL • OpenGL

Dec, 22

Accelerating Ab Initio Nuclear Physics Calculations with GPUs

This paper describes some applications of GPU acceleration in ab initio nuclear structure calculations. Specifically, we discuss GPU acceleration of the software package MFDn, a parallel nuclear structure eigensolver. We modify the matrix construction stage to run partly on the GPU. On the Titan supercomputer at the Oak Ridge Leadership Computing Facility, this produces a […]

CUDA

Dec, 22

GPGPU-Sim

This thesis studies the impact of hardware features of graphics cards on the performance of GPU computing using the GPGPU-Sim simulation software tool. GPU computing is a growing topic in the world of computing, and could be an important milestone for computers. Therefore, such a study that seeks to identify the performance bottlenecks of the program with […]

CUDA

Dec, 22

GPU Accelerated Nature Inspired Methods for Modelling Large Scale Bi-Directional Pedestrian

Pedestrian movement, although ubiquitous and well-studied, is still not that well understood due to the complicating nature of the embedded social dynamics. Interest among researchers in simulating the nature of pedestrian movement and interactions has grown significantly in part due to increased computational and visualization capabilities afforded by high power computing. Different approaches have been […]

CUDA

Dec, 22

Fast Solving of Influence Diagrams for Multiagent Planning on GPU-enabled Architectures

Planning under uncertainty in multiagent settings is highly intractable because of history and plan space complexities. Probabilistic graphical models exploit the structure of the problem domain to mitigate the computational burden. In this paper, we introduce the first parallelization of planning in multiagent settings on a CPU-GPU heterogeneous system. In particular, we focus on the […]

CUDA

Dec, 20

Efficient Workload Balancing on Heterogeneous GPUs using Mixed-Integer Non-Linear Programming

Heterogeneous system architectures have recently become mainstream for achieving high performance and power efficiency. In particular, many-core graphics processing units (GPUs) now play an important role in computing on heterogeneous architectures. However, for application designers, computational workload still needs to be distributed to heterogeneous GPUs manually and remains inefficient. In this paper, we propose a […]

CUDA

Dec, 20

A Review on Parallelization of Node based Game Tree Search Algorithms on GPU

Game tree search is a classical problem in the field of game theory and artificial intelligence. The focus of the system is on how to leverage the massive parallelism of GPUs to accelerate game tree algorithms, and on proposing a concise and general parallel game tree algorithm on GPUs. Comparison can be done for […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA