high performance computing on graphics processing units: hgpu.org

Posts

Jul, 9

Hybrid Scheduling for Event-driven Simulation over Heterogeneous Computers

In this work, we propose a new scheduling approach, designed from scratch, that simultaneously maximizes the usage of heterogeneous computers and the event-processing flow. The scheduler is built on three fundamental concepts that introduce a new view of discrete event simulation: 1) events are clustered according to their potential time parallelism on […]

CUDA
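As a rough illustration of clustering events by potential time parallelism, the sketch below groups pending events into fixed-width timestamp windows. The window width, the `Event` type and `cluster_by_window` are hypothetical; this is a generic conservative-simulation idea, not the authors' scheduler.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: float
    target: str  # hypothetical: the simulation object the event is delivered to


def cluster_by_window(events, window):
    """Group events whose timestamps fall in the same [k*window, (k+1)*window) slot.

    Assumption: events in one slot are treated as candidates for parallel
    execution; a real scheduler would also check causal dependencies.
    """
    clusters = defaultdict(list)
    for ev in events:
        clusters[int(ev.timestamp // window)].append(ev)
    # Return clusters in time order so they can be dispatched one after another.
    return [clusters[k] for k in sorted(clusters)]


events = [Event(0.2, "a"), Event(0.7, "b"), Event(1.1, "c"), Event(0.4, "d")]
for batch in cluster_by_window(events, window=1.0):
    print([e.target for e in batch])
```

Each returned batch could then be split across CPU cores and GPU threads; choosing the window is the hard part and is not addressed here.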

Jul, 9

Parallelization Strategies for Local Search Algorithms on Graphics Processing Units

The purpose of this paper is to propose effective parallelization strategies for Local Search algorithms on Graphics Processing Units (GPUs). We consider the distribution of the 3-opt neighborhood structure embedded in the Iterated Local Search framework. The three resulting approaches are evaluated and compared in terms of both speedup and solution quality on a state-of-the-art Fermi GPU architecture. […]

CUDA
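The 3-opt neighborhood can be evaluated move by move, which is what makes it natural to distribute over GPU threads. The plain-Python sketch below enumerates cut triples and scores only the non-reversing "segment exchange" reconnection, which is one of several 3-opt reconnection types. The function names are hypothetical, and the paper's three distribution strategies are not reproduced.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def move_delta(tour, dist, i, j, k):
    """Cost change of the 3-opt segment-exchange move for cut positions i < j < k.

    Removes edges (a,b), (c,d), (e,f) and reconnects as (a,d), (e,b), (c,f),
    which swaps segments tour[i+1..j] and tour[j+1..k] without reversing them.
    """
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, d = tour[j], tour[j + 1]
    e, f = tour[k], tour[(k + 1) % n]
    return (dist[a][d] + dist[e][b] + dist[c][f]) - (dist[a][b] + dist[c][d] + dist[e][f])


def apply_move(tour, i, j, k):
    """Return the tour after the segment exchange (the input list is not modified)."""
    return tour[: i + 1] + tour[j + 1 : k + 1] + tour[i + 1 : j + 1] + tour[k + 1 :]


def best_move(tour, dist):
    """Score every cut triple i < j < k and keep the most improving one.

    In a GPU version each triple would map to one thread and the minimum
    would be found with a reduction; a plain loop stands in for that here.
    """
    n = len(tour)
    moves = [(i, j, k) for i in range(n - 2) for j in range(i + 1, n - 1) for k in range(j + 1, n)]
    best = min(moves, key=lambda m: move_delta(tour, dist, *m))
    return best, move_delta(tour, dist, *best)


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(8)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
tour = list(range(8))
(i, j, k), delta = best_move(tour, dist)
new_tour = apply_move(tour, i, j, k)
print(delta, tour_length(new_tour, dist) - tour_length(tour, dist))
```

Because each delta needs only six distance lookups, the evaluation is independent per move; the cost lies in the O(n^3) number of moves.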

Jul, 9

Discontinuous Galerkin Methods on Graphics Processing Units for Nonlinear Hyperbolic Conservation Laws

We present an implementation of the discontinuous Galerkin (DG) method for hyperbolic conservation laws in two dimensions on graphics processing units (GPUs) using NVIDIA’s Compute Unified Device Architecture (CUDA). Both flexible and highly accurate, DG methods accommodate parallel architectures well, as their discontinuous nature produces entirely element-local approximations. High performance scientific computing suits GPUs well, […]

CUDA
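The element-local structure can be illustrated with the lowest-order DG scheme. The sketch below applies piecewise-constant (p = 0) DG to 1D Burgers' equation with a local Lax-Friedrichs flux. At p = 0, DG reduces to a finite-volume scheme, so this is far simpler than the paper's 2D higher-order implementation. The periodic domain, CFL factor and function names are assumptions.

```python
import math


def flux(u):
    # Burgers' equation: u_t + (u^2 / 2)_x = 0
    return 0.5 * u * u


def llf_flux(ul, ur):
    """Local Lax-Friedrichs (Rusanov) numerical flux at an element interface."""
    alpha = max(abs(ul), abs(ur))  # local bound on the wave speed |f'(u)| = |u|
    return 0.5 * (flux(ul) + flux(ur)) - 0.5 * alpha * (ur - ul)


def dg0_step(u, dx, dt):
    """One forward-Euler step of DG with piecewise-constant (p = 0) elements.

    Each element needs only its own value and its neighbours' traces, so the
    element updates are independent. Periodic boundaries are assumed.
    """
    n = len(u)
    # f_right[i] is the flux through the interface between elements i and i+1.
    f_right = [llf_flux(u[i], u[(i + 1) % n]) for i in range(n)]
    # f_right[i - 1] wraps to f_right[-1] for i = 0, the left interface of element 0.
    return [u[i] - dt / dx * (f_right[i] - f_right[i - 1]) for i in range(n)]


n = 50
dx = 2 * math.pi / n
u = [math.sin((i + 0.5) * dx) + 0.5 for i in range(n)]
mass0 = sum(u) * dx
for _ in range(100):
    u = dg0_step(u, dx, dt=0.4 * dx / max(abs(v) for v in u))
print(abs(sum(u) * dx - mass0) < 1e-10)
```

The interface fluxes telescope, so the total "mass" is conserved to rounding error. Higher-order DG adds a small dense per-element system on top of this pattern.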

Jul, 9

Computation of the Isogeometric Analysis Stiffness Matrix on GPU

Owing to its high regularity across mesh elements, isogeometric analysis, a relatively new method, achieves higher accuracy per degree of freedom and improved spectral properties, among other benefits, compared with finite element analysis. However, this same inherent feature reduces the sparsity of the stiffness matrix and requires more elaborate numerical integration schemes for its computation. […]

CUDA
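To show where the extra assembly work comes from, the sketch below assembles a 1D stiffness matrix (Laplace operator, no boundary conditions applied) for quadratic B-splines. It uses the Cox-de Boor recursion and Gauss quadrature per knot span. The knot vector and function names are assumptions, and the paper's GPU kernels and multi-dimensional setting are not reproduced.

```python
import math


def bspline(i, p, u, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    value = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        value += (u - knots[i]) / d1 * bspline(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        value += (knots[i + p + 1] - u) / d2 * bspline(i + 1, p - 1, u, knots)
    return value


def bspline_deriv(i, p, u, knots):
    """First derivative of the i-th basis function (standard B-spline formula)."""
    value = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        value += p / d1 * bspline(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        value -= p / d2 * bspline(i + 1, p - 1, u, knots)
    return value


def stiffness_1d(knots, p):
    """Assemble K_ij = integral of N_i'(x) N_j'(x) dx with 2-point Gauss quadrature
    per knot span (exact for p = 2, where N_i' N_j' is quadratic on each span)."""
    n = len(knots) - p - 1
    K = [[0.0] * n for _ in range(n)]
    gauss = (-1 / math.sqrt(3), 1 / math.sqrt(3))  # both weights equal 1
    for a, b in zip(knots[:-1], knots[1:]):
        if b <= a:
            continue  # skip zero-length spans created by repeated knots
        mid, half = 0.5 * (a + b), 0.5 * (b - a)
        for g in gauss:
            x = mid + half * g
            dN = [bspline_deriv(i, p, x, knots) for i in range(n)]
            for i in range(n):
                for j in range(n):
                    K[i][j] += half * dN[i] * dN[j]
    return K


knots = [0, 0, 0, 1, 2, 3, 4, 4, 4]  # open uniform knot vector, C^1 quadratic splines
K = stiffness_1d(knots, p=2)
for row in K:
    print(" ".join(f"{v:7.3f}" for v in row))
```

Each row couples to at most 2p + 1 basis functions, and for tensor-product splines in d dimensions an interior function couples to up to (2p + 1)^d others. The entries also need quadrature of recursively evaluated basis functions rather than precomputed element formulas.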

Jul, 9

Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture

Query co-processing on graphics processors (GPUs) has become an effective means of improving the performance of main memory databases. However, the relatively low bandwidth and high latency of the PCI-e bus are usually the bottleneck for co-processing. Recently, coupled CPU-GPU architectures have received a lot of attention, e.g., AMD APUs with the CPU and the […]

OpenCL
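A hash join has a build phase and a probe phase. Co-processing assigns parts of that work to different processors. The sketch below shows a textbook hash join in which a hypothetical `gpu_fraction` parameter splits the probe input into a "GPU" part and a "CPU" part that share one hash table. Both parts simply run sequentially in Python here, and the split policy is an assumption, not the paper's scheme.

```python
from collections import defaultdict


def build(rows, key):
    """Build phase: hash the (usually smaller) build relation on the join key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(table, rows, key):
    """Probe phase: look up each probe tuple; .get() avoids inserting empty keys."""
    out = []
    for row in rows:
        for match in table.get(row[key], ()):
            out.append((match, row))
    return out


def co_processed_join(build_rows, probe_rows, key, gpu_fraction=0.5):
    """Hypothetical co-processing split of the probe phase.

    On a coupled CPU-GPU architecture both processors can access the same
    memory, so the shared hash table needs no PCI-e transfer.
    """
    table = build(build_rows, key)
    cut = int(len(probe_rows) * gpu_fraction)
    gpu_part = probe(table, probe_rows[:cut], key)  # would run on the GPU
    cpu_part = probe(table, probe_rows[cut:], key)  # would run on the CPU
    return gpu_part + cpu_part


R = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
S = [{"id": 2, "v": 10}, {"id": 1, "v": 20}, {"id": 3, "v": 30}, {"id": 2, "v": 40}]
result = co_processed_join(R, S, "id", gpu_fraction=0.5)
print(len(result))  # 3
```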

Jul, 8

A Smart GPU Implementation of an Elliptic Kernel for an Ocean Global Circulation Model

In this paper, the preconditioning of an elliptic Laplace problem in an ocean global circulation model is analyzed. We suggest an inverse preconditioning technique to efficiently compute the numerical solution of the elliptic kernel. Moreover, we show how the convergence rate and the performance of the solver are closely linked to the […]

CUDA
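This excerpt does not describe the inverse preconditioner itself, so the sketch below uses a generic stand-in. It runs preconditioned conjugate gradient on a 1D model Laplacian with an explicit approximate inverse M built from a first-order Neumann series, M = D^-1 + D^-1 N D^-1 for A = D - N. Applying M is just a sparse matrix-vector product with no triangular solves, which is the usual argument for inverse preconditioners on GPUs. The model problem and all names are assumptions, and the construction may differ from the authors'.

```python
def laplacian_matvec(x):
    """y = A x for the 1D Dirichlet Laplacian A = tridiag(-1, 2, -1)."""
    n = len(x)
    return [2 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def approx_inverse_apply(r):
    """z = M r with M = I/2 + T/4, the first-order Neumann approximation of A^-1
    (A = 2I - T, where T holds the off-diagonal ones)."""
    n = len(r)
    return [0.5 * r[i] + 0.25 * ((r[i - 1] if i > 0 else 0.0) + (r[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(b, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = approx_inverse_apply(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        Ap = laplacian_matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it + 1
        z = approx_inverse_apply(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


b = [1.0] * 100
x, iters = pcg(b)
print(iters, max(abs(v) for v in (ai - bi for ai, bi in zip(laplacian_matvec(x), b))))
```

For this model problem, M A has eigenvalues sin^2(k pi / (n + 1)), so the condition number drops by roughly a factor of four compared with A alone.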

Jul, 8

ParadisEO-MO-GPU: a Framework for Parallel GPU-based Local Search Metaheuristics

In this paper, we propose a pioneering framework called ParadisEO-MO-GPU for the reusable design and implementation of parallel local search metaheuristics (S-Metaheuristics) on Graphics Processing Units (GPUs). We revisit the ParadisEO-MO software framework to allow its use on GPU accelerators, focusing on the parallel iteration-level model, the major parallel model for S-Metaheuristics. It consists of […]

CUDA
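In the iteration-level model, each step of the local search evaluates the whole neighborhood in parallel and then selects a move. The sketch below mirrors that structure for best-improvement hill climbing over 1-bit flips, with a hypothetical OneMax objective. A thread pool stands in for GPU threads, so it illustrates the structure, not a speedup. None of the names are ParadisEO-MO-GPU API.

```python
from concurrent.futures import ThreadPoolExecutor


def fitness(bits):
    # Hypothetical objective (OneMax): number of ones in the bit string.
    return sum(bits)


def flip(bits, i):
    """Neighbour obtained by flipping bit i."""
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


def hill_climb(bits, max_steps=100):
    """Best-improvement hill climbing with parallel neighbourhood evaluation.

    The neighbourhood is scored concurrently (one GPU thread per neighbour in
    the iteration-level model); the best move is then selected sequentially.
    """
    with ThreadPoolExecutor() as pool:
        for _ in range(max_steps):
            neighbours = [flip(bits, i) for i in range(len(bits))]
            scores = list(pool.map(fitness, neighbours))
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] <= fitness(bits):
                break  # no improving neighbour: local optimum reached
            bits = neighbours[best]
    return bits


print(hill_climb([0, 1, 0, 0, 1, 0]))
```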

Jul, 8

Coalition Structure Generation with the Graphic Processor Unit

Coalition Structure Generation, the problem of finding the optimal set of coalitions, has received considerable attention in recent AI literature. The fastest exact algorithm for this problem is IDP-IP*, due to Rahwan et al. (2012). This algorithm is a hybrid of two previous algorithms, namely IDP and IP. As such, it is desirable to […]

CUDA
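IDP is an improved variant of the classic dynamic-programming approach to coalition structure generation. The sketch below shows that underlying recurrence over bitmask coalitions: the best value of a coalition C is either v(C) or the best split of C into two parts. It runs in O(3^n) time, and the IDP pruning and the IP component are not reproduced. The characteristic function values are made up for illustration.

```python
def optimal_coalition_structure(n, value):
    """Exact dynamic programming over all coalitions of n agents.

    value[c] is the worth v(C) of the coalition encoded by bitmask c.
    f[c] is the best total worth of any partition of c.
    """
    full = (1 << n) - 1
    f = [0.0] * (full + 1)
    split = [0] * (full + 1)  # 0 means "keep c as one coalition"
    for c in range(1, full + 1):
        f[c] = value[c]
        # Enumerate proper non-empty submasks s of c; both s and c ^ s are
        # smaller than c, so their f values are already final.
        s = (c - 1) & c
        while s:
            cand = f[s] + f[c ^ s]
            if cand > f[c]:
                f[c], split[c] = cand, s
            s = (s - 1) & c

    def expand(c):
        # Recover the coalition structure from the recorded best splits.
        if split[c] == 0:
            return [c]
        return expand(split[c]) + expand(c ^ split[c])

    return f[full], expand(full)


# Masks: 1={0}, 2={1}, 3={0,1}, 4={2}, 5={0,2}, 6={1,2}, 7={0,1,2}
value = [0, 1, 1, 3, 1, 1, 1, 3.5]
best, structure = optimal_coalition_structure(3, value)
print(best, structure)
```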

Jul, 8

Comparison and Analysis of GPU Energy Efficiency For CUDA and OpenCL

The use of GPUs for processing large sets of parallelizable data has increased sharply in recent years. As the concept of GPU computing is still relatively young, parameters other than computation time, such as energy efficiency, are being overlooked. Two parallel computing platforms, CUDA and OpenCL, provide developers with an interface that they can use […]

CUDA

OpenCL
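Energy efficiency comparisons rest on E = ∫ P(t) dt rather than on run time alone. The sketch below integrates sampled power readings with the trapezoidal rule. How the samples are obtained (an external meter or a vendor API) is left as an assumption, and the wattages and durations in the usage lines are illustrative numbers, not results from the paper.

```python
def energy_joules(timestamps_s, power_w):
    """Approximate E = integral of P(t) dt from sampled power using the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    energy = 0.0
    for k in range(1, len(timestamps_s)):
        dt = timestamps_s[k] - timestamps_s[k - 1]
        energy += 0.5 * (power_w[k] + power_w[k - 1]) * dt
    return energy


# Illustrative only: a slower run at lower power vs. a faster run at higher power.
slow = energy_joules([0.0, 0.5, 1.0, 1.5, 2.0], [150.0] * 5)
fast = energy_joules([0.0, 0.5, 1.0, 1.5], [220.0] * 4)
print(slow, fast)
```

In this made-up case, the faster run uses more energy, which is why time alone does not settle which implementation is more efficient.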

Jul, 8

GPU Implementation of Real-Time Biologically Inspired Face Detection using CUDA

In this paper, a massively parallel real-time face detection method based on the visual attention and cortex-like mechanisms of a cognitive vision system is presented. As a first step, we use a saliency map model to select salient face regions and the HMAX C1 model to extract features from the salient input image, and then apply a mixture-of-experts neural network […]

CUDA
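In HMAX, the C1 layer takes a local maximum over S1 (Gabor filter) responses at neighbouring positions and across adjacent scales. The sketch below shows only the spatial max-pooling part. Each output value depends on one window, so on a GPU each could be computed by one thread. The function name and the example response map are hypothetical.

```python
def max_pool2d(image, size, stride):
    """Local max pooling over size x size windows with the given stride
    (the spatial part of an HMAX C1 layer; pooling across scales is omitted)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out.append([
            max(image[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, stride)
        ])
    return out


image = [
    [1, 2, 0, 1],
    [3, 4, 1, 0],
    [0, 1, 5, 6],
    [2, 0, 7, 8],
]
print(max_pool2d(image, size=2, stride=2))  # [[4, 1], [2, 8]]
```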

Jul, 7

Comparison of Rectangular Matrix Multiplication with and without Border Conditions

Matrix multiplication algorithms are very common and are used for computation in almost every field. There are many implementations of matrix multiplication on different platforms and programming models. In recent years, GPU devices have become powerful computational units that have entered the high performance computing segment. In this paper, we analyse two […]

CUDA
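The excerpt does not say exactly what the two compared versions are. A common reading of "border conditions" is the bounds check a tiled kernel needs when the tile size does not divide the matrix dimensions. The sketch below shows both options for rectangular matrices: tiling with explicit bounds (the `min(...)` calls play the role of `if (row < n && col < m)` in a kernel) and zero padding to a tile multiple, which removes the checks at the cost of extra work. All names are hypothetical.

```python
def matmul_naive(A, B):
    """Reference C = A B for A (n x k) and B (k x m)."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def matmul_tiled(A, B, tile):
    """Tiled C = A B; the last tile in each dimension may be partial, so
    every loop bound is clipped with min(...), i.e. a border condition."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] += acc
    return C


def round_up(x, t):
    """Smallest multiple of t that is >= x."""
    return -(-x // t) * t


def pad(M, rows, cols):
    """Zero-pad M to rows x cols so that every tile is full (no border checks needed)."""
    return [[M[i][j] if i < len(M) and j < len(M[0]) else 0.0 for j in range(cols)]
            for i in range(rows)]


A = [[float(i + j) for j in range(5)] for i in range(3)]  # 3 x 5
B = [[float(i * j) for j in range(7)] for i in range(5)]  # 5 x 7
C = matmul_tiled(A, B, tile=4)
Cp = matmul_tiled(pad(A, round_up(3, 4), round_up(5, 4)),
                  pad(B, round_up(5, 4), round_up(7, 4)), tile=4)
C_padded = [row[:7] for row in Cp[:3]]  # crop back to 3 x 7
print(C == C_padded)
```

Padding trades branch-free inner loops for wasted multiply-adds on the zero rows and columns. Which version is faster depends on how far the dimensions are from a tile multiple.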

Jul, 7

Solving 3D Anisotropic Elastic Wave Equations on Parallel GPU Devices

Efficiently modelling seismic datasets in complex 3D anisotropic media by solving the 3D elastic wave equation is an important challenge in computational geophysics. Using a stress-stiffness formulation on a regular grid, we present a 3D finite-difference time-domain (FDTD) solver with a stencil that is 2nd-order accurate in time and 8th-order accurate in space and that leverages the massively parallel architecture of […]

CUDA
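The 8th-order spatial accuracy comes from a wide central-difference stencil that reaches four points on each side. The sketch below uses the standard collocated-grid weights for an 8th-order first derivative and compares them with a 2nd-order difference on sin(x). A staggered-grid elastic scheme would use different weights, and the 2nd-order time integration is not shown. The names are hypothetical.

```python
import math

# Central-difference weights for an 8th-order accurate first derivative on a
# collocated grid, for offsets 1..4 (the weight at offset -k is the negative).
C8 = (4 / 5, -1 / 5, 4 / 105, -1 / 280)


def d_dx_8th(f, i, h):
    """8th-order central approximation of f'(x_i) from samples f[i-4..i+4]."""
    return sum(c * (f[i + k] - f[i - k]) for k, c in enumerate(C8, start=1)) / h


h = 0.1
f = [math.sin(j * h) for j in range(21)]
i = 10  # x_i = 1.0
d8 = d_dx_8th(f, i, h)
d2 = (f[i + 1] - f[i - 1]) / (2 * h)  # 2nd-order central difference, for comparison
print(abs(d8 - math.cos(1.0)), abs(d2 - math.cos(1.0)))
```

The wider stencil needs four halo points per side, which matters for shared-memory tiling on a GPU, but it permits much coarser grids for the same accuracy.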

Recent source codes

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management
