high performance computing on graphics processing units: hgpu.org


Algorithmic performance studies on graphics processing units

O. Schenk, M. Christen, H. Burkhart

Department of Computer Science, University of Basel, Klingelbergstrasse 50, CH-4056 Basel, Switzerland

Journal of Parallel and Distributed Computing, Vol. 68, No. 10. (October 2008), pp. 1360-1369.

DOI:10.1016/j.jpdc.2008.05.008

@article{schenk2008algorithmic,
  title={Algorithmic performance studies on graphics processing units},
  author={Schenk, O. and Christen, M. and Burkhart, H.},
  journal={Journal of Parallel and Distributed Computing},
  volume={68},
  number={10},
  pages={1360--1369},
  issn={0743-7315},
  year={2008},
  publisher={Elsevier}
}


We report on our experience with integrating and using graphics processing units (GPUs) as fast parallel floating-point co-processors to accelerate two fundamental computational scientific kernels: sparse direct factorization and nonlinear interior-point optimization. Since a full re-implementation of these complex kernels is typically not feasible, we identify matrix-matrix multiplication as a natural first entry point for a minimally invasive integration of GPUs. We investigate the performance on the NVIDIA GeForce 8800 multicore chip, originally designed for compute-intensive gaming applications. We exploit the architectural features of the GeForce 8800 GPU to design an efficient GPU-parallel sparse matrix solver. A prototype approach that leverages the bandwidth and computing power of GPUs for these matrix kernel operations is demonstrated, achieving an overall performance of over 110 GFlop/s on the desktop for large matrices and over 38 GFlop/s for sparse matrices arising in real applications. We use our GPU algorithm for PDE-constrained optimization problems and demonstrate that the commodity GPU is a useful co-processor for scientific applications.
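The integration strategy described above (leave the solver intact and swap out only the matrix-matrix multiplication) can be illustrated with a small sketch. It assumes a dense, right-looking blocked Cholesky factorization as a stand-in for the block updates of a sparse direct solver. The `gemm` hook, the block size `nb`, and all function names are hypothetical, and the pure-Python `cpu_gemm_update` only marks the single place where a GPU BLAS call could be substituted. This is not the authors' implementation.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
# Hypothetical hook signature: C += alpha * A @ B^T, updated in place.
Gemm = Callable[[Matrix, Matrix, Matrix, float], None]


def cpu_gemm_update(c: Matrix, a: Matrix, b: Matrix, alpha: float) -> None:
    """Reference CPU kernel: C += alpha * A @ B^T (a GPU BLAS call could replace it)."""
    k = len(a[0]) if a else 0
    for i, a_row in enumerate(a):
        for j, b_row in enumerate(b):
            s = sum(a_row[p] * b_row[p] for p in range(k))
            c[i][j] += alpha * s


def gemm_flops(m: int, n: int, k: int) -> int:
    """Flop count of an (m x k) by (k x n) product: one multiply and one add per term."""
    return 2 * m * n * k


def blocked_cholesky(a: Matrix, nb: int = 2, gemm: Gemm = cpu_gemm_update) -> Matrix:
    """Right-looking blocked Cholesky A = L L^T; only the trailing update calls `gemm`."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy of the input
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1. Factor the panel A[k0:n, k0:k1] column by column (small work, stays on the CPU).
        for j in range(k0, k1):
            d = a[j][j] - sum(a[j][p] ** 2 for p in range(k0, j))
            a[j][j] = math.sqrt(d)
            for i in range(j + 1, n):
                s = sum(a[i][p] * a[j][p] for p in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # 2. Trailing update A22 -= L21 @ L21^T: the one matrix-matrix product to offload.
        if k1 < n:
            l21 = [row[k0:k1] for row in a[k1:n]]
            c = [row[k1:n] for row in a[k1:n]]
            gemm(c, l21, l21, -1.0)
            for i in range(k1, n):
                a[i][k1:n] = c[i - k1]
    # Return only the lower triangle L.
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


# Usage: count the flops routed through the hook for a 3 x 3 SPD matrix.
flops: List[int] = []


def counting_gemm(c: Matrix, a: Matrix, b: Matrix, alpha: float) -> None:
    flops.append(gemm_flops(len(c), len(c[0]), len(a[0])))
    cpu_gemm_update(c, a, b, alpha)


A = [[4.0, 2.0, 0.0], [2.0, 5.0, 3.0], [0.0, 3.0, 6.0]]
L = blocked_cholesky(A, nb=2, gemm=counting_gemm)
print(L[2][1], flops)  # 1.5 [4]
```

Dividing the accumulated `gemm_flops` count by measured wall-clock time yields a GFlop/s rate of the kind quoted in the abstract. For large matrices the trailing updates account for most of the arithmetic, which is why accelerating this one call can pay off without rewriting the rest of the factorization.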

Tags: Computer science, Linear Algebra, Matrix decomposition, Nonlinear optimization, nVidia, nVidia GeForce 8800 GTX, Sparse direct solvers

November 2, 2010 by hgpu
