high performance computing on graphics processing units: hgpu.org


Concurrent GPU Programming

Lesley Northam

School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1

University of Waterloo, 2009


Monte Carlo algorithms use repeated random sampling to find solutions to problems. One common example uses points randomly selected from the unit box to approximate the value of pi. Another example is a simulation called a virtual spectrophotometer, which measures the reflectance of a modeled material [1]. The repetitive nature of Monte Carlo algorithms usually makes these programs time- and energy-intensive. Because the repetitions are identical and mutually independent, they are easy to parallelize, although parallelizing at the granularity of a single repetition may require an exorbitant number of threads. Multi-core architectures are capable of parallel thread execution, but the massively multi-threaded GPU architecture is better suited to Monte Carlo and virtual spectrophotometer workloads because it offers orders of magnitude more thread-level parallelism. The purpose of this project is to explore general purpose GPU programming and to implement a virtual spectrophotometer that uses the massively multi-threaded nature of the GPU. Particular attention is given to the concurrency issues that limit the performance of the spectrophotometer on the GPU. This report is divided into four sections. The first section provides background information on general purpose GPU programming. The second section details the CUDA hardware abstraction layer. The third section discusses the limitations of CUDA applications and provides potential solutions to these limitations. The final section analyzes several CUDA implementations of a virtual spectrophotometer and compares their execution time with that of a single-threaded CPU implementation.
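As an illustration of the pi example mentioned in the abstract, the following is a minimal CPU-only sketch in Python, not the report's CUDA code. It samples points in the unit box, counts those that fall inside the quarter unit circle, and multiplies the hit fraction by 4. The function names `count_hits` and `estimate_pi`, the batch count, and the seeding scheme are assumptions made for this example. Each batch uses its own random generator and shares no state with the others, which is the independence property that makes this kind of workload easy to spread across many threads.

```python
import random


def count_hits(num_samples, seed):
    """Count random points in the unit box that land inside the quarter unit circle.

    Each call owns a private generator, so batches are mutually independent
    (hypothetical helper for this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples, num_batches=8, base_seed=0):
    """Estimate pi from independent batches.

    The batches run sequentially here; because they share no state, each one
    could instead be handed to a separate thread or GPU thread block.
    """
    per_batch = total_samples // num_batches
    hits = sum(count_hits(per_batch, base_seed + b) for b in range(num_batches))
    # Area of quarter circle / area of unit box = pi / 4.
    return 4.0 * hits / (per_batch * num_batches)


if __name__ == "__main__":
    print(estimate_pi(1_000_000))
```

The estimate converges slowly: the error shrinks roughly with the inverse square root of the sample count, which is why such programs need many repetitions and benefit from running them in parallel.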

Tags: Computer science, CUDA, nVidia, nVidia GeForce 8800 M GTX, Programming techniques, Review

February 26, 2011 by hgpu


