high performance computing on graphics processing units: hgpu.org

Parallel Programming using OpenCL on Modern Architectures

Allan Svejstrup Nielsen, Allan Peter Engsig-Karup, Bernd Dammann

Technical University of Denmark

Technical University of Denmark, IMM Technical Report 2012-05, 2012

This report is intended as a quick introduction to the OpenCL framework, and its aim is to ease the transition to OpenCL C for developers with previous GPGPU experience. The purpose of OpenCL is to allow developers to use all compute resources available on a heterogeneous hardware platform. In addition to introducing OpenCL, the report presents an overview of AMD GPU hardware, covering both the VLIW5/4 architectures and the upcoming Graphics Core Next architecture, which is to form the basis of AMD's future generation of GPUs, intended to be as capable at compute as they are at graphics. To conclude the presentation of OpenCL as a language for compute, a matrix-matrix multiplication example is devised and optimized for the VLIW4, Tesla and Fermi architectures. Performance is measured as a function of both matrix size and work-group size, and the results are discussed. Where applicable, the equivalent CUDA implementation is tested for comparison.
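
To make the idea of tuning matrix-matrix multiplication by work-group size more concrete, the following is a minimal CPU-only sketch in plain C. It is not the report's OpenCL kernel, and it makes no claim about how the authors optimized for any architecture. It only assumes that the `tile` parameter stands in for the edge length of a square work-group, with each tile of the output matrix handled the way one work-group would handle it. All function names (`matmul_naive`, `matmul_tiled`, `max_abs_diff`) are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: row-major C = A * B for square n x n matrices. */
void matmul_naive(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; ++k)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Smaller of two sizes; used to clip tiles at the matrix edge. */
static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/*
 * Tiled version. Assumption for illustration: `tile` plays the role of
 * the work-group edge length, so each (ti, tj) tile of C corresponds to
 * the output block one work-group would compute. The tk loop walks over
 * matching tiles of A and B, which is where a GPU kernel would typically
 * stage data in local memory. n does not need to be a multiple of tile.
 */
void matmul_tiled(const float *a, const float *b, float *c,
                  size_t n, size_t tile)
{
    assert(tile > 0);

    for (size_t i = 0; i < n * n; ++i)
        c[i] = 0.0f;

    for (size_t ti = 0; ti < n; ti += tile) {
        for (size_t tj = 0; tj < n; tj += tile) {
            for (size_t tk = 0; tk < n; tk += tile) {
                size_t i_end = min_size(ti + tile, n);
                size_t j_end = min_size(tj + tile, n);
                size_t k_end = min_size(tk + tile, n);

                for (size_t i = ti; i < i_end; ++i) {
                    for (size_t j = tj; j < j_end; ++j) {
                        /* Accumulate this tile's partial dot product. */
                        float acc = c[i * n + j];
                        for (size_t k = tk; k < k_end; ++k)
                            acc += a[i * n + k] * b[k * n + j];
                        c[i * n + j] = acc;
                    }
                }
            }
        }
    }
}

/* Largest absolute element-wise difference between two arrays. */
float max_abs_diff(const float *x, const float *y, size_t count)
{
    float worst = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Calling `matmul_tiled` with several values of `tile` and comparing against `matmul_naive` via `max_abs_diff` gives a simple correctness check before timing each configuration. On a GPU the best tile size depends on the architecture, which is why measurements are taken across both matrix and work-group sizes.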

Tags: ATI, ATI Radeon HD 6990, Computer science, Heterogeneous systems, Matrix multiplication, nVidia, nVidia GeForce GTX 280, nVidia GeForce GTX 590, OpenCL, Overview, Tutorial

October 15, 2012 by hgpu
