high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » An OpenCL framework for heterogeneous multicores with local memory

An OpenCL framework for heterogeneous multicores with local memory

Jaejin Lee, Jungwon Kim, Sangmin Seo, Seungkyun Kim, Jungho Park, Honggyu Kim, Thanh T. Dao, Yongjin Cho, Sung J. Seo, Seung H. Lee, Seung M. Cho, Hyo J. Song, Sang B. Suh, Jong D. Choi

Seoul National University, Seoul, South Korea

In PACT ’10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques (2010), pp. 193-204

DOI:10.1145/1854273.1854301

BibTeX

Source

2162

views

In this paper, we present the design and implementation of an Open Computing Language (OpenCL) framework that targets heterogeneous accelerator multicore architectures with local memory. The architecture consists of a general-purpose processor core and multiple accelerator cores that typically do not have any cache. Each accelerator core, instead, has a small internal local memory. Our OpenCL runtime is based on software-managed caches and coherence protocols that guarantee OpenCL memory consistency to overcome the limited size of the local memory. To boost performance, the runtime relies on three source-code transformation techniques, work-item coalescing, web-based variable expansion and preload-poststore buffering, performed by our OpenCL C source-to-source translator. Work-item coalescing is a procedure to serialize multiple SPMD-like tasks that execute concurrently in the presence of barriers and to sequentially run them on a single accelerator core. It requires the web-based variable expansion technique to allocate local memory for private variables. Preload-poststore buffering is a buffering technique that eliminates the overhead of software cache accesses. Together with work-item coalescing, it has a synergistic effect on boosting performance. We show the effectiveness of our OpenCL framework, evaluating its performance with a system that consists of two Cell BE processors. The experimental result shows that our approach is promising.

Tags: Computer science, Memory model, OpenCL

January 16, 2011 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

An OpenCL framework for heterogeneous multicores with local memory

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

An OpenCL framework for heterogeneous multicores with local memory

Share this:

Recent source codes

Most viewed papers (last 30 days)