Compiler-Level Explicit Cache for a GPGPU Programming Framework

Tomoharu Kamiya, Takanori Maruyama, Kazuhiko Ohno, Masaki Matsumoto
Department of Information Engineering, Mie University, Tsu, Mie, Japan
The 2014 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’14), 2014





GPUs are widely used for high-performance computing. However, standard programming frameworks such as CUDA and OpenCL require low-level specifications, so programming is difficult and performance is not portable. We are therefore developing a new framework named MESI-CUDA. By providing virtual shared variables accessible from both the CPU and GPU, MESI-CUDA hides the complex memory architecture and eliminates low-level API function calls. However, the performance of the current implementation is insufficient because of the large memory access latency. We therefore propose a code-optimization scheme that uses the fast on-chip shared memory as a compiler-level explicit cache of the off-chip device memory. The compiler estimates the access count and range of each array using static analysis. For heavily reused variables, the code is modified to create a copy in shared memory and to access that copy, using the small shared memory efficiently. In our evaluation, the scheme achieved 13%-192% speedups on two of three benchmark programs.
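The transformation the abstract describes, staging a heavily reused device-memory array in on-chip shared memory and redirecting accesses to the copy, can be sketched by hand in plain CUDA. This is an illustrative example only, not MESI-CUDA's generated code; the kernel, array names, and the RADIUS constant are invented for the sketch.

```cuda
// Hand-written illustration of the explicit-cache transformation: the
// small, read-only coefficient array `coef` is reused by every thread,
// so each block copies it from off-chip device memory into on-chip
// shared memory once, and all later reads hit the fast shared copy.
// (All names here are hypothetical; MESI-CUDA applies this rewrite
// automatically based on its static access-count/range analysis.)
#define RADIUS 4

__global__ void smooth(const float *in, float *out,
                       const float *coef, int n) {
    // Compiler-level explicit cache: one shared copy per thread block.
    __shared__ float coef_s[2 * RADIUS + 1];
    if (threadIdx.x < 2 * RADIUS + 1)
        coef_s[threadIdx.x] = coef[threadIdx.x]; // single device-memory read
    __syncthreads();                             // copy visible to all threads

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= RADIUS && i < n - RADIUS) {
        float acc = 0.0f;
        for (int j = -RADIUS; j <= RADIUS; j++)
            acc += coef_s[j + RADIUS] * in[i + j]; // reused reads hit shared memory
        out[i] = acc;
    }
}
```

Without the staging, each of the 2*RADIUS+1 coefficient reads per thread would go to device memory; with it, only the initial copy does, which is why the payoff grows with the reuse count estimated by the compiler.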
