high performance computing on graphics processing units: hgpu.org


A compiler for high performance computing with many-core accelerators

Naohito Nakasato, Jun Makino

Department of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu, Japan

IEEE International Conference on Cluster Computing and Workshops, 2009. CLUSTER ’09

DOI:10.1109/CLUSTR.2009.5289127

@inproceedings{nakasato2009compiler,
  title={A compiler for high performance computing with many-core accelerators},
  author={Nakasato, N. and Makino, J.},
  booktitle={Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on},
  pages={1--9},
  year={2009},
  organization={IEEE}
}


We introduce a newly developed compiler for high performance computing using many-core accelerators. The high peak performance of such accelerators attracts researchers who constantly demand faster computers. However, it is difficult to create an efficient implementation of an existing serial program for such accelerators, even for massively parallel problems. Existing parallel programming tools force us to program every detail of an implementation, from loop-level parallelism down to 4-vector SIMD operations. Our novel approach is different: given a compute-intensive problem expressed as a nested loop, the compiler only asks us to define the compute kernel inside the innermost loop. We observe that the input variables appearing in the kernel fall into two types: variables that are invariant during the loop and variables that are updated in each iteration. The compiler lets us specify the type of each input so that it can pick a predefined optimal way to process it. The compiler successfully generates the fastest code ever reported for many-particle simulations, with a performance of 500 GFLOPS (single precision) on an RV770 GPU. Another successful application is the evaluation of a multidimensional integral, which runs at 5–7 GFLOPS (quadruple precision) on both GRAPE-DR and GPU.
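To make the two input classes concrete, here is a minimal Python sketch of the many-particle case. It does not reproduce the compiler's input language or its generated code. `run_nested_loop`, `gravity_kernel`, and the softening constant `EPS2` are hypothetical names chosen for illustration. The user writes only the kernel. The driver stands in for what the compiler would generate: it holds the outer-loop ("i") variables fixed while streaming the per-iteration ("j") variables through the inner loop and accumulating the kernel's results.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, ...]

# Hypothetical driver: a stand-in for the loop code a compiler would generate.
def run_nested_loop(kernel: Callable[[Vec, Vec], Vec],
                    i_vars: Sequence[Vec],
                    j_vars: Sequence[Vec],
                    n_out: int) -> List[List[float]]:
    results = []
    for i_args in i_vars:          # outer loop: i-variables stay invariant below
        acc = [0.0] * n_out
        for j_args in j_vars:      # inner loop: j-variables change every iteration
            contrib = kernel(i_args, j_args)
            for k in range(n_out):
                acc[k] += contrib[k]   # accumulate the kernel's output
        results.append(acc)
    return results

EPS2 = 1e-4  # assumed softening length squared; also avoids division by zero when i == j

# User-supplied kernel (hypothetical): gravitational acceleration on particle i from particle j.
def gravity_kernel(i_args: Vec, j_args: Vec) -> Vec:
    xi, yi, zi = i_args            # invariant during the inner loop
    xj, yj, zj, mj = j_args        # updated in each iteration
    dx, dy, dz = xj - xi, yj - yi, zj - zi
    r2 = dx * dx + dy * dy + dz * dz + EPS2
    rinv = r2 ** -0.5
    f = mj * rinv ** 3
    return (f * dx, f * dy, f * dz)

if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    j_particles = [(x, y, z, 1.0) for x, y, z in positions]  # unit masses
    print(run_nested_loop(gravity_kernel, positions, j_particles, 3))
```

Separating the two input classes matters for the generated code. One plausible mapping, assumed here rather than taken from the paper, keeps the i-variables in per-thread registers and broadcasts the j-variables to all threads. The sketch shows only the loop structure, not that hardware mapping.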

Tags: ATI, ATI CAL, ATI IL, Compilers, Computer science, RV770

June 15, 2011 by hgpu

