high performance computing on graphics processing units: hgpu.org

Simple optimizations for an applicative array language for graphics processors

Bradford Larsen

Department of Computer Science, Tufts University

Proceedings of the sixth workshop on Declarative aspects of multicore programming, DAMP ’11, 2011

DOI:10.1145/1926354.1926360

Graphics processors (GPUs) are highly parallel devices that promise high performance, and they are now flexible enough to be used for general-purpose computing. A programming language based on implicitly data-parallel collective array operations can permit high-level, effective programming of GPUs. I describe three optimizations for such a language: automatic use of GPU shared memory cache, array fusion, and hoisting of nested parallel constructs. These optimizations are simple to implement because of the design of the language to which they are applied but can result in large run-time speedups.
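Of the three optimizations named above, array fusion is the easiest to illustrate in a few lines. The sketch below is not the paper's language or compiler; it is a plain Python analogy in which `map_array`, `compose`, `unfused`, and `fused` are hypothetical names chosen for illustration. It shows the general idea that two consecutive element-wise operations can be combined into one pass, so the intermediate array is never materialized.

```python
from typing import Callable, List

Elem = float


def map_array(f: Callable[[Elem], Elem], xs: List[Elem]) -> List[Elem]:
    """Apply f to every element, producing a new array (one full pass)."""
    return [f(x) for x in xs]


def compose(f: Callable[[Elem], Elem], g: Callable[[Elem], Elem]) -> Callable[[Elem], Elem]:
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))


def square(x: Elem) -> Elem:
    return x * x


def add_one(x: Elem) -> Elem:
    return x + 1.0


def unfused(xs: List[Elem]) -> List[Elem]:
    # Two passes: the squared values are stored in an intermediate array.
    squared = map_array(square, xs)
    return map_array(add_one, squared)


def fused(xs: List[Elem]) -> List[Elem]:
    # One pass: map f (map g xs) rewritten as map (f . g) xs.
    return map_array(compose(add_one, square), xs)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(unfused(data))
    print(fused(data))
```

Both functions compute the same result. On a GPU, the analogous rewrite would typically mean one kernel launch instead of two and no round trip through global memory for the intermediate array, which is one plausible reason such a simple transformation can yield large speedups.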

Tags: Code generation, Compilers, Computer science, CUDA, nVidia, nVidia GeForce 8800 GT, Optimization, Programming Languages, Programming techniques

September 23, 2011 by hgpu
