high performance computing on graphics processing units: hgpu.org

Efficient computation of sum-products on GPUs through software-managed cache

Mark Silberstein, Assaf Schuster, Dan Geiger, Anjul Patney, John D. Owens

Technion – Israel Institute of Technology, Haifa, Israel

In ICS ’08: Proceedings of the 22nd annual international conference on Supercomputing (2008), pp. 309-318.

DOI:10.1145/1375527.1375572

@conference{silberstein2008efficient,
  title={Efficient computation of sum-products on GPUs through software-managed cache},
  author={Silberstein, M. and Schuster, A. and Geiger, D. and Patney, A. and Owens, J.D.},
  booktitle={Proceedings of the 22nd annual international conference on Supercomputing},
  pages={309--318},
  year={2008},
  organization={ACM}
}

We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the implementation of a software-managed cache. We also present an analytical model for performance analysis of such algorithms. We apply this technique to the implementation of a GPU-based solver for the sum-product, or marginalize a product of functions (MPF), problem, which arises in a wide variety of real-life applications in artificial intelligence, statistics, image processing, and digital communications. Our motivation to accelerate MPF originated in the analysis of genetic diseases, which in some cases requires years to complete on modern CPUs. Computing MPF is similar to computing the chain matrix product of multi-dimensional matrices, but is more difficult due to a complex data-dependent access pattern, high data reuse, and a low compute-to-memory access ratio. Our GPU-based MPF solver achieves up to a 2700-fold speedup on random data and a 270-fold speedup on real-life genetic analysis datasets on an NVIDIA GeForce 8800 GTX GPU, relative to an optimized CPU version running on a 2.4GHz Intel Core 2 with a 4MB L2 cache.
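To make the MPF problem concrete, here is a minimal CPU-only sketch of a single sum-product step in Python. It is not the authors' GPU solver and does not model the software-managed cache; the function name `sum_product`, the dictionary representation, and the toy domains are all assumptions chosen for illustration. It multiplies two functions f(a, b) and g(b, c) and sums out the shared variable b, which for two-variable functions is exactly a matrix product.

```python
from itertools import product


def sum_product(f, g, dom_a, dom_b, dom_c):
    """Return h(a, c) = sum over b of f(a, b) * g(b, c).

    f and g are dicts keyed by tuples of variable values (a hypothetical
    representation; the paper does not prescribe one).
    """
    h = {}
    for a, c in product(dom_a, dom_c):
        # Every (a, c) pair re-reads a full row of f and a full column of g:
        # this is the kind of data reuse a cache would exploit.
        h[(a, c)] = sum(f[(a, b)] * g[(b, c)] for b in dom_b)
    return h


# Toy domains and function tables, invented for the example.
dom_a, dom_b, dom_c = range(2), range(3), range(2)
f = {(a, b): a + b + 1 for a, b in product(dom_a, dom_b)}
g = {(b, c): 2 * b + c for b, c in product(dom_b, dom_c)}

h = sum_product(f, g, dom_a, dom_b, dom_c)
```

Each output entry costs one multiply and one add per input value read, which illustrates the low compute-to-memory access ratio the abstract mentions: performance depends on how often inputs are reused from fast memory rather than on arithmetic throughput.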

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia GeForce 8800 GTX, Programming techniques

November 5, 2010 by hgpu
