high performance computing on graphics processing units: hgpu.org

Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method

Aparna Chandramowlishwaran, Kamesh Madduri, Richard Vuduc

School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA

In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (2010), pp. 1-12.

DOI:10.1109/SC.2010.19

@conference{chandramowlishwarany2010diagnosis,
  title={Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method},
  author={Chandramowlishwaran, A. and Madduri, K. and Vuduc, R.},
  booktitle={Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis},
  pages={1--12},
  year={2010},
  organization={IEEE Computer Society}
}

Given a program and a multisocket, multicore system, what is the process by which one understands and improves its performance and scalability? We describe an approach in the context of improving within-node scalability of the fast multipole method (FMM). Our process consists of a systematic sequence of modeling, analysis, and tuning steps, beginning with simple models, and gradually increasing their complexity in the quest for deeper performance understanding and better scalability. For the FMM, we significantly improve within-node scalability; for example, on a quad-socket Intel Nehalem-EX system, we show speedups of 1.7x over the previous best multithreaded implementation, 19.3x over a sequential but highly tuned (e.g., SIMD-vectorized) code, and match or outperform a state-of-the-art GPGPU implementation. Our study sheds new light on the form of a more general performance analysis and tuning process that other multicore/manycore tuning practitioners (end-user programmers) and automated performance analysis and tuning tools could themselves apply.

Tags: Computer science, Fast multipole method, Optimization, Performance

January 11, 2011 by hgpu
