high performance computing on graphics processing units: hgpu.org

A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches

Martin Burtscher, Hassan Rabeti

Department of Computer Science, Texas State University-San Marcos, San Marcos, TX 78666, USA

27th IEEE International Parallel & Distributed Processing Symposium, 2013

This paper describes and evaluates a highly scalable framework for running iterative local searches on heterogeneous HPC platforms. The user needs to provide only serial CPU or single-GPU code that implements a simple interface. The framework then executes this code in parallel using MPI between compute nodes and OpenMP and multi-GPU support within nodes. It handles all parallelization aspects, seed distribution, and program termination, and it regularly records the best solution found so far. We evaluate our framework on three supercomputers using a heuristic iterative hill-climbing TSP solver as well as a search for good finite-state machines. The framework scales to 2048 nodes (32,768 cores) on Ranger with less than a 5% drop in efficiency, searches over 12.2 trillion TSP tours per second on Stampede using 1024 nodes, and evaluates over 21.5 trillion FSM transitions per second using 256 CPUs and 384 GPUs on Keeneland.
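The division of labor described in the abstract (the user writes a serial, seed-driven search; a driver hands out seeds and keeps the best result) can be illustrated with a small sketch. This is not the authors' framework or its interface: the names `local_search`, `worker`, and `run_searches` are hypothetical, Python threads stand in for MPI ranks and OpenMP threads, the strided seed assignment is an assumption, and the hill climber is a plain 2-opt on a made-up 12-city instance. The sketch also reports the best solution only once at the end rather than recording it regularly.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def make_cities(n, seed=12345):
    """Build a hypothetical problem instance: n random points in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


CITIES = make_cities(12)


def tour_length(tour):
    """Length of the closed tour visiting CITIES in the given order."""
    n = len(tour)
    return sum(math.dist(CITIES[tour[i]], CITIES[tour[(i + 1) % n]]) for i in range(n))


def local_search(seed):
    """User-supplied serial code: one 2-opt hill climb from a seed-determined start."""
    rng = random.Random(seed)
    tour = list(range(len(CITIES)))
    rng.shuffle(tour)
    best = tour_length(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                # 2-opt move: reverse the segment tour[i..j].
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                cost = tour_length(candidate)
                if cost < best - 1e-12:
                    tour, best, improved = candidate, cost, True
    return best, tour


def worker(rank, num_workers, total_seeds):
    """Run every seed assigned to this worker (strided assignment, an assumption)."""
    best = (math.inf, None)
    for seed in range(rank, total_seeds, num_workers):
        best = min(best, local_search(seed), key=lambda r: r[0])
    return best


def run_searches(num_workers, total_seeds):
    """Driver: distribute disjoint seeds to workers and reduce to the best result."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(lambda r: worker(r, num_workers, total_seeds),
                           range(num_workers))
        return min(results, key=lambda r: r[0])


if __name__ == "__main__":
    cost, tour = run_searches(num_workers=4, total_seeds=32)
    print(f"best tour length: {cost:.4f}")
    print("tour:", tour)
```

Because each search depends only on its seed, the set of results is the same for any worker count, which is what lets a driver of this kind scale out without changing the user's code.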

Tags: Computer science, CUDA, Heterogeneous systems, MPI, nVidia, Search, Tesla M2090

March 12, 2013 by hgpu
