high performance computing on graphics processing units: hgpu.org

High Performance GPU Accelerated Local Optimization in TSP

Kamil Rocki and Reiji Suda

The University of Tokyo

Third Workshop on Parallel Computing and Optimization (PCO’13) in conjunction with 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 20-24, 2013, Boston, USA (to appear)

Package: LOGO TSP Solver v. 0.5

This paper presents a high-performance, GPU-accelerated implementation of the 2-opt local search algorithm for the Traveling Salesman Problem (TSP). Using a GPU significantly decreases the execution time needed for tour optimization; however, it also requires a complicated and well-tuned implementation. As the problem size grows, the time spent on local optimization, which compares pairs of graph edges, grows significantly. According to our results on instances from the TSPLIB library, the time needed to perform a simple local search operation can be reduced by a factor of approximately 5 to 45 compared to a corresponding parallel CPU implementation using 6 cores. The code has been implemented in both OpenCL and CUDA and tested on AMD and NVIDIA devices. The experimental studies show that the optimization algorithm using the GPU local search converges up to 300 times faster on average than the sequential CPU version, depending on the problem size. The main contributions of this paper are a problem division scheme that exploits data locality, making it possible to solve arbitrarily large problem instances on a GPU, and the parallel implementation of the algorithm itself.
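
For readers unfamiliar with 2-opt, the following is a minimal sequential sketch in Python of the local search step the abstract refers to. It is not the authors' code and does not reproduce their GPU kernels or problem division scheme; the function names (tour_length, best_two_opt_move, two_opt) and the choice of always applying the single best move are assumptions made for illustration. The exhaustive scan over edge pairs in best_two_opt_move is the kind of independent, data-parallel work that a GPU implementation could distribute across threads.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]])
        for i in range(n)
    )


def best_two_opt_move(points, tour):
    """Scan every pair of non-adjacent tour edges; return (gain, i, j).

    Each (i, j) pair is evaluated independently of the others, which is
    why this scan is a natural candidate for parallel evaluation.
    Returns (0.0, -1, -1) when no improving move exists.
    """
    n = len(tour)
    best = (0.0, -1, -1)
    for i in range(n - 1):
        a, b = points[tour[i]], points[tour[i + 1]]
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # these two edges share a city, nothing to swap
            c, d = points[tour[j]], points[tour[(j + 1) % n]]
            # Length saved by replacing edges (a,b),(c,d) with (a,c),(b,d).
            gain = (math.dist(a, b) + math.dist(c, d)
                    - math.dist(a, c) - math.dist(b, d))
            if gain > best[0] + 1e-12:
                best = (gain, i, j)
    return best


def two_opt(points, tour):
    """Apply the best 2-opt move repeatedly until no move shortens the tour."""
    tour = list(tour)
    while True:
        gain, i, j = best_two_opt_move(points, tour)
        if i < 0:
            return tour
        # Reversing the segment between the two edges performs the swap.
        tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
```

For example, calling two_opt on the four corners of a unit square visited in a self-crossing order removes the crossing and yields the square's perimeter as the tour. Because every pass examines on the order of n^2 edge pairs for n cities, the scan dominates the running time as instances grow, which is consistent with the abstract's motivation for offloading it to a GPU.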

March 12, 2013 by krocki
