high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Cellular GPU Models to Euclidean Optimization Problems

Cellular GPU Models to Euclidean Optimization Problems

Naiyu Zhang

IRTES – SET – Laboratoire Systemes et Transports

tel-00982405, (23 April 2014)

@phdthesis{zhang2014cellular,

title={Cellular GPU Models to Euclidean Optimization Problems},

author={Zhang, Naiyu},

year={2014}

}

Download (PDF)

View

Source

2044

views

The work presented in this PhD studies and proposes cellular computation parallel models able to address different types of NP-hard optimization problems defined in the Euclidean space, and their implementation on the Graphics Processing Unit (GPU) platform. The goal is to allow both dealing with large size problems and provide substantial acceleration factors by massive parallelism. The field of applications concerns vehicle embedded systems for stereovision as well as transportation problems in the plane, as vehicle routing problems. The main characteristic of the cellular model is that it decomposes the plane into an appropriate number of cellular units, each responsible of a constant part of the input data, and such that each cell corresponds to a single processing unit. Hence, the number of processing units and required memory are with linear increasing relationship to the optimization problem size, which makes the model able to deal with very large size problems.The effectiveness of the proposed cellular models has been tested on the GPU parallel platform on four applications. The first application is a stereo-matching problem. It concerns color stereovision. The problem input is a stereo image pair, and the output a disparity map that represents depths in the 3D scene. The goal is to implement and compare GPU/CPU winner-takes-all local dense stereo-matching methods dealing with CFA (color filter array) image pairs. The second application focuses on the possible GPU improvements able to reach near real-time stereo-matching computation. The third and fourth applications deal with a cellular GPU implementation of the self-organizing map neural network in the plane. The third application concerns structured mesh generation according to the disparity map to allow 3D surface compressed representation. Then, the fourth application is to address large size Euclidean traveling salesman problems (TSP) with up to 33708 cities.In all applications, GPU implementations allow substantial acceleration factors over CPU versions, as the problem size increases and for similar or higher quality results. The GPU speedup factor over CPU was of 20 times faster for the CFA image pairs, but GPU computation time is about 0.2s for a small image pair from Middlebury database. The near real-time stereovision algorithm takes about 0.017s for a small image pair, which is one of the fastest records in the Middlebury benchmark with moderate quality. The structured mesh generation is evaluated on Middlebury data set to gauge the GPU acceleration factor and quality obtained. The acceleration factor for the GPU parallel self-organizing map over the CPU version, on the largest TSP problem with 33708 cities, is of 30 times faster.

Tags: Algorithms, Computer science, CUDA, Neural networks, nVidia, nVidia GeForce GTX 570, Optimization, Self-organizing map, Thesis

April 27, 2014 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Cellular GPU Models to Euclidean Optimization Problems

Your response

Recent source codes

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

SeedFold: Scaling Biomolecular Structure Prediction

Tilus: A Tile-Level GPU Kernel Programming Language

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

BoltzGen:Toward Universal Binder Design

cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution

MATLAB Tensor Core models

TritonForge: Transform PyTorch Operations into Optimized GPU Kernels with LLMs

RLTune: Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters

Most viewed papers (last 30 days)

Cellular GPU Models to Euclidean Optimization Problems

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)