high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Parallel Matching and Clustering Algorithms on GPUs

Parallel Matching and Clustering Algorithms on GPUs

Md. Naim

University of Bergen

University of Bergen, 2017

BibTeX

Download (PDF)

View

Source

2693

views

The main focus of this thesis is on developing efficient algorithms on GPUs for certain matching and clustering problems. Through extensive experiments we show that sparse and unstructured problems can benefit greatly from using GPUs as long as the algorithms are carefully designed. Even though none of the presented algorithms are fundamentally new, they still require significant redesign to make them efficient on GPUs. Common to all the developed algorithms is that they emphasis achieving an even load balance and high degree of parallelism, while at the same time avoiding the use of time consuming synchronization operations. Through extensive experiments we verify the performance of the suggested algorithms. In some cases, even a single GPU can outperform tens of multiprocessors on a state-of-the-art supercomputer. The area of computing using GPU enhanced systems is changing rapidly. This is especially true for memory management. For instance, simultaneous access to the same memory by a host and a device was not possible until it was recently introduced in the Pascal GPU [15]. However, starting from the early unified shader architecture, the fundamental architectural features of GPUs have not undergone radical changes. Instead what has happened is that features have been added in an incremental fashion while the speed and size of the device has increased gradually. Based on this observation we believe that algorithms that are developed for GPUs today will also be relevant in the near future. Although the presented algorithms in this thesis can be viewed as proof of concept that carefully designed GPU algorithms for certain graph problems can compete with implementations on traditional parallel supercomputers, it is not to be underestimated that doing so requires substantial effort. For this to become possible on a more regular basis there is a need to continue and further develop higher level abstractions for graph analytics on GPUs. As an example, we believe that the strategy used in Paper III where vertices were grouped according to their degree before being allocated to different thread blocks, is one such technique that could benefit other applications. Also, the use of various reduction operations could be candidates for generic implementation. It seems clear that GPUs will play a major role in the future of HPC, with large systems containing several interconnected ones. Designing efficient graph algorithms for such systems is still only starting and likely to generate much interesting work in the future. Just like the distinction between traditional shared and distributed algorithms, we believe that one will see a similar division between different GPU algorithms depending on how the underlying devices are interconnected.

Tags: Algorithms, Clustering, Computer science, CUDA, nVidia, Tesla K40, Thesis

October 21, 2017 by hgpu

Rating: 2.0/5. From 1 vote.

Please wait...

* * *

high performance computing on graphics processing units: hgpu.org

Parallel Matching and Clustering Algorithms on GPUs

Recent source codes

XaaS containers

microSYCL: SYCL micro-benchmarks repository

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

Most viewed papers (last 30 days)

Parallel Matching and Clustering Algorithms on GPUs

Share this:

Recent source codes

Most viewed papers (last 30 days)