high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Parallel Subgraph Mining on Hybrid Platforms: HPC Systems, Multi-Cores and GPUs

Parallel Subgraph Mining on Hybrid Platforms: HPC Systems, Multi-Cores and GPUs

Nilothpal Talukder

Rensselaer Polytechnic Institute

Rensselaer Polytechnic Institute, Troy, New York, 2016

BibTeX

Download (PDF)

View

Source

1889

views

Frequent subgraph mining (FSM) is an important problem in numerous application areas, such as computational chemistry, bioinformatics, social networks, computer programming languages, etc. However, the problem is computationally hard because it requires enumerating possibly an exponential number of candidate subgraph patterns, and checking their presence in a single large graph or a database of graphs. In recent years, it has become more challenging due to the rapidly increasing graph sizes. For instance, the number of users on online social networks like Facebook are now in billions. In this thesis, we present novel algorithms to scale frequent subgraph mining on a variety of computing platforms/architectures. These include High Performance Computing systems (HPC), such as IBM Blue Gene/Q, multi-core CPUs, and Graphics Processing Units (GPUs). First, we present frequent subgraph mining from a single, very large, labeled network. Our approach is the first distributed method to mine a massive input graph that is too large to fit in the memory of any individual compute node. The input graph thus has to be partitioned among the nodes, which can lead to potential false negatives. Furthermore, for scalable performance it is crucial to minimize the communication among the compute nodes. Our algorithm, DistGraph, ensures that there are no false negatives, and uses a set of optimizations and efficient collective communication operations to minimize information exchange. To our knowledge DistGraph is the first approach demonstrated to scale to graphs with over a billion vertices and edges. Scalability results on up to 2048 IBM Blue Gene/Q compute nodes, with 16 cores each, show very good speedup. The results also show orders of magnitude speedup compared with the state-of-art sequential solutions. Then, we describe a hybrid parallel approach ParGraph to mine a single moderate sized graph that fits inside the memory of a single compute node. However, this approach uses a hybrid load balancing scheme to efficiently distribute workload on both MPI and thread levels. Moreover, unlike the distributed approach it does not require synchronization among processes at each expansion level of the growing patterns, which makes it very efficient. Finally, we describe parallel frequent subgraph mining on a database of multiple labeled graphs, using Graphics Processing Units (GPUs). In recent times, GPUs have emerged as a relatively cheap but powerful architecture for general purpose computing. However, the thread-model for GPUs is different from that of CPUs, which makes the parallelization of graph mining algorithms on GPUs a challenging task. We investigate the major challenges for GPU-based graph mining, and perform extensive experiments on several real-world and synthetic datasets, achieving speedups up to 9 over a sequential algorithm running on a single-core CPU.

Tags: Algorithms, Computer science, CUDA, Data mining, Databases, MPI, nVidia, Tesla C2075, Thesis

April 29, 2016 by hgpu

Rating: 2.5/5. From 1 vote.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Parallel Subgraph Mining on Hybrid Platforms: HPC Systems, Multi-Cores and GPUs

Your response

Recent source codes

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Most viewed papers (last 30 days)

Parallel Subgraph Mining on Hybrid Platforms: HPC Systems, Multi-Cores and GPUs

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)