high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Parallel Distributed Breadth First Search on the Kepler Architecture

Parallel Distributed Breadth First Search on the Kepler Architecture

Mauro Bisson, Massimo Bernaschi, Enrico Mastrostefano

Istituto per le Applicazioni del Calcolo, IAC-CNR, Rome, Italy

arXiv:1408.1605 [cs.DC], (7 Aug 2014)

View

Source

1687

views

We present the results obtained by using an evolution of our CUDA-based solution for the exploration, via a Breadth First Search, of large graphs. This latest version exploits at its best the features of the Kepler architecture and relies on a 2D decomposition of the adjacency matrix to reduce the number of communications among the GPUs. The final result is a code that can visit 400 billion edges in a second by using a cluster equipped with 4096 Tesla K20X GPUs.

Tags: Computer science, CUDA, Distributed computing, Graph theory, nVidia, Tesla K20

August 9, 2014 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

KIS-S: A GPU-Aware Kubernetes Inference Simulator with RL-Based Auto-Scaling

Efficient GPU Implementation of Multi-Precision Integer Division

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

Accelerated discovery and design of Fe-Co-Zr magnets with tunable magnetic anisotropy through machine learning and parallel computing

ParEval: A Parallel Code Evaluation Benchmark

ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication

WiLLM: An Open Wireless LLM Communication System

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

No More Shading Languages: Compiling C++ to Vulkan Shaders

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

See all packages

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Login | Sitemap | Feedback | Policy

Contact us:

contact@hpgu.org