high performance computing on graphics processing units: hgpu.org

Parallel Graph Algorithms on the Xeon Phi Coprocessor

Dennis Felsing

Department of Informatics, Institute of Theoretical Computer Science, Parallel Computing Group, Karlsruhe Institute of Technology

Karlsruhe Institute of Technology, 2015

@article{felsing2015parallel,
  title={Parallel Graph Algorithms on the Xeon Phi Coprocessor},
  author={Felsing, Dennis},
  year={2015}
}

Complex networks have attracted interest in a wide range of applications, from road networks and hyperlink connections in the World Wide Web to interactions between people. Advanced algorithms are required both to generate and to visualize such graphs. This work studies two graph algorithms as examples, one for graph generation and one for graph visualization. We detail the work of adapting and porting the algorithms to the Intel Xeon Phi coprocessor architecture, including the problems encountered and solved in porting real software projects and the libraries they use. Memory allocations turned out to be a major problem for the graph generation algorithm. The limited memory of the Xeon Phi forced us to offload the data from the host system to the Xeon Phi in chunks, which impeded performance and eliminated any significant speedup. The data sets for the graph visualization algorithm, consisting of at most 365 000 edges, fit easily into the Xeon Phi’s memory, which simplified the porting process significantly. For sparse graphs we achieve a speedup over the host system, which contains two 8-core Intel Xeon (Sandy Bridge) processors. While the hot inner loop by itself can utilize the 512-bit vector instructions of the Xeon Phi, the benefit disappears when the loop is embedded in the more complicated full program.

Tags: Algorithms, Computer science, Graph theory, Intel Xeon Phi, OpenMP, Thesis, Visualization

October 6, 2015 by hgpu
