high performance computing on graphics processing units: hgpu.org


Accelerating Linpack Performance with Mixed Precision Algorithm on CPU+GPGPU Heterogeneous Cluster

Wang Lei, Zhang Yunquan, Zhang Xianyi, Liu Fangfang

Lab. of Parallel Comput., Chinese Acad. of Sci., Beijing, China

IEEE 10th International Conference on Computer and Information Technology (CIT), 2010

DOI:10.1109/CIT.2010.212


In this paper, we introduce a mixed precision algorithm for solving systems of linear equations and describe the implementation of the HPL package. We use the mixed precision algorithm to improve HPL on CPU + GPGPU heterogeneous clusters, name the resulting program GHPL, and describe its implementation mechanisms in detail. Experimental results were measured on multi-core CPU platforms and on CPU + GPGPU heterogeneous clusters. They show that GHPL scales well in all the experimental environments and sustains more than 1.7 Tflops both on a 16-node cluster containing 32 NVIDIA Tesla C1060 GPUs and on an 8-node cluster containing 32 NVIDIA GeForce GTX 295 GPUs, with average speedups over HPL of 3.06 and 2.40, respectively.
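The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common mixed precision scheme, iterative refinement, and not GHPL's actual implementation. It assumes the expensive LU factorization and triangular solves run in single precision while the residual and the solution update are kept in double precision. All function names (`f32`, `lu_factor_f32`, `lu_solve_f32`, `residual_f64`, `solve_mixed`) are hypothetical, and single precision is emulated on the CPU with the standard `struct` module rather than executed on a GPU.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization with partial pivoting; every stored value is rounded to float32."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve with the stored factors, rounding each intermediate to float32."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # forward substitution (L has a unit diagonal)
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def residual_f64(a, x, b):
    """Residual r = b - A x, computed in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]


def solve_mixed(a, b, tol=1e-12, max_iter=20):
    """Iterative refinement: float32 factorization, float64 residual and update."""
    lu, perm = lu_factor_f32(a)       # O(n^3) work, done once in single precision
    x = lu_solve_f32(lu, perm, b)     # initial single-precision solution
    b_norm = max(abs(v) for v in b)
    for it in range(max_iter):
        r = residual_f64(a, x, b)
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, it
        d = lu_solve_f32(lu, perm, r)  # cheap correction reusing the factors
        x = [xi + di for xi, di in zip(x, d)]
    return x, max_iter


if __name__ == "__main__":
    a = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
    b = [0.1, 0.2, 0.3]
    x, iters = solve_mixed(a, b)
    print("solution:", x, "refinement steps:", iters)
```

The point of the design is that the cubic-cost factorization runs in the faster, lower precision, while each refinement step costs only a matrix-vector product and two triangular solves. For a reasonably well-conditioned matrix, a few such steps are usually enough to bring the residual down to double-precision level.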

Tags: Computer science, CUDA, GPU cluster, Heterogeneous systems, Linear Algebra, Mixed precision, nVidia, nVidia GeForce GTX 295, Tesla C1060

April 20, 2011 by hgpu


