high performance computing on graphics processing units: hgpu.org


Automatic code generation methods applied to numerical linear algebra in high performance computing

M. Ian Masliah

LRI – Laboratoire de Recherche en Informatique

tel-01395496, (15 November 2016)

@phdthesis{masliah:tel-01395496,
  Title={Automatic code generation methods applied to numerical linear algebra in high performance computing},
  Author={Masliah, Ian},
  URL={https://tel.archives-ouvertes.fr/tel-01395496},
  Number={2016SACLS285},
  School={Universit{\'e} Paris-Saclay},
  Year={2016},
  Month={Sep},
  Keywords={C++; Generic programming; CUDA; Meta-Programming; GPU; Domain specific languages; Generative programming; Linear algebra; Programmation g{\'e}n{\'e}rique; Meta programmation; Languages d{\'e}di{\'e}s; Programmation g{\'e}n{\'e}rative; Alg{\`e}bre lin{\'e}aire},
  Type={Theses},
  PDF={https://tel.archives-ouvertes.fr/tel-01395496/file/76325_MASLIAH_2016_diffusion.pdf},
  hal_id={tel-01395496},
  hal_version={v1}
}


Parallelism is ubiquitous in today’s computer architectures, whether in supercomputers, workstations, or portable devices such as smartphones. Efficiently exploiting these systems for a specific application requires a multidisciplinary effort spanning Domain Specific Languages (DSLs), code generation and optimization techniques, and application-specific numerical algorithms. In this PhD thesis, we present a high-level programming method that takes into account the features of heterogeneous architectures and the properties of matrices to build a generic dense linear algebra solver. Our programming model supports both implicit and explicit data transfers to and from General-Purpose Graphics Processing Units (GPGPUs) and Integrated Graphics Processors (IGPs). As GPUs have become an asset in high performance computing, incorporating them into general solvers is an important issue. Recent architectures such as IGPs also require further knowledge to program efficiently. Our method aims to simplify development on parallel architectures through high-level programming techniques. As an example, we developed a mixed-precision least-squares solver based on semi-normal equations, which cannot be found in current libraries. This solver achieves performance similar to that of other mixed-precision algorithms. We extend our approach to a new multistage programming model that alleviates the interoperability problems between the CPU and GPU programming models. Our multistage approach is used to automatically generate GPU code for CPU-based element-wise expressions and parallel skeletons while allowing for type-safe program generation. We illustrate that this work can be applied to recent architectures and algorithms. The resulting code has been incorporated into a C++ library called NT2. Finally, we investigate how to apply high-level programming techniques to batched computations and tensor contractions.
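The mixed-precision least-squares solver mentioned above can be illustrated with a minimal CPU-only sketch. It is not the thesis's implementation and does not use NT2. It assumes the textbook form of the semi-normal equations, R^T R x = A^T b, where R is the triangular factor of A = QR, and it computes R in single precision with modified Gram-Schmidt purely for brevity. The residual and the correction steps are carried out in double precision. All function names (`r_factor_single`, `solve_rtr`, `least_squares_sne`) and the column-major layout are hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// R factor of A = QR, computed in single precision with modified Gram-Schmidt.
// A is m x n, column-major, assumed to have full column rank. Q is discarded.
// Returns R as a row-major n x n upper-triangular matrix.
std::vector<float> r_factor_single(const vec& a, std::size_t m, std::size_t n) {
    std::vector<float> v(a.begin(), a.end());  // demote A to float
    std::vector<float> r(n * n, 0.0f);
    for (std::size_t j = 0; j < n; ++j) {
        float norm = 0.0f;
        for (std::size_t i = 0; i < m; ++i) norm += v[j * m + i] * v[j * m + i];
        norm = std::sqrt(norm);
        r[j * n + j] = norm;
        for (std::size_t i = 0; i < m; ++i) v[j * m + i] /= norm;
        for (std::size_t k = j + 1; k < n; ++k) {
            float dot = 0.0f;
            for (std::size_t i = 0; i < m; ++i) dot += v[j * m + i] * v[k * m + i];
            r[j * n + k] = dot;
            for (std::size_t i = 0; i < m; ++i) v[k * m + i] -= dot * v[j * m + i];
        }
    }
    return r;
}

// Solve R^T R x = rhs with a forward and a backward triangular solve.
// The arithmetic is done in double; only the entries of R are single precision.
vec solve_rtr(const std::vector<float>& r, std::size_t n, vec rhs) {
    for (std::size_t i = 0; i < n; ++i) {  // R^T y = rhs
        for (std::size_t k = 0; k < i; ++k) rhs[i] -= double(r[k * n + i]) * rhs[k];
        rhs[i] /= r[i * n + i];
    }
    for (std::size_t i = n; i-- > 0;) {    // R x = y
        for (std::size_t k = i + 1; k < n; ++k) rhs[i] -= double(r[i * n + k]) * rhs[k];
        rhs[i] /= r[i * n + i];
    }
    return rhs;
}

// Least-squares solution of min ||A x - b||_2 via semi-normal equations.
// The first pass (x = 0) is the plain semi-normal solve; later passes are
// double-precision corrections that reuse the single-precision R.
vec least_squares_sne(const vec& a, const vec& b, std::size_t m, std::size_t n,
                      int refinements = 3) {
    assert(a.size() == m * n && b.size() == m);
    const std::vector<float> r = r_factor_single(a, m, n);
    vec x(n, 0.0);
    for (int it = 0; it <= refinements; ++it) {
        vec res(b);  // res = b - A x
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i) res[i] -= a[j * m + i] * x[j];
        vec g(n, 0.0);  // g = A^T res
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i) g[j] += a[j * m + i] * res[i];
        const vec d = solve_rtr(r, n, g);
        for (std::size_t j = 0; j < n; ++j) x[j] += d[j];
    }
    return x;
}
```

Calling `least_squares_sne(a, b, m, n)` with a column-major `a` returns the coefficient vector. The refinement loop only converges for reasonably well-conditioned problems, because the low-precision factor acts as an approximate solver for the normal equations.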
We start by explaining how to design a simple data container using modern C++14 programming techniques. Then, we study the issues around batched computations, memory locality, and code vectorization to implement a highly optimized matrix-matrix product for small sizes using SIMD instructions. By combining a high-level programming approach with advanced parallel programming techniques, we show that we can outperform state-of-the-art numerical libraries.
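As a rough illustration of the container and batched product described above, the following sketch defines a fixed-size matrix whose dimensions are template parameters and applies a small matrix-matrix product across a batch. The names `small_matrix`, `multiply`, and `batched_multiply` are hypothetical and are not taken from the thesis or from NT2. The sketch contains no explicit SIMD code; it only orders the loops so that the innermost loop walks contiguous memory, which is a layout a compiler can typically auto-vectorize.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size, row-major matrix with compile-time dimensions.
template <typename T, std::size_t Rows, std::size_t Cols>
struct small_matrix {
    std::array<T, Rows * Cols> data{};  // value-initialized to zero

    T& operator()(std::size_t i, std::size_t j) { return data[i * Cols + j]; }
    const T& operator()(std::size_t i, std::size_t j) const { return data[i * Cols + j]; }
};

// C = A * B for one pair of small matrices. The i-k-j loop order keeps the
// innermost loop on contiguous rows of B and C.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
small_matrix<T, M, N> multiply(const small_matrix<T, M, K>& a,
                               const small_matrix<T, K, N>& b) {
    small_matrix<T, M, N> c;
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t k = 0; k < K; ++k) {
            const T aik = a(i, k);
            for (std::size_t j = 0; j < N; ++j) c(i, j) += aik * b(k, j);
        }
    }
    return c;
}

// Batched product: the same fixed-size product applied to every pair in the batch.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
std::vector<small_matrix<T, M, N>> batched_multiply(
    const std::vector<small_matrix<T, M, K>>& as,
    const std::vector<small_matrix<T, K, N>>& bs) {
    assert(as.size() == bs.size());
    std::vector<small_matrix<T, M, N>> cs(as.size());
    for (std::size_t n = 0; n < as.size(); ++n) cs[n] = multiply(as[n], bs[n]);
    return cs;
}
```

Because the dimensions are known at compile time, the loop bounds are constants and the storage is a single contiguous `std::array`, so a batch stored in a `std::vector` is itself contiguous in memory.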

Tags: Algorithms, Code generation, Computer science, CUDA, Heterogeneous systems, Linear Algebra, Mixed precision, nVidia, nVidia Tegra TX1, Programming techniques, Tesla C2075, Tesla K40, Thesis

November 16, 2016 by hgpu
