high performance computing on graphics processing units: hgpu.org


Heterogeneous parallel computing for image registration and linear algebra applications

Orestis Zachariadis

Universidad de Córdoba

Universidad de Córdoba, 2020

@article{zachariadis2020heterogeneous,
  title={Heterogeneous parallel computing for image registration and linear algebra applications},
  author={Zachariadis, Orestis},
  year={2020},
  publisher={Universidad de C{\'o}rdoba, UCOPress}
}


Package: Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores


This doctoral thesis focuses on GPU acceleration of medical image registration and sparse general matrix-matrix multiplication (SpGEMM). The work aims to enable new possibilities in Image Guided Surgery (IGS), which provides the surgeon with advanced navigation tools during surgery. Image registration, which is a part of IGS, is computationally demanding, so GPU acceleration is highly desirable. SpGEMM, an essential part of many scientific and data analytics applications (e.g., graph applications), is also a useful tool in biomechanical modeling and sparse vessel network registration. We present this work in two parts. The first part describes the optimization of the most demanding part of non-rigid Free Form Deformation registration, i.e., B-spline interpolation. Our novel optimization technique minimizes data movement between processing cores and memory and maximizes the utilization of the very fast register file. In addition, our approach reformulates B-spline interpolation to fully utilize Fused Multiply-Accumulate (FMA) instructions for additional benefits in performance and accuracy. Our optimized B-spline interpolation provides a significant speedup to image registration. The second part describes the optimization of SpGEMM. Hardware manufacturers, aiming to increase deep-learning performance, have created specialized dense matrix multiplication units called Tensor Core Units (TCUs). However, until now, no work has taken advantage of TCUs for sparse matrix multiplication. With this work we provide the first TCU implementation of SpGEMM and demonstrate its benefits over conventional GPU SpGEMM.
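For readers unfamiliar with SpGEMM, the following is a minimal sketch of what the operation computes, using the classic row-by-row (Gustavson-style) formulation on matrices stored in CSR form. It is not the thesis's TCU implementation and is not GPU code. The function name `spgemm_csr` and the `(row_ptr, col_idx, values)` tuple layout are assumptions chosen for illustration.

```python
def spgemm_csr(a, b):
    """Multiply two sparse matrices given in CSR form, row by row.

    Each matrix is a hypothetical (row_ptr, col_idx, values) tuple.
    Illustrative CPU sketch only; not the thesis's tensor-core method.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_ptr, c_idx, c_val = [0], [], []

    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C: column -> value
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and add it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Emit the nonzeros of row i in ascending column order.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))

    return c_ptr, c_idx, c_val


# A = [[1, 0, 2],      B = [[0, 4],
#      [0, 0, 3]]           [5, 0],
#                           [0, 6]]
A = ([0, 2, 3], [0, 2, 2], [1.0, 2.0, 3.0])
B = ([0, 1, 2, 3], [1, 0, 1], [4.0, 5.0, 6.0])
C = spgemm_csr(A, B)  # represents [[0, 16], [0, 18]]
```

Only the stored nonzeros are touched, so the work depends on the sparsity pattern rather than on the full matrix dimensions. That irregular access pattern is what makes SpGEMM hard to map onto hardware units designed for dense matrix multiplication.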

Tags: CUDA, Heterogeneous systems, Image processing, Image registration, Linear Algebra, Matrix multiplication, nVidia, nVidia GeForce GTX 1050, nVidia GeForce RTX 2070, Package, Sparse matrix, Thesis

August 9, 2020 by hgpu


