high performance computing on graphics processing units: hgpu.org

Posts

May, 9

Multidimensional upwind hydrodynamics on unstructured meshes using Graphics Processing Units I. Two-dimensional uniform meshes

We present a new method for numerical hydrodynamics which uses a multidimensional generalisation of the Roe solver and operates on an unstructured triangular mesh. The main advantage over traditional methods based on Riemann solvers, which commonly use one-dimensional flux estimates as building blocks for a multidimensional integration, is its inherently multidimensional nature, and as a […]

CUDA

May, 6

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory. Prior work tries to address this restriction by virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be utilized for memory allocations. Despite […]

CUDA

May, 6

cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs

We introduce the CUDA Tensor Transpose (cuTT) library that implements high-performance tensor transposes for NVIDIA GPUs with Kepler and above architectures. cuTT achieves high performance by (a) utilizing two GPU-optimized transpose algorithms that both use a shared memory buffer in order to reduce global memory access scatter, and by (b) computing memory positions of tensor […]

CUDA

May, 6

Numerical Model of Shallow Water: the Use of NVIDIA CUDA Graphics Processors

In the paper we discuss the main features of the software package for numerical simulations of the surface water dynamics. We consider an approximation of the shallow water equations together with the parallel technologies for NVIDIA CUDA graphics processors. The numerical hydrodynamic code is based on the combined Lagrangian-Euler method (CSPH-TVD). We focused on the features […]

CUDA

May, 6

GPUQT: An efficient linear-scaling quantum transport code fully implemented on graphics processing units

We present GPUQT, a quantum transport code fully implemented on graphics processing units. Using this code, one can obtain intrinsic electronic transport properties of large systems described by a real-space tight-binding Hamiltonian together with one or more types of disorder. The DC Kubo conductivity is represented as a time integral of the velocity auto-correlation or […]

CUDA

May, 6

AFiD-GPU: a versatile Navier-Stokes Solver for Wall-Bounded Turbulent Flows on GPU Clusters

The AFiD code, an open source solver for the incompressible Navier-Stokes equations (this http URL), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU porting has been carried out in CUDA Fortran with the extensive use of kernel loop directives (CUF kernels) in order to have a source code […]

CUDA

May, 2

TinyDL: Just-In-Time Deep Learning Solution For Constrained Embedded Systems

This work proposes TinyDL, an automated end-to-end framework that aims to integrate the state-of-the-art Deep Learning (DL) models into embedded systems. TinyDL enables efficient training and execution of DL models as data is collected over time while adhering to the underlying physical resources and constraints. The constraints can be characterized in terms of memory bandwidth, […]

CUDA

May, 2

GPU accelerated atmospheric chemical kinetics in the ECHAM/MESSy (EMAC) Earth system model

This paper presents an application of GPU accelerators in Earth system modelling. We focus on atmospheric chemical kinetics, one of the most computationally intensive tasks in climate-chemistry model simulations. We developed a software package that automatically generates CUDA kernels to numerically integrate atmospheric chemical kinetics in the global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC), used […]

CUDA

May, 2

Speeding up a few orders of magnitude the Jacobi method: high order Chebyshev-Jacobi over GPUs

In this technical note we show how to reach a remarkable speed up when solving elliptic partial differential equations with finite differences thanks to the joint use of the Chebyshev-Jacobi method with high order discretizations and its parallel implementation over GPUs.

CUDA

May, 2

Accelerating gravitational microlensing simulations using the Xeon Phi coprocessor

Recently Graphics Processing Units (GPUs) have been used to speed up very CPU-intensive gravitational microlensing simulations. In this work, we use the Xeon Phi coprocessor to accelerate such simulations and compare its performance on a microlensing code with that of NVIDIA’s GPUs. For the selected set of parameters evaluated in our experiment, we find that […]

CUDA

May, 2

Deep Learning in the Automotive Industry: Applications and Tools

Deep Learning refers to a set of machine learning techniques that utilize neural networks with many hidden layers for tasks such as image classification, speech recognition, and language understanding. Deep learning has been proven to be very effective in these domains and is pervasively used by many Internet services. In this paper, we describe different automotive […]

CUDA

Apr, 30

Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems

Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in today’s embedded systems. These architectures offer potential for energy efficient computing if the application task is mapped to the right core. Realizing such potential is challenging due to the complex and evolving nature of hardware and applications. This paper presents an automatic approach to […]

OpenCL

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
