high performance computing on graphics processing units: hgpu.org

Posts

Jan, 26

Computing Best Possible Pseudo-Solutions to Interval Linear Systems of Equations

In this paper, we consider interval linear algebraic systems of equations Ax = b, with an interval matrix A and an interval right-hand side vector b, as a model of imprecise systems of linear algebraic equations of the same form. We propose a new regularization procedure that reduces the solution of the imprecise linear system to […]

CUDA

Jan, 26

Low-latency Image Recognition with GPU-accelerated Convolutional Networks for Web-based Services

In this work, we describe an application of convolutional networks to object classification and detection in images. The task of image-based object recognition is surveyed in the first chapter. Its application in internet advertisement is one of the main motivations of this work. The architecture of the convolutional networks is described in detail in […]

CUDA

Jan, 26

Optimizing Stencil Computations for NVIDIA Kepler GPUs

We present a series of optimization techniques for stencil computations on NVIDIA Kepler GPUs. Stencil computations with regular grids have been ported to older generations of NVIDIA GPUs with significant performance improvements, thanks to their higher memory bandwidth compared with conventional CPU-only systems. However, because of the architectural changes introduced with the latest generation of […]

CUDA

Jan, 26

Hybrid strategy for stencil computations on the APU

Stencil computations are very regular and well adapted to GPU execution. However, the PCI-E bus that connects a discrete GPU to the system memory has a relatively low bandwidth when compared to the GPU compute power. The AMD APU architecture contains both CPU and GPU on the same chip and shared memory between them, which […]

OpenCL

Jan, 26

Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU

The Active Appearance Model (AAM) is one of the most powerful model-based object detection and tracking methods and has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern Graphics Processing Units (GPUs) that feature a […]

CUDA

Jan, 26

GPU acceleration of Newton’s method for large systems of polynomial equations in double double and quad double arithmetic

In order to compensate for the higher cost of double double and quad double arithmetic when solving large polynomial systems, we investigate the application of NVIDIA Tesla C2050, K20C, and K40 general purpose graphics processing units. As the dimension equals several thousands, the cost to compute one QR decomposition is sufficiently large so that the […]

CUDA

Jan, 26

GPU Monte Carlo scatter calculations for Cone Beam Computed Tomography

A GPU Monte Carlo code for x-ray photon transport has been implemented and extensively tested. The code is intended for scatter compensation of cone beam computed tomography images. It was found to agree with other well-known codes to within 5% for a set of simple scenarios. The scatter compensation was also tested using an […]

CUDA

Jan, 25

A High-productivity Framework for Multi-GPU computation of Mesh-based applications

The paper proposes a high-productivity framework for multi-GPU computation of mesh-based applications. To achieve high performance in these applications, we have to introduce complicated optimization techniques for GPU computing, which carry a relatively high implementation cost. Our framework automatically translates user-written functions that update a grid point and generates both GPU and CPU code. […]

CUDA

Jan, 25

Accelerating a Bayesian Phylogenetic Inference Application with OpenACC

The need for faster computing has been around ever since the birth of the first computers. Faster hardware will almost always guarantee faster computing but occasionally the rate of hardware development is not enough for some programs to deal with the vast information they need. When these programs need to be accelerated, algorithmic optimizations have […]

CUDA

Jan, 25

Performance Evaluation of the Intel Many Integrated Core Architecture for 3D Image Reconstruction in Computed Tomography

The computational effort of 3D image reconstruction in Computed Tomography (CT) has required special-purpose hardware for a long time. Systems such as custom-built FPGA systems and GPUs are still widely used today, in particular in interventional settings, where radiologists require a hard time constraint for reconstruction. However, recently it has been shown that today even commodity […]

CUDA

Jan, 25

Improvement of the fused CUDA kernels performance prediction

In this thesis, a tool for improving the performance prediction of a source-to-source compiler of mapped functions developed at the Faculty of Informatics is presented. This tool integrates the modification of the original compiler with static and dynamic data gathering to provide as much data about the fusions as possible in order to analyze them. […]

CUDA

Jan, 25

Finite differences numerical method for two-dimensional superlattice Boltzmann transport equation and case comparison of CPU(C) and GPGPU(CUDA) implementations

We present a finite-difference numerical algorithm for solving the 2D spatially homogeneous Boltzmann transport equation for semiconductor superlattices (SL) subject to a time-dependent electric field along the SL axis and a constant perpendicular magnetic field. The algorithm is implemented in C targeting the CPU and in CUDA C targeting commodity NVIDIA GPUs. We compare performance and […]

CUDA

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container
