high performance computing on graphics processing units: hgpu.org

Posts

Sep, 11

Coherent transport by adiabatic passage on atom chips

Adiabatic techniques offer some of the most promising tools to achieve high-fidelity control of the centre-of-mass degree of freedom of single atoms. As their main requirement is to follow an eigenstate of the system, the constraints on timing and field-strength stability are usually loose, especially for trapped systems. In this paper we present a detailed […]

CUDA

Sep, 11

D5.5.4 – Characterization of Redundancy and Definition of Work Reuse

This task involves the following work: – Establishing the relation of Quality of Service (QoS) and energy to accuracy. – Design and development of techniques to dynamically decrease accuracy (e.g., ignoring low-order bits in computations). Deliberately ignoring a few low-order bits in calculations where the application allows it (in terms of impact to […]
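As a minimal illustration of what "ignoring low-order bits" can mean in practice, the Python sketch below rounds a value to IEEE 754 float32 and then zeroes its least significant mantissa bits. The function name `truncate_low_bits` and the choice of float32 are assumptions made for illustration; the deliverable's own mechanism is not described in this excerpt.

```python
import struct


def truncate_low_bits(x: float, drop_bits: int) -> float:
    """Round x to IEEE 754 float32, then zero its `drop_bits` least
    significant mantissa bits (truncation toward zero in magnitude).
    Hypothetical helper, not taken from the deliverable."""
    if not 0 <= drop_bits <= 23:
        raise ValueError("float32 has a 23-bit mantissa")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep sign, exponent and the high mantissa bits; clear the rest.
    mask = 0xFFFFFFFF ^ ((1 << drop_bits) - 1)
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


# Hypothetical usage: compare pi at full float32 precision with 8 bits dropped.
print(truncate_low_bits(3.14159, 0), truncate_low_bits(3.14159, 8))
```

For a positive normal value, dropping d of the 23 mantissa bits keeps the relative error below 2^(d-23), which gives one concrete knob for trading accuracy against cost.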

CUDA

Sep, 11

Hardware-Oblivious Parallelism for In-Memory Column-Stores

The multi-core architectures of today’s computer systems make parallelism a necessity for performance-critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem; current database systems therefore rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware architectures like multi-core CPUs and graphics cards. We […]

OpenCL

Sep, 11

Iterative and Predictive Ray-Traced Collision Detection for Multi-GPU Architectures

Collision detection is a complex task that can be described simply: given a set of objects, we want to know which ones collide. The literature contains numerous algorithms that depend on the properties of the objects, but no single solution works for every kind of object. The internship focuses on a recent algorithm […]
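The problem statement above can be made concrete with the naive baseline: test every pair of objects for overlap of their axis-aligned bounding boxes (AABBs). This Python sketch is an assumed O(n^2) brute-force broad phase written as a reference point; it is not the ray-traced, multi-GPU algorithm the internship studies, and the names `AABB`, `overlaps` and `colliding_pairs` are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class AABB(NamedTuple):
    """Axis-aligned bounding box given by its min and max corners."""
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]


def overlaps(a: AABB, b: AABB) -> bool:
    # Two boxes intersect iff their intervals overlap on every axis;
    # touching faces count as a collision here.
    return all(a.lo[k] <= b.hi[k] and b.lo[k] <= a.hi[k] for k in range(3))


def colliding_pairs(boxes: list[AABB]) -> list[tuple[int, int]]:
    # Brute force: test all n*(n-1)/2 pairs and keep the overlapping ones.
    return [(i, j) for i, j in combinations(range(len(boxes)), 2)
            if overlaps(boxes[i], boxes[j])]
```

The quadratic number of pair tests is exactly the cost that more elaborate broad-phase methods try to avoid.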

OpenCL

Sep, 11

GPU Implementations of Object Detection using HOG Features and Deformable Models

Vision-based object detection using camera sensors is an essential piece of perception for autonomous vehicles. Various combinations of features and models can be applied to increase the quality and the speed of object detection. A well-known approach uses histograms of oriented gradients (HOG) with deformable models to detect a car in an image [15]. A […]
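For readers unfamiliar with HOG, the Python sketch below computes its core ingredient: a magnitude-weighted histogram of unsigned gradient orientations over one cell, using centered differences. It is a deliberately simplified assumption (no bilinear vote interpolation, no block normalization, no deformable part model) and not the paper's GPU implementation; `cell_histogram` is a hypothetical name.

```python
import math


def cell_histogram(img, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees) over the interior pixels of a 2-D grayscale
    image given as a list of rows."""
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Centered-difference gradient, i.e. the [-1, 0, 1] filter.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            # Unsigned orientation: fold opposite directions together.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against angle rounding up to 180.0.
            hist[int(angle // bin_width) % n_bins] += mag
    return hist
```

Nine bins over 0 to 180 degrees is the usual choice in the HOG literature; a full detector concatenates many such cell histograms after normalizing them in overlapping blocks.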

CUDA

Sep, 11

OCT on CUDA: Speeding up the image reconstruction algorithm for an Optical Coherence Tomography system using NVIDIA’s CUDA platform

GPGPU, or general-purpose GPU programming, was dramatically changed and demystified when NVIDIA introduced the CUDA architecture for its GPUs in 2006. Since then, problems in engineering, economics, or any other field no longer need to be translated into graphics problems to be processed by the GPU. The GPU programming paradigm was dramatically shifted when CUDA […]

CUDA

Sep, 11

A Mixed Hierarchical Algorithm for Nearest Neighbor Search

The k-nearest-neighbor (kNN) search is a computationally intensive application critical to fields such as image processing, statistics, and biology. Recent works have demonstrated the efficacy of k-d tree based implementations on multi-core CPUs. It is unclear, however, whether such tree-based implementations are amenable to execution on high-density processors typified today by the […]
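As a hedged reference for the k-d tree approach the abstract mentions, here is a small k-d tree with exact kNN search in plain Python. The function names and the tuple node layout `(index, axis, left, right)` are assumptions; this is the textbook single-tree algorithm, not the paper's mixed hierarchical method.

```python
import heapq
import math


def build_kdtree(points, indices=None, depth=0):
    """Recursively split on the median along cycling axes.
    Nodes are (point_index, axis, left_subtree, right_subtree)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[indices[0]])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))


def knn_kdtree(tree, points, query, k):
    """Return indices of the k points nearest to `query`, closest first."""
    heap = []  # max-heap of the best k so far, stored as (-distance, index)

    def visit(node):
        if node is None:
            return
        idx, axis, left, right = node
        d = math.dist(points[idx], query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, idx))
        diff = query[axis] - points[idx][axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Cross the splitting plane only if it could hide a closer point.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(tree)
    return [i for _, i in sorted(heap, key=lambda t: (-t[0], t[1]))]
```

The data-dependent recursion and pruning that make this efficient on a CPU are also what make tree traversal awkward on wide SIMD hardware, which is the tension the abstract raises.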

CUDA

Sep, 11

Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs

The Lattice Boltzmann method (LBM) for solving fluid flow is naturally well suited to an efficient implementation for massively parallel computing, due to the prevalence of local operations in the algorithm. This paper presents and analyses the performance of a 3D lattice Boltzmann solver, optimized for third generation nVidia GPU hardware, also known as ‘Kepler’. […]
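To make "local operations" concrete, here is a minimal one-dimensional D1Q3 lattice Boltzmann step with the BGK collision operator on a periodic lattice, in Python. The paper's solver is three-dimensional and its Kepler memory-transfer optimizations are not shown; the D1Q3 lattice, the relaxation time `tau`, and the names `equilibrium` and `lbm_step` are illustrative assumptions.

```python
# D1Q3 lattice: rest, right-moving and left-moving populations.
C = (0, 1, -1)
W = (2 / 3, 1 / 6, 1 / 6)


def equilibrium(rho, u):
    # Second-order equilibrium with lattice sound speed c_s^2 = 1/3.
    return [w * rho * (1 + 3 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u)
            for c, w in zip(C, W)]


def lbm_step(f, tau):
    """One BGK collide-and-stream step on a periodic 1-D lattice.
    f[x][i] is the population at node x moving with velocity C[i]."""
    n = len(f)
    post = []
    for fx in f:  # collision: reads and writes only this node
        rho = sum(fx)
        u = sum(c * fi for c, fi in zip(C, fx)) / rho
        feq = equilibrium(rho, u)
        post.append([fi - (fi - fe) / tau for fi, fe in zip(fx, feq)])
    # Streaming: each population moves one node in its own direction.
    new = [[0.0] * len(C) for _ in range(n)]
    for x in range(n):
        for i, c in enumerate(C):
            new[(x + c) % n][i] = post[x][i]
    return new
```

Collision touches a single node and streaming touches only nearest neighbours, so each node maps naturally onto one GPU thread; on real hardware the streaming step is where memory-transfer costs concentrate.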

CUDA

Sep, 9

Phase Transition in 3d Heisenberg Spin Glasses with Strong Random Anisotropies, through a Multi-GPU Parallelization

We characterize the phase diagram of anisotropic Heisenberg spin glasses, finding both the spin and the chiral glass transition. We note the presence of strong finite-size effects in the chiral sector. We find a unique phase transition for the chiral and spin glass sectors, in the universality class of Ising spin glasses. We focus on […]

CUDA

Sep, 9

Implementation of PDE models of cardiac dynamics on GPUs using OpenCL

Graphics processing units (GPUs) promise to revolutionize scientific computing in the near future. Already, they allow almost real-time integration of simplified numerical models of cardiac tissue dynamics. However, the integration methods that have been developed so far are typically of low order and use single-precision arithmetic. In this work, we describe the numerical implementation of […]
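The point about low-order integration can be illustrated on a toy ODE rather than a cardiac model: forward Euler (first order) versus classical fourth-order Runge-Kutta on dy/dt = -y. The test problem and the function name `integrate` are assumptions, and the sketch says nothing about precision; the paper's actual schemes are not specified in this excerpt.

```python
import math


def integrate(rhs, y0, t_end, n_steps, method):
    """Integrate dy/dt = rhs(t, y) from t = 0 to t_end with fixed steps."""
    h = t_end / n_steps
    y, t = y0, 0.0
    for _ in range(n_steps):
        if method == "euler":  # first order: global error ~ h
            y = y + h * rhs(t, y)
        elif method == "rk4":  # fourth order: global error ~ h**4
            k1 = rhs(t, y)
            k2 = rhs(t + h / 2, y + h / 2 * k1)
            k3 = rhs(t + h / 2, y + h / 2 * k2)
            k4 = rhs(t + h, y + h * k3)
            y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        else:
            raise ValueError(f"unknown method: {method}")
        t += h
    return y


# Hypothetical usage: error at t = 1 for exponential decay.
decay = lambda t, y: -y
for m, n in [("euler", 100), ("rk4", 10)]:
    print(m, n, abs(integrate(decay, 1.0, 1.0, n, m) - math.exp(-1.0)))
```

Halving the step size roughly halves the Euler error but shrinks the RK4 error by about a factor of 16, so a higher-order method can reach a given accuracy with far fewer steps.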

OpenCL

Sep, 9

A GPU Implementation for Two-Dimensional Shallow Water Modeling

In this paper, we present a GPU implementation of a two-dimensional shallow water model. Water simulations are useful for modeling floods, river/reservoir behavior, and dam break scenarios. Our GPU implementation shows vast performance improvements over the original Fortran implementation. By taking advantage of the GPU, researchers and engineers will be able to study water systems […]

CUDA

Sep, 9

GPU Accelerated Particle Visualization with Splotch

Splotch is a rendering algorithm for exploration and visual discovery in particle-based datasets coming from astronomical observations or numerical simulations. The strengths of the approach are the production of high-quality imagery and support for very large-scale datasets through an effective mix of the OpenMP and MPI parallel programming paradigms. This article reports our experiences in […]

CUDA
