high performance computing on graphics processing units: hgpu.org

Posts

May, 7

Improving the Programmability of GPU Architectures

Throughout the past decades, the tremendous growth of single-core performance has been the key enabler for digital technology to become ubiquitous in our society. Recently, diminishing returns on Dennard scaling resulted in power dissipation issues, leading to reduced performance growth. Performance growth has since been re-enabled by multi-core processors as well as by exploiting the energy […]

CUDA

OpenCL

May, 7

Orchestrating Thread Scheduling and Cache Management to Improve Memory System Throughput in Throughput Processors

Throughput processors such as GPUs continue to provide higher peak arithmetic capability. Designing a high throughput memory system to keep the computational units busy is very challenging. Future throughput processors must continue to exploit data locality and utilize the on-chip and off-chip resources in the memory system more effectively to further improve the memory system […]

May, 7

Bio-Inspired Optimization of Ultra-Wideband Patch Antennas Using Graphics Processing Unit Acceleration

Ultra-wideband (UWB) wireless systems have recently gained considerable attention as effective communications platforms with the properties of low power and high data rates. Applications of UWB such as wireless USB, however, put size constraints on the antenna that can be very difficult to meet using typical narrowband antenna designs. The aim of this thesis […]

OpenCL

May, 7

Parallel Solving Massive Linear Equations with CUDA

Drawing on state-of-the-art methods for solving massive systems of linear equations and for parallel computing, the main computational issue has been extracted from the finite element method. The author tests some solver routines on the CPU, and also designs and implements them on the GPU using CUDA. The coalesced-access result on the GPU shows a ten […]

CUDA

May, 7

Simultaneous Use of CPU and GPU to Real Time Inverted Index Updating in Microblogs

Nowadays, with the development of various data networks, vast amounts of data are continually being produced and updated. Managing such large volumes of data is one of the fundamental challenges in data mining. One of the main topics in this context is how to search within these vast amounts of data. Therefore, this requires producing powerful, […]

CUDA

May, 6

Accelerating Cryptosystems on Hardware Platforms

In the past decade, one of the major breakthroughs in computer science theory has been the first construction of a fully homomorphic encryption (FHE) scheme, introduced by Gentry. Using an FHE scheme, one may perform an arbitrary number of computations directly on encrypted data without revealing the secret key. Therefore, a practical FHE provides an invaluable […]

CUDA

May, 6

GPU-Accelerated Joint 1D and 2D Barcode Localization on Smartphones

The built-in cameras and powerful processors have turned smartphones into ubiquitous barcode scanners. In smartphone-based barcode scanning, barcode localization is an important preprocessing step that quickly scans the entire camera image and passes barcode candidates to the actual decoder. This paper presents the implementation steps of a robust joint 1D and 2D barcode localization algorithm […]

OpenGL

May, 6

Implementing an efficient method of check-pointing on CPU-GPU

In this paper, we describe the design, implementation, verification and analysis of fine-grained architectural support for efficient check-pointing and restart on a CPU-GPU heterogeneous system. We use Multi2Sim, a simulator capable of emulating a CPU-GPU system. The simulator can emulate a 32-bit x86 CPU that launches OpenCL kernels on the GPU […]

OpenCL

May, 6

Mimetic Methods for Lagrangian Relaxation of Magnetic Fields

We present a new code that performs a relaxation of a magnetic field towards a force-free state (Beltrami field) using a Lagrangian numerical scheme. Beltrami fields are of interest for the dynamics of many technical and astrophysical plasmas as they are the lowest energy states that the magnetic field can reach. The numerical method strictly […]

CUDA

May, 6

Multireduce and Multiscan on Modern GPUs

With the introduction of platforms like CUDA and OpenCL, the superior computing power of modern GPUs compared to CPUs is used more and more often to accelerate general purpose computations. Data parallel primitives like reduce, scan or sort can be used as simple, deterministic building blocks for parallel algorithms, hiding the complexity of the underlying […]

CUDA

May, 5

Computer vision for continuous plankton monitoring

Plankton microorganisms constitute the base of the marine food web and play a great role in global atmospheric carbon dioxide drawdown. Moreover, being very sensitive to environmental changes, they allow such changes to be noticed (and potentially counteracted) faster than by any other means. As such they not only influence the fishery industry but are also frequently […]

CUDA

May, 5

Non-separable 2D, 3D and 4D filtering with CUDA

We have presented solutions for fast non-separable floating point convolution in 2, 3 and 4 dimensions, using the CUDA programming language. We believe that these implementations will serve as a complement to the NPP library, which currently only supports 2D filters and images stored as integers. The shared memory implementation with loop unrolling is approximately […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Most viewed papers (last 30 days)