high performance computing on graphics processing units: hgpu.org

Posts

Nov, 12

Multithread Content Based File Chunking System in CPU-GPGPU Heterogeneous Architecture

The rapid development of graphics processing units (GPUs) has popularized general-purpose computing on GPUs (GPGPU). Most modern computers now have a CPU-GPGPU heterogeneous architecture in which the CPU serves as the host processor. In this work, we propose a multithreaded file chunking prototype system, which is able to exploit the hardware organization of the […]

CUDA
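
For readers unfamiliar with content-based chunking, the following is a minimal single-threaded sketch, not the authors' multithreaded CPU-GPGPU system. It assumes a polynomial rolling hash (Rabin-Karp style) over a sliding window and declares a chunk boundary whenever the low bits of the hash match a mask. The function name `chunk_boundaries` and all parameter values are hypothetical and chosen only for illustration.

```python
def chunk_boundaries(data: bytes, window: int = 16, mask: int = 0x3F,
                     min_size: int = 32, max_size: int = 1024) -> list:
    """Return the end offsets of chunks chosen by a rolling-hash test.

    Illustrative assumptions: polynomial hash with base 257 modulo 2**31 - 1,
    a cut when (hash & mask) == mask, and hard minimum/maximum chunk sizes.
    """
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    cuts, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i - start >= window:
            # Window is full: remove the contribution of the oldest byte.
            h = (h - data[i - window] * top) % mod
        h = (h * base + b) % mod  # shift the window and add the new byte
        size = i - start + 1
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            cuts.append(i + 1)        # boundary falls after byte i
            start, h = i + 1, 0       # restart hashing for the next chunk
    if start < len(data):
        cuts.append(len(data))        # final, possibly short, chunk
    return cuts
```

Because boundaries depend on the bytes inside the window rather than on fixed offsets, an edit in one part of a file tends to leave boundaries in distant regions unchanged, which is the property deduplication systems rely on. A parallel version would need to split the input among workers and reconcile boundaries at the seams, which this sketch does not attempt.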

Nov, 12

Creating HW/SW co-designed MPSoPC’s from high level programming models

FPGA densities have continued to follow Moore’s law and can now support a complete multiprocessor system on programmable chip. The benefits of the FPGA include the ability to build a customized MPSoC system consisting of heterogeneous processing resources, interconnects and memory hierarchies that best match the requirements of each application. In this paper we outline […]

OpenCL

Nov, 12

A translator framework for Dynamic Programming problems

The advent of multicore systems, together with the potential acceleration offered by graphics processing units, has given us unprecedented low-cost computing capability. The new systems alleviate some well-known architectural problems at the expense of a considerably higher programmability wall. The heterogeneity, at both the architectural and programming levels at the […]

Nov, 12

Compiling for a heterogeneous vector image processor

We present a new compilation strategy, implemented at a small cost, to optimize image applications developed on top of a high-level image processing library for a heterogeneous processor with a vector image processing accelerator. The library provides the semantics of the image computations. The pipelined structure of the accelerator makes it possible to compute whole expressions […]

Nov, 12

Safe Asynchronous Multicore Memory Operations

Asynchronous memory operations provide a means for coping with the memory wall problem in multicore processors, and are available in many platforms and languages, e.g., the Cell Broadband Engine, CUDA and OpenCL. Reasoning about the correct usage of such operations involves complex analysis of memory accesses to check for races. We present a method and […]

Nov, 11

Performance Analysis and Benchmarking of the Intel SCC

Over the past years, CPU design and development have shifted steadily toward both power-aware hardware architectures and many-core processors. The Intel Single-chip Cloud Computer (SCC) combines those two trends. It is an experimental prototype created by Intel Labs consisting of 48 Pentium cores. The SCC is a highly configurable […]

Nov, 11

Building a Real-Time Multi-GPU Platform: Robust Real-Time Interrupt Handling Despite Closed-Source Drivers

Architectures in which multicore chips are augmented with graphics processing units (GPUs) have great potential in many domains in which computationally intensive real-time workloads must be supported. However, unlike standard CPUs, GPUs are treated as I/O devices and require the use of interrupts to facilitate communication with CPUs. Given their disruptive nature, interrupts must be […]

CUDA

Nov, 11

Many-body quantum chemistry on graphics processing units

Heterogeneous nodes composed of a multicore CPU and at least one graphics processing unit (GPU) are increasingly common in high-performance scientific computing, and significant programming effort is currently being undertaken to port existing scientific algorithms to these unique architectures. We present implementations for two many-body quantum chemistry methods on heterogeneous nodes: the coupled-cluster with single […]

CUDA

Nov, 11

Accelerating the Smoldyn Spatial Stochastic Biochemical Reaction Network Simulator Using GPUs

Smoldyn is a spatio-temporal biochemical reaction network simulator. It belongs to a class of methods called particle-based methods and is capable of handling effects such as molecular crowding. Individual molecules are modelled as point objects that can diffuse and react in a control volume. Since each molecule has to be simulated individually, the computational complexity […]

CUDA
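
As a rough illustration of the particle-based idea in this abstract, the sketch below advances point molecules by one Brownian diffusion step. It is not Smoldyn's code or the GPU implementation from the paper. It assumes free diffusion with coefficient `D` and time step `dt`, so each coordinate receives a Gaussian displacement with standard deviation sqrt(2 * D * dt). The function name `diffuse` is hypothetical, and reactions and boundaries are left out.

```python
import math
import random


def diffuse(positions, D, dt, rng):
    """Return new positions after one free-diffusion step.

    positions: list of coordinate tuples, one per molecule.
    D, dt: diffusion coefficient and time step (assumed consistent units).
    rng: a random.Random instance, passed in for reproducibility.
    """
    sigma = math.sqrt(2.0 * D * dt)  # per-coordinate displacement std. dev.
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in positions]
```

Each molecule is updated independently of the others in this step, which is why the workload maps naturally onto one GPU thread per molecule. The per-molecule cost is also why the total work grows with the number of molecules.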

Nov, 11

Improving GPU Performance via Large Warps and Two-Level Warp Scheduling

Due to their massive computational power, graphics processing units (GPUs) have become a popular platform for executing general purpose parallel applications. GPU programming models allow the programmer to create thousands of threads, each executing the same computing kernel. GPUs exploit this parallelism in two ways. First, threads are grouped into fixed-size SIMD batches known as […]

CUDA
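
The grouping of threads into fixed-size SIMD batches can be pictured with a small host-side model. This is only a sketch and not the scheduling mechanism proposed in the paper. It assumes a batch (warp) width of 32 threads, and the helper names `warps` and `divergent_warps` are hypothetical. It partitions thread IDs into warps and counts the warps whose threads disagree on a branch condition, which is the situation in which SIMD lanes sit idle.

```python
WARP_SIZE = 32  # assumed batch width for this illustration


def warps(num_threads, warp_size=WARP_SIZE):
    """Partition thread IDs 0..num_threads-1 into consecutive batches."""
    return [list(range(s, min(s + warp_size, num_threads)))
            for s in range(0, num_threads, warp_size)]


def divergent_warps(num_threads, predicate, warp_size=WARP_SIZE):
    """Count warps in which threads take different sides of a branch."""
    count = 0
    for warp in warps(num_threads, warp_size):
        outcomes = {bool(predicate(t)) for t in warp}
        if len(outcomes) > 1:  # both True and False occur in this warp
            count += 1
    return count
```

For example, with 128 threads a branch on `t % 2 == 0` diverges in all 4 warps, while a branch on `t < 64` diverges in none, because the condition is uniform within each batch of 32.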

Nov, 11

An Interest Point Based Illumination Condition Matching Approach to Photometric Registration Within Augmented Reality Worlds

With recent and continued increases in computing power, and advances in the field of computer graphics, realistic augmented reality environments can now offer inexpensive and powerful solutions in a whole range of training, simulation and leisure applications. One key challenge to maintaining convincing augmentation, and therefore user immersion, is ensuring consistent illumination conditions between virtual […]

Nov, 11

Synthetic Aperture Beamformation using the GPU

A synthetic aperture ultrasound beamformer is implemented for a GPU using the OpenCL framework. The implementation supports beamformation of either RF signals or complex baseband signals. Transmit and receive apodization can be either parametric or dynamic using a fixed F-number, a reference, and a direction. Images can be formed using an arbitrary number of emissions […]

OpenCL

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
