
Posts

Oct, 17

An Optimization for Fast Generation of Digital Hologram

Digital hologram generation commonly relies on the computer-generated hologram (CGH) algorithm, which requires complicated computation. This paper therefore proposes an optimization method for the fast generation of digital holograms. The proposed method uses CUDA and OpenMP to target multiple GPUs, and applies several optimization techniques (variable fixation, vectorization, and loop unrolling) to the CGH algorithm. […]
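For readers unfamiliar with the last technique named above, loop unrolling can be illustrated with a small, generic C++ sketch; this is a hypothetical example, not the paper's CGH kernel:

```cpp
#include <cstddef>

// Sum an array with 4-way manual loop unrolling. Four independent
// accumulators let the compiler and CPU overlap the additions
// instead of serializing them through one dependency chain.
double sum_unrolled(const double* a, std::size_t n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    // Main unrolled loop: processes four elements per iteration.
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    // Remainder loop for n not divisible by 4.
    for (; i < n; ++i) s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

The same idea applies inside a CUDA or OpenMP-parallelized kernel, where each thread unrolls its own chunk of the iteration space.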
Oct, 17

Dynamic Fine-Grain Scheduling of Pipeline Parallelism

Scheduling pipeline-parallel programs, defined as a graph of stages that communicate explicitly through queues, is challenging. When the application is regular and the underlying architecture can guarantee predictable execution times, several techniques exist to compute highly optimized static schedules. However, these schedules do not admit run-time load balancing, so variability introduced by the application or […]
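The programming model described above, stages communicating explicitly through queues, can be sketched as a minimal blocking queue between two stages. This is an illustrative C++ fragment only, not the paper's scheduler:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// A minimal blocking queue connecting two pipeline stages.
// push() is called by the upstream stage, pop() by the
// downstream stage, which blocks until an item is available.
template <typename T>
class StageQueue {
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(T v) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(std::move(v));
        }
        cv_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        T v = std::move(q_.front());
        q_.pop();
        return v;
    }
};
```

A producer thread would push work items while a consumer stage pops them; the paper's contribution is the dynamic, fine-grain load balancing of such stages, which this sketch does not attempt.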
Oct, 17

Programming with Explicit Dependencies: A Framework for Portable Parallel Programming

Computational devices are rapidly evolving into massively parallel systems. Multicore processors are already standard; high performance processors such as the Cell/BE processor, graphics processing units (GPUs) featuring hundreds of on-chip processors, and reconfigurable devices such as FPGAs are all developed to deliver high computing power. They make parallelism commonplace, not only the privilege of expensive […]
Oct, 17

A High Performance Parallel Sparse Linear Equation Solver Using CUDA

The management of electric power systems requires continuously computing the power flow of a power system in real time. For large power systems, this task is often beyond the capabilities of modern CPUs. Concurrent computation is an attractive approach to accelerating it. However, the power-flow computation requires solving a large system of sparse linear equations. This problem […]
Oct, 16

Hard-Sphere Collision Simulations with Multiple GPUs, PCIe Extension Buses and GPU-GPU Communications

Simulating particle collisions is an important application for physics calculations as well as for various effects in computer games and movie animations. The increasing demand for physical correctness, and hence visual realism, calls for higher-order time-integration methods and more sophisticated collision-management algorithms. We report on the use of single and multiple Graphical Processing Units (GPUs) […]
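As background, the elementary building block of hard-sphere simulation is the elastic collision update, which conserves momentum and kinetic energy. Below is the standard textbook one-dimensional form in C++, not the paper's algorithm:

```cpp
#include <utility>

// One-dimensional elastic collision between two hard spheres with
// masses m1, m2 and incoming velocities v1, v2. Returns the
// post-collision velocities (u1, u2). Both total momentum
// m1*v1 + m2*v2 and total kinetic energy are conserved.
std::pair<double, double> elastic_collide(double m1, double v1,
                                          double m2, double v2) {
    double inv = 1.0 / (m1 + m2);
    double u1 = ((m1 - m2) * v1 + 2.0 * m2 * v2) * inv;
    double u2 = ((m2 - m1) * v2 + 2.0 * m1 * v1) * inv;
    return {u1, u2};
}
```

For equal masses the spheres simply exchange velocities; the challenge the paper addresses is doing many such updates, plus collision detection, across multiple GPUs and PCIe buses.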
Oct, 16

Bit-Packed Damaged Lattice Potts Model Simulations with CUDA and GPUs

Models such as the Ising and Potts systems lend themselves well to simulating the phase transitions that commonly arise in materials science. A particularly interesting variation is when the material being modelled has lattice defects, dislocations or broken bonds and the material experiences a Griffiths phase. The damaged Potts system consists of a set of […]
Oct, 16

Asynchronous Communication for Finite-Difference Simulations on GPU Clusters using CUDA and MPI

Graphical Processing Units (GPUs) are finding widespread use as accelerators in computer clusters, but it is not yet trivial to program applications that use multiple GPU-enabled cluster nodes efficiently. A key aspect of this is managing communication effectively between GPU memories on separate devices and separate nodes. We develop an algorithmic framework for finite-difference numerical simulations […]
Oct, 16

High performance finite difference PDE solvers on GPUs

We show how to implement highly efficient GPU solvers for one-dimensional PDEs based on finite-difference schemes. The typical use case is to price a large number of similar or related derivatives in parallel. Application scenarios include market making, real-time pricing, and risk management. The tridiagonal systems in the backward propagation of a […]
Oct, 16

An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU

We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting the systems into smaller ones and then solving them on-chip. The multi-stage characteristic of our method, together with various […]
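For context, the classic serial baseline for the problem that both tridiagonal papers above target is the Thomas algorithm, an O(n) forward-elimination/back-substitution pass. The sketch below is that textbook baseline, not the paper's multi-stage GPU method:

```cpp
#include <cstddef>
#include <vector>

// Thomas algorithm: solves the tridiagonal system
//   a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
// with a[0] and c[n-1] unused. Assumes the system is
// diagonally dominant so no pivoting is needed.
std::vector<double> thomas(std::vector<double> a, std::vector<double> b,
                           std::vector<double> c, std::vector<double> d) {
    const std::size_t n = b.size();
    // Forward elimination: zero out the sub-diagonal.
    for (std::size_t i = 1; i < n; ++i) {
        double w = a[i] / b[i - 1];
        b[i] -= w * c[i - 1];
        d[i] -= w * d[i - 1];
    }
    // Back substitution.
    std::vector<double> x(n);
    x[n - 1] = d[n - 1] / b[n - 1];
    for (std::size_t i = n - 1; i-- > 0; )
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
    return x;
}
```

The forward/backward dependency chains are what make this algorithm inherently serial, motivating GPU approaches that split a large system into many small independent ones.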
Oct, 16

GPU-to-CPU callbacks

We present GPU-to-CPU callbacks, a new mechanism and abstraction for GPUs that offers them more independence in a heterogeneous computing environment. Specifically, we provide a method for GPUs to issue callback requests to the CPU. These requests serve as a tool for ease-of-use, future proofing of code, and new functionality. We classify the types of […]
Oct, 16

The Anatomy of High-Performance 2D Similarity Calculations

Similarity measures based on the comparison of dense bit vectors of two-dimensional chemical features are a dominant method in chemical informatics. For large-scale problems, including compound selection and machine learning, computing the intersection between two dense bit vectors is the overwhelming bottleneck. We describe efficient implementations of this primitive as well as example applications using […]
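The intersection primitive identified above as the bottleneck reduces to population counts over packed words. A generic C++ illustration of the Tanimoto similarity on 64-bit-packed bit vectors follows (using the GCC/Clang `__builtin_popcountll` intrinsic; this is not the paper's implementation):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Tanimoto similarity between two equal-length dense bit vectors
// packed into 64-bit words: popcount(A & B) / popcount(A | B).
// The popcount of A & B is the intersection primitive that
// dominates the cost of large-scale 2D similarity calculations.
double tanimoto(const std::vector<std::uint64_t>& a,
                const std::vector<std::uint64_t>& b) {
    std::uint64_t inter = 0, uni = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        inter += __builtin_popcountll(a[i] & b[i]);
        uni   += __builtin_popcountll(a[i] | b[i]);
    }
    // Two all-zero fingerprints are conventionally fully similar.
    return uni ? static_cast<double>(inter) / uni : 1.0;
}
```

On hardware with a native popcount instruction (SSE4.2 `POPCNT`, or CUDA's `__popcll` on the GPU) this inner loop compiles to a handful of instructions per word.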
Oct, 16

Data Layout Pruning on GPU

This work is based on the NVIDIA GTX 280 using CUDA (Compute Unified Device Architecture). We classify datasets to be transferred into the CUDA memory hierarchy as SW (shared and must be written) or SR (shared but read-only), and the existing memory spaces (including shared memory, constant memory, texture memory and global memory) supported on CUDA-enabled GPU memory […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: