high performance computing on graphics processing units: hgpu.org

Posts

May, 4

Efficient Intranode Communication in GPU-Accelerated Systems

Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) and require programmers to explicitly move data between memory spaces. This approach is inefficient, especially for intranode communication where it can result in several extra copy operations. In this work, we integrate GPU-awareness into a popular MPI runtime system and develop techniques […]

CUDA

May, 4

Heterogeneous Task Scheduling for Accelerated OpenMP

Heterogeneous systems with CPUs and computational accelerators such as GPUs, FPGAs or the upcoming Intel MIC are becoming mainstream. In these systems, peak performance includes the performance of not just the CPUs but also all available accelerators. In spite of this fact, the majority of programming models for heterogeneous computing focus on only one of […]

CUDA

May, 4

Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems

Graphics processing units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging […]

OpenCL

May, 4

Simulating the Spread of Epidemics in Real-world Trading Networks using OpenCL

In this paper we investigate a solution to the problem of simulating the spread of epidemics in real-world trading networks. We developed an application that uses parallel computing devices (e.g. GPUs – Graphical Processing Units) with OpenCL (Open Computing Language). Furthermore, we use the epidemiological SIR model to represent the nodes of the trading network. Initially, […]

OpenCL

May, 4

Examining the Analytic Structure of Green’s Functions: Massive Parallel Complex Integration using GPUs

Graphics Processing Units (GPUs) are employed for a numerical determination of the analytic structure of two-point correlation functions of Quantum Field Theories. These functions are represented through integrals in d-dimensional Euclidean momentum space. Such integrals can in general not be solved analytically, and therefore one has to rely on numerical procedures to extract their analytic […]

CUDA

May, 3

High Performance Error Correction for Quantum Key Distribution using Polar Codes

We study the use of polar codes for both discrete and continuous variables Quantum Key Distribution (QKD). Although very large blocks must be used to obtain the efficiency required by quantum key distribution, and especially continuous variables quantum key distribution, their implementation on generic x86 CPUs is practical. Thanks to recursive decoding, they exhibit excellent […]

May, 3

CT to Cone-beam CT Deformable Registration With Simultaneous Intensity Correction

Computed tomography (CT) to cone-beam computed tomography (CBCT) deformable image registration (DIR) is a crucial step in adaptive radiation therapy. Current intensity-based registration algorithms, such as demons, may fail in the context of CT-CBCT DIR because of inconsistent intensities between the two modalities. In this paper, we propose a variant of demons, called Deformation with […]

CUDA

May, 3

A GPU Tool for Efficient, Accurate, and Realistic Simulation of Cone Beam CT Projections

Simulation of x-ray projection images plays an important role in cone beam CT (CBCT) related research projects. A projection image contains primary signal, scatter signal, and noise. It is computationally demanding to perform accurate and realistic computations for all of these components. In this work, we develop a package on GPU, called gDRR, for the […]

May, 3

A Distributed GPU-based Framework for real-time 3D Volume Rendering of Large Astronomical Data Cubes

We present a framework to interactively volume-render three-dimensional data cubes using distributed ray-casting and volume bricking over a cluster of workstations powered by one or more graphics processing units (GPUs) and a multi-core CPU. The main design target for this framework is to provide an in-core visualization solution able to provide three-dimensional interactive views of […]

CUDA

May, 3

Using high performance computing and Monte Carlo simulation for pricing American options

High performance computing (HPC) is a very attractive and relatively new area of research, which gives promising results in many applications. In this paper HPC is used for pricing of American options. Although the American options are very significant in computational finance, their valuation is very challenging, especially when the Monte Carlo simulation techniques are […]

CUDA

May, 2

A Fair Comparison of Modern CPUs and GPUs Running the Genetic Algorithm under the Knapsack Benchmark

The paper introduces an optimized multicore CPU implementation of the genetic algorithm and compares its performance with a fine-tuned GPU version. The main goal is to show the true performance relation between modern CPUs and GPUs and eradicate some of the myths surrounding GPU performance. It is essential for the evolutionary community to provide the same […]

CUDA

May, 2

Dynamic Kernel/Device Mapping Strategies for GPU-assisted HPC Systems

With their high computation throughput and outstanding performance-per-watt figures, graphics processing units (GPUs) are becoming increasingly important for high-performance computing (HPC) systems. The existing GPU execution environment restricts GPU usage to the local host node. This is suitable for standalone computer nodes, but becomes inefficient for HPC systems that consist of a large number of […]

CUDA

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
