Posts

May, 6

Comparison of OpenMP and OpenCL Parallel Processing Technologies

This paper presents a comparison of OpenMP and OpenCL based on the parallel implementation of algorithms from various fields of computer applications. The focus of our study is on benchmark performance comparing OpenMP and OpenCL. We observed that the OpenCL programming model is a good option for mapping threads on different processing cores. Balancing […]
May, 6

Transparent Accelerator Migration in a Virtualized GPU Environment

This paper presents a framework to support transparent, live migration of virtual GPU accelerators in a virtualized execution environment. Migration is a critical capability in such environments because it provides support for fault tolerance, on-demand system maintenance, resource management, and load balancing in the mapping of virtual to physical GPUs. Techniques to increase responsiveness and […]
May, 6

Effects of Compiler Optimizations in OpenMP to CUDA Translation

One thrust of the OpenMP standard development focuses on support for accelerators. An important question is whether or not OpenMP extensions are needed, and how much performance difference they would make. The same question is relevant for related efforts in support of accelerators, such as OpenACC. The present paper pursues this question. We analyze the […]
May, 6

Design of a Hybrid Memory System for General-Purpose Graphics Processing Units

Addressing a limited power budget is a prerequisite for maintaining the growth of computer system performance into and beyond the exascale. Two technologies with the potential to help solve this problem include general-purpose programming on graphics processors and fast non-volatile memories. Combining these technologies could yield devices capable of extreme-scale computation at lower power. The […]
May, 6

One Stone Two Birds: Synchronization Relaxation and Redundancy Removal in GPU-CPU Translation

As an approach to promoting whole-system synergy on a heterogeneous computing system, compilation of fine-grained SPMD-threaded code (e.g., GPU CUDA code) for multicore CPUs has drawn some recent attention. This paper concentrates on two important sources of inefficiency that limit existing translators. The first is overly strong synchronizations; the second is thread-level partially redundant computations. […]
May, 4

Efficient Intranode Communication in GPU-Accelerated Systems

Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) and require programmers to explicitly move data between memory spaces. This approach is inefficient, especially for intranode communication where it can result in several extra copy operations. In this work, we integrate GPU-awareness into a popular MPI runtime system and develop techniques […]
May, 4

Heterogeneous Task Scheduling for Accelerated OpenMP

Heterogeneous systems with CPUs and computational accelerators such as GPUs, FPGAs or the upcoming Intel MIC are becoming mainstream. In these systems, peak performance includes the performance of not just the CPUs but also all available accelerators. In spite of this fact, the majority of programming models for heterogeneous computing focus on only one of […]
May, 4

Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems

Graphics processing units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging […]
May, 4

Simulating the Spread of Epidemics in Real-world Trading Networks using OpenCL

In this paper we investigate a solution to the problem of simulating the spread of epidemics in real-world trading networks. We developed an application that uses parallel computing devices (e.g. GPUs – Graphical Processing Units) with OpenCL (Open Computing Language). Furthermore, we use the epidemiological SIR model to represent the nodes of the trading network. Initially, […]
May, 4

Examining the Analytic Structure of Green’s Functions: Massive Parallel Complex Integration using GPUs

Graphics Processing Units (GPUs) are employed for a numerical determination of the analytic structure of two-point correlation functions of Quantum Field Theories. These functions are represented through integrals in d-dimensional Euclidean momentum space. Such integrals can in general not be solved analytically, and therefore one has to rely on numerical procedures to extract their analytic […]
May, 3

High Performance Error Correction for Quantum Key Distribution using Polar Codes

We study the use of polar codes for both discrete and continuous variables Quantum Key Distribution (QKD). Although very large blocks must be used to obtain the efficiency required by quantum key distribution, and especially continuous variables quantum key distribution, their implementation on generic x86 CPUs is practical. Thanks to recursive decoding, they exhibit excellent […]
May, 3

CT to Cone-beam CT Deformable Registration With Simultaneous Intensity Correction

Computed tomography (CT) to cone-beam computed tomography (CBCT) deformable image registration (DIR) is a crucial step in adaptive radiation therapy. Current intensity-based registration algorithms, such as demons, may fail in the context of CT-CBCT DIR because of inconsistent intensities between the two modalities. In this paper, we propose a variant of demons, called Deformation with […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
