
Posts

Oct, 1

Parallel Computing based on GPGPU using Compute Unified Device Architecture

The demand for processing huge amounts of data within a limited time and the growing computing capability of the Graphics Processing Unit (GPU) lead us to the world of parallel computing on General Purpose GPU (GPGPU). Because of the exposed parallelism, GPGPU can assign processing tasks to multiple threads and execute these threads simultaneously. […]
Oct, 1

Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU

Automatic compilation for multiple types of devices is important, especially given the current trends towards heterogeneous computing. This paper concentrates on some issues in compiling fine-grained SPMD-threaded code (e.g., GPU CUDA code) for multicore CPUs. It points out some correctness pitfalls in existing techniques, particularly in their treatment of implicit synchronizations. It then describes a […]
Oct, 1

Optimizing a Semantic Comparator using CUDA-enabled Graphics Hardware

Emerging semantic search techniques require fast comparison of large "concept trees". This paper addresses the challenges involved in fast computation of similarity between two large concept trees using a CUDA-enabled GPGPU co-processor. We propose efficient techniques for this computation based on fast hash computations, membership tests using Bloom filters, and parallel reduction. We show how a […]
Oct, 1

Customizable Domain-Specific Computing

To meet computing needs and overcome power density limitations, the computing industry has entered the era of parallelization. However, highly parallel, general-purpose computing systems face serious challenges in terms of performance, energy, heat dissipation, space, and cost. We believe that there is significant opportunity to look beyond parallelization and focus on domain-specific customization to bring […]
Oct, 1

CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications

We propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is […]
Oct, 1

A Comprehensive Performance Comparison of CUDA and OpenCL

This paper presents a comprehensive performance comparison between CUDA and OpenCL. We have selected 16 benchmarks ranging from synthetic applications to real-world ones. We make an extensive analysis of the performance gaps taking into account programming models, optimization strategies, architectural details, and underlying compilers. Our results show that, for most applications, CUDA performs at most […]
Oct, 1

Accelerating Vector Calculations on GPU

Multicore computational accelerators such as Graphics Processing Units (GPUs) have become common for achieving high-performance computing on a larger scale. Programming GPUs requires detailed knowledge of the underlying architecture in order to get maximum performance. In this paper we present a solution for vector distance calculation on NVIDIA’s parallel computing architecture CUDA (Compute Unified Device Architecture), where […]
Oct, 1

Large Scale DNA Sequence Alignment and Kernel Method Implemented with GPUs

Large-scale DNA sequence alignment and the kernel method in molecular biology play critical roles in bioinformatics. Both are successfully implemented on the Brook+ platform with AMD’s GPUs. Targeting the characteristics of graphics stream processors, we propose internal and external approaches that work cooperatively to improve the performance of the two algorithms. The experiments show […]
Oct, 1

Interactive Soft Tissue for Surgical Simulation

Medical simulation has the potential to revolutionise the training of medical practitioners. Advantages include reduced risk to patients, increased access to rare scenarios and virtually unlimited repeatability. However, in order to fulfil its potential, medical simulators require techniques to provide realistic user interaction with the simulated patient. Specifically, compelling real-time simulations that allow the trainee […]
Oct, 1

Image registration on GPU

Image registration is a fundamental step in many applications involving image analysis. It consists of optimizing a similarity metric to find a spatial transformation to match two images (in 3D). It has application in medical images to build atlases (registering a population), or to align a patient to a template to detect pathologies. The main […]
Sep, 30

Exploring The Latency and Bandwidth Tolerance of CUDA Applications

CUDA applications represent a new body of parallel programs. Although several paradigms exist for programming distributed systems and many-core processors, many users struggle to achieve a program that is scalable across systems with different hardware characteristics. This paper explores the scalability of CUDA applications on systems with varying interconnect latencies, hiding a hardware detail from […]
Sep, 30

Architecture-Aware Mapping and Optimization on Heterogeneous Computing Systems

The emergence of scientific applications embedded with multiple modes of parallelism has made heterogeneous computing systems indispensable in high performance computing. The popularity of such systems is evident from the fact that three out of the top five fastest supercomputers in the world employ heterogeneous computing, i.e., they use dissimilar computational units. A closer look […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
