high performance computing on graphics processing units: hgpu.org

Posts

Sep, 23

Multi-GPU Acceleration of Black-Scholes Equation based Option Pricing

In high-frequency trading of options, where "milliseconds earn or lose millions", computational speed in predicting option prices is the crucial factor that lets traders set prices efficiently and evaluate the corresponding risk. The Black-Scholes equation is a mathematical equation describing option pricing over time. Multi-GPU is a recently developed platform for high-performance computing, which […]

CUDA
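
For readers unfamiliar with the pricing model named in the excerpt above, here is a minimal sketch of the well-known closed-form Black-Scholes price of a European option. It is not the paper's multi-GPU method, which the excerpt does not describe; the function names and the sample parameters are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_price(spot, strike, rate, vol, maturity, kind="call"):
    """Closed-form Black-Scholes price of a European call or put.

    Assumes no dividends, a constant risk-free rate and constant volatility.
    All names here are hypothetical, chosen for illustration only.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discounted_strike = strike * math.exp(-rate * maturity)
    if kind == "call":
        return spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    return discounted_strike * norm_cdf(-d2) - spot * norm_cdf(-d1)


if __name__ == "__main__":
    # Illustrative inputs: at-the-money option, 5% rate, 20% volatility, 1 year.
    print(black_scholes_price(100.0, 100.0, 0.05, 0.2, 1.0, "call"))
    print(black_scholes_price(100.0, 100.0, 0.05, 0.2, 1.0, "put"))
```

Each option is priced independently of the others, which is one reason batches of such evaluations are often considered a good fit for GPUs.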

Sep, 23

Improving Resource Utilization in Heterogeneous CPU-GPU Systems

Graphics processing units (GPUs) have attracted enormous interest over the past decade due to substantial increases in both performance and programmability. Programmers can potentially leverage GPUs for substantial performance gains, but at the cost of significant software engineering effort. In practice, most GPU applications do not effectively utilize all of the available resources in a […]

CUDA • OpenCL

Sep, 23

BenchFriend: Correlating the Performance of GPU Benchmarks

Graphics processing units (GPUs) have become an important platform for general-purpose computing, thanks to their high parallel throughput and high memory bandwidth. GPUs present significantly different architectures from CPUs and require specific mappings and optimizations to achieve high performance. This makes GPU workloads demonstrate application characteristics different from those of CPU workloads. It is critical […]

CUDA

Sep, 23

Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data

Driven by the goals of efficient and generic communication of noncontiguous data layouts in GPU memory, for which solutions do not currently exist, we present a parallel, noncontiguous data-processing methodology through the MPI datatypes specification. Our processing algorithm utilizes a kernel on the GPU to pack arbitrary noncontiguous GPU data by enriching the datatypes encoding […]

CUDA
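
To make "packing noncontiguous data" concrete, the sketch below packs a strided layout into a contiguous buffer and unpacks it again. It runs sequentially on the CPU and only covers the regular count/blocklength/stride layout that MPI's vector datatype describes; the paper's GPU kernel and its handling of arbitrary datatypes are not shown in the excerpt, so the function names and structure here are assumptions.

```python
def pack_vector(buffer, count, blocklength, stride):
    """Gather `count` blocks of `blocklength` elements, whose starts are
    `stride` elements apart, into one contiguous list.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    packed = []
    for block in range(count):
        start = block * stride
        packed.extend(buffer[start:start + blocklength])
    return packed


def unpack_vector(packed, buffer, count, blocklength, stride):
    """Scatter a contiguous list back into the strided layout, in place."""
    for block in range(count):
        start = block * stride
        src = block * blocklength
        buffer[start:start + blocklength] = packed[src:src + blocklength]
    return buffer


if __name__ == "__main__":
    data = list(range(12))
    # Three blocks of two elements, block starts four elements apart.
    print(pack_vector(data, count=3, blocklength=2, stride=4))
```

Every block can be copied independently of the others, so on a GPU each block (or each element) could plausibly be assigned to its own thread.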

Sep, 22

Accelerating Habanero-Java Programs with OpenCL Generation

The initial wave of programming models for general-purpose computing on GPUs, led by CUDA and OpenCL, has provided experts with low-level constructs to obtain significant performance and energy improvements on GPUs. However, these programming models are characterized by a challenging learning curve for non-experts due to their complex and low-level APIs. Looking to the future, […]

OpenCL

Sep, 22

Investigating the Performance of Motion Estimation Block-Matching Algorithms on GPU Cards

In the field of video compression, motion estimation (ME) is a process with high computational complexity. Implementing ME block-matching (BM) algorithms on general-purpose central processing units (CPUs) has resulted in poor performance. In this paper we investigate the performance of two BM ME algorithms: Three Step Search (TSS) and Four Step […]

CUDA

Sep, 22

Fast Endmember Extraction for Massive Hyperspectral Sensor Data on GPUs

Hyperspectral imaging sensors are becoming increasingly important in multi-sensor collaborative observation. The spectral mixture problem seriously reduces the efficiency of hyperspectral data exploitation, and endmember extraction is one of the key issues. Due to the high computational cost of the algorithms and the massive quantity of hyperspectral sensor data, high-performance computing is in great demand for those scenarios […]

CUDA

Sep, 22

Paralleling Variable Block Size Motion Estimation of HEVC on Multi-Core CPU Plus GPU Platform

Motion estimation with variable block sizes (VBSME) is one of the most complex models in the HEVC encoder. The HEVC standard supports up to 12 variable block sizes ranging from 4×8/8×4 to 64×64 for motion estimation (ME) and motion compensation (MC). This feature contributes substantial coding gain compared with 7 variable block sizes in H.264/AVC […]

CUDA

Sep, 22

Geo-Correction of High-Resolution Imagery Using Fast Template Matching on a GPU in Emergency Mapping Contexts

The increasing availability of satellite imagery acquired by existing and new sensors allows a wide variety of new applications that depend on the use of diverse spectral and spatial resolution data sets. One of the pre-conditions for the use of hybrid image data sets is a consistent geo-correction capacity. We demonstrate how a novel fast […]

Sep, 21

Optimization solutions for the segmented sum algorithmic function

This paper presents optimization solutions for the segmented sum algorithmic function, developed using the Compute Unified Device Architecture (CUDA), a powerful and efficient solution for optimizing a wide range of applications. The parallel segmented sum is often used as a building block in many data processing algorithms, and by optimizing it, one can improve the overall […]

CUDA
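
As a reference for what a segmented sum computes, the following is a minimal sequential sketch. It assumes segments are marked by head flags (a truthy flag starts a new segment), which is one common representation; the excerpt does not say which representation or which CUDA optimizations the paper uses, and the function name is hypothetical.

```python
def segmented_sum(values, head_flags):
    """Return one sum per segment.

    A truthy entry in `head_flags` marks the first element of a new segment.
    The first element always starts a segment, even if its flag is unset.
    Sequential reference only; not the paper's CUDA implementation.
    """
    sums = []
    for value, is_head in zip(values, head_flags):
        if is_head or not sums:
            sums.append(value)   # open a new segment
        else:
            sums[-1] += value    # accumulate into the current segment
    return sums


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6]
    flags = [1, 0, 0, 1, 0, 1]
    print(segmented_sum(values, flags))
```

A sequential version like this is mainly useful as a correctness check against a parallel implementation.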

Sep, 21

A streaming model for nested data parallelism

Efficient parallel algorithms are often written with embedded knowledge of the back-end that they are meant to be executed on, and if they are not, the translation to the target language often produces inefficient code. A concrete problem is space complexity in nested data parallel (NDP) languages such as NESL and Data Parallel Haskell, where large […]

CUDA

Sep, 21

Performing DCT8x8 Computation on GPU Using NVIDIA CUDA Technology

In this paper, we propose sequential and parallel implementations of the Discrete Cosine Transform (DCT) using Compute Unified Device Architecture (CUDA) libraries. The introduction of a programmable pipeline in graphics processing units (GPUs) has enabled configurability. The GPU, which is available in every computer, has a tremendous capacity for highly parallel SIMD processing, but its capability is often […]

CUDA
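
For orientation, here is a minimal sequential sketch of the 8x8 transform named in the title above: a separable 2D DCT-II computed as a 1D DCT over rows, then over columns. The orthonormal scaling and the function names are assumptions; the excerpt does not show the paper's CUDA kernels or its scaling convention.

```python
import math


def dct_1d(vec):
    """Orthonormal 1D DCT-II of a sequence of numbers."""
    n = len(vec)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(vec[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
        out.append(scale * total)
    return out


def dct_8x8(block):
    """2D DCT-II of an 8x8 block given as a list of 8 rows of 8 numbers.

    Hypothetical reference helper; applies dct_1d to rows, then to columns.
    """
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols[j][k] is the coefficient for column j at vertical frequency k;
    # transpose back so the result is indexed [vertical][horizontal].
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    flat = [[1.0] * 8 for _ in range(8)]
    coeffs = dct_8x8(flat)
    print(coeffs[0][0])  # DC coefficient of a constant block
```

Each of the 64 output coefficients depends only on the input block, so the coefficients (and separate 8x8 blocks of an image) can be computed independently.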