3709 Posts

Apr, 15

Parallel implementation of a Quantization algorithm for pricing American style options on GPGPU

The Quantization Tree algorithm has proven to be quite an efficient tool for the evaluation of financial derivatives with non-vanilla exercise rights, such as American, Bermudan, or swing options. Nevertheless, it relies heavily on a fast computation of the transition probabilities in the underlying Quantization Tree. Since this estimation is typically done by Monte-Carlo simulations, it is […]
Apr, 15

Emerging technology about GPGPU

With the rapid development of graphics processing units (GPUs), the programmability and highly parallel processing capability of the GPU make it possible to carry out general-purpose computation on the GPU, conventionally called GPGPU (general-purpose computation on GPU). A brief survey, in particular on the rationale of how the GPU architecture leads to […]
Apr, 15

GPU-accelerated 3D Bayesian image reconstruction from Compton scattered data

This paper describes the development of fast Bayesian reconstruction methods for Compton cameras using commodity graphics hardware. For fast iterative reconstruction, it is important not only to increase the convergence rate but also to accelerate the computation of time-consuming and repeated operations, such as projection and backprojection. Since the size of […]
Apr, 15

MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters

Data parallel architectures, such as General Purpose Graphics Units (GPGPUs), have seen a tremendous rise in their application for High End Computing. However, data movement in and out of GPGPUs remains the biggest hurdle to overall performance and programmer productivity. Applications executing on a cluster with GPUs have to manage data movement using CUDA in […]
Apr, 15

A distributed multi-GPU system for high speed electron microscopic tomographic reconstruction

Full resolution electron microscopic tomographic (EMT) reconstruction of large-scale tilt series requires significant computing power. The desire to perform multiple cycles of iterative reconstruction and realignment dramatically increases the pressing need to improve reconstruction performance. This has motivated us to develop a distributed multi-GPU (graphics processing unit) system to provide the required computing power for […]
Apr, 15

GPU-Based Implementations of the Noniterative Regularized-CCSD(T) Corrections: Applications to Strongly Correlated Systems

The details of the graphical processing unit (GPU) implementation of the most computationally intensive (T)-part of the recently introduced regularized CCSD(T) (Reg-CCSD(T)) method [Kowalski, K.; Valiev, M. J. Chem. Phys. 2009, 131, 234107] for calculating electronic energies of strongly correlated systems are discussed. Parallel tests performed for several molecular systems show […]
Apr, 15

Implementation of Jacobi iterative method on graphics processor unit

CUDA is a new computing architecture introduced by NVIDIA Corporation, aimed at general-purpose computation on the GPU. The architecture offers strong compute power for compute-intensive and data-intensive applications, so in recent years, how to apply the framework to scientific computing has become a hot research topic. The iterative method for solving systems of […]
Apr, 15

Parallel On-Chip Power Distribution Network Analysis on Multi-Core-Multi-GPU Platforms

The challenging task of analyzing on-chip power (ground) distribution networks with multimillion node complexity and beyond is key to today’s large chip designs. For the first time, we show how to exploit recent massively parallel single-instruction multiple-thread (SIMT)-based graphics processing unit (GPU) platforms to tackle large-scale power grid analysis with promising performance. Several key enablers […]
Apr, 15

The development of Mellanox/NVIDIA GPUDirect over InfiniBand - a new model for GPU to GPU communications

The usage and adoption of General Purpose GPUs (GPGPU) in HPC systems are increasing due to the unparalleled performance advantage of the GPUs and their ability to fulfill the ever-increasing demand for floating-point operations. While the GPU can offload many of the application's parallel computations, the system architecture of a GPU-CPU-InfiniBand server does require […]
Apr, 14

Massively parallel implementation of cyclic LDPC codes on a general purpose graphics processing unit

Simulation of low-density parity-check (LDPC) codes frequently takes several days, so the use of general purpose graphics processing units (GPGPUs) is very promising. However, GPGPUs are designed for compute-intensive applications, and they are not optimized for data caching or control management. In LDPC decoding, the parity check matrix H needs to be accessed at every […]
Apr, 14

Count Sort for GPU Computing

Counting sort is a simple, stable and efficient sort algorithm with linear running time, which is a fundamental building block for many applications. This paper depicts the design issues of a data parallel implementation of the count sort algorithm on a commodity multiprocessor GPU using the Compute Unified Device Architecture (CUDA) platform, both from NVIDIA […]
Apr, 14

GPU-based high-speed and high-precision visual tracking

This paper presents a method for implementing the ESM visual tracker proposed by Malis et al. on a GPU to realize fast and accurate visual tracking. The ESM tracker is especially effective for images in which feature points are difficult to obtain, since it uses all the pixels of the target image region. Although […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
