Posts
Apr, 16
Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing
Power dissipation is one of the most pressing limiting factors in the development of High Performance Computing (HPC). Toward power-efficient HPC on CPU-GPU hybrid platforms, we are investigating software methodologies that optimize power utilization through algorithm design and programming technique. In this paper we discuss power measurements of GPUs, propose a method of automatic […]
Apr, 16
Accelerating Particle Swarm Algorithm with GPGPU
This paper focuses on solving large-scale optimization problems using GPGPU. Evolutionary algorithms for solving these problems suffer from the curse of dimensionality: their performance deteriorates quickly as the dimensionality of the search space increases. This difficulty makes performance studies for very high-dimensional problems very challenging. Furthermore, these […]
Apr, 15
N-body Simulation for Astronomical Collisional Systems with a New SIMD Instruction Set Extension to the x86 Architecture, Advanced Vector Extensions
We present a high-performance N-body code for astronomical collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one core of an Intel Core i7-2600 processor (8 MB cache, 3.40 GHz) based on Sandy […]
Apr, 15
Parallel implementation of a Quantization algorithm for pricing American style options on GPGPU
The Quantization Tree algorithm has proven to be an efficient tool for the evaluation of financial derivatives with non-vanilla exercise rights, such as American, Bermudan, or swing options. Nevertheless, it relies heavily on a fast computation of the transition probabilities in the underlying Quantization Tree. Since this estimation is typically done by Monte Carlo simulations, it is […]
Apr, 15
Emerging technology about GPGPU
With the rapid development of the graphics processing unit (GPU), its programmability and highly parallel processing make it possible to run general-purpose computation on the GPU, conventionally called GPGPU (general-purpose computation on GPU). A brief survey, in particular on the rationale of how the GPU architecture leads to […]
Apr, 15
GPU-accelerated 3D Bayesian image reconstruction from Compton scattered data
This paper describes the development of fast Bayesian reconstruction methods for Compton cameras using commodity graphics hardware. For fast iterative reconstruction, it is important not only to increase the convergence rate but also to accelerate the computation of time-consuming, repeated operations such as projection and backprojection. Since the size of […]
Apr, 15
MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters
Data-parallel architectures, such as General Purpose Graphics Processing Units (GPGPUs), have seen a tremendous rise in their application for High End Computing. However, data movement in and out of GPGPUs remains the biggest hurdle to overall performance and programmer productivity. Applications executing on a cluster with GPUs have to manage data movement using CUDA in […]
Apr, 15
A distributed multi-GPU system for high speed electron microscopic tomographic reconstruction
Full resolution electron microscopic tomographic (EMT) reconstruction of large-scale tilt series requires significant computing power. The desire to perform multiple cycles of iterative reconstruction and realignment dramatically increases the pressing need to improve reconstruction performance. This has motivated us to develop a distributed multi-GPU (graphics processing unit) system to provide the required computing power for […]
Apr, 15
GPU-Based Implementations of the Noniterative Regularized-CCSD(T) Corrections: Applications to Strongly Correlated Systems
The details of the graphical processing unit (GPU) implementation of the most computationally intensive (T)-part of the recently introduced regularized CCSD(T) (Reg-CCSD(T)) method [Kowalski, K.; Valiev, M. J. Chem. Phys. 2009, 131, 234107] for calculating electronic energies of strongly correlated systems are discussed. Parallel tests performed for several molecular systems show […]
Apr, 15
Implementation of Jacobi iterative method on graphics processor unit
CUDA is a new computing architecture introduced by NVIDIA Corporation, aimed at general-purpose computation on the GPU. The architecture offers strong compute power for compute-intensive and data-intensive applications, so in recent years, how to apply the framework to scientific computing has become a hot research topic. The iterative method for solving systems of […]
Apr, 15
Parallel On-Chip Power Distribution Network Analysis on Multi-Core-Multi-GPU Platforms
The challenging task of analyzing on-chip power (ground) distribution networks with multimillion node complexity and beyond is key to today’s large chip designs. For the first time, we show how to exploit recent massively parallel single-instruction multiple-thread (SIMT)-based graphics processing unit (GPU) platforms to tackle large-scale power grid analysis with promising performance. Several key enablers […]
Apr, 15
The development of Mellanox/NVIDIA GPUDirect over InfiniBand - a new model for GPU to GPU communications
The usage and adoption of General Purpose GPUs (GPGPU) in HPC systems are increasing due to the unparalleled performance advantage of GPUs and their ability to fulfill the ever-increasing demand for floating-point operations. While the GPU can offload many of the application's parallel computations, the system architecture of a GPU-CPU-InfiniBand server does require […]