
Posts

Nov, 23

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs

Iterative stencil loops (ISLs) are used in many applications, and tiling is a well-known technique for localizing their computation. When ISLs are tiled across a parallel architecture, there are usually halo regions that must be updated and exchanged among different processing elements (PEs). In addition, synchronization is often used to signal the completion of […]
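The abstract does not describe the paper's performance model, so the following is only a minimal sketch of the general ghost-zone idea, not the authors' framework. It assumes a 1D domain with periodic boundaries and a 3-point averaging stencil. The names `reference_sweeps` and `tiled_sweeps` are hypothetical. Each tile loads `ghost` extra halo cells per side and then runs up to `ghost` sweeps locally before the next exchange. This trades redundant halo computation for fewer exchanges and synchronizations.

```python
def reference_sweeps(u, steps):
    """Untiled baseline: `steps` sweeps of a 3-point averaging stencil (periodic)."""
    n = len(u)
    for _ in range(steps):
        u = [(u[i - 1] + u[i] + u[(i + 1) % n]) / 3.0 for i in range(n)]
    return u


def tiled_sweeps(u, steps, tile, ghost):
    """Tiled version with a ghost zone of depth `ghost` (hypothetical helper).

    Returns (result, number_of_halo_exchanges)."""
    n = len(u)
    if n % tile != 0:
        raise ValueError("domain size must be a multiple of the tile size")
    done, exchanges = 0, 0
    while done < steps:
        k = min(ghost, steps - done)  # sweeps fused between two exchanges
        new_u = [0.0] * n
        for start in range(0, n, tile):
            # Load the tile plus k ghost cells on each side (periodic wrap-around).
            local = [u[(start + j) % n] for j in range(-k, tile + k)]
            for _ in range(k):
                # Each local sweep drops one now-invalid cell at each buffer edge.
                local = [(local[j - 1] + local[j] + local[j + 1]) / 3.0
                         for j in range(1, len(local) - 1)]
            new_u[start:start + tile] = local  # exactly `tile` valid cells remain
        u = new_u
        done += k
        exchanges += 1
    return u, exchanges
```

Raising `ghost` reduces the number of halo exchanges from `steps` to about `steps / ghost`. The cost is a growing amount of recomputed halo work in every tile. Choosing that depth well is the kind of trade-off that automatic ghost zone optimization targets.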
Nov, 23

Many-core algorithms for statistical phylogenetics

MOTIVATION: Statistical phylogenetics is computationally intensive, and considerable attention has therefore been devoted to techniques for parallelization. Codon-based models allow for independent rates of synonymous and replacement substitutions and have the potential to model the process of protein-coding sequence evolution more adequately, with a resulting increase in phylogenetic accuracy. Unfortunately, due to the high number of codon […]
Nov, 23

Synergistic execution of stream programs on multicores with accelerators

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multicore architectures. StreamIt graphs describe task, data and pipeline parallelism, which can be exploited on accelerators such as Graphics Processing Units (GPUs) or the Cell BE that support abundant parallelism in hardware. In this paper, we describe a novel method […]
Nov, 23

Graphical Asian Options

We discuss the problem of pricing Asian options in the Black-Scholes model using CUDA on a graphics processing unit. We survey some of the issues with GPU programming and discuss code design and memory usage. We show that by using a Quasi Monte Carlo simulation with a geometric Asian option as a control variate, it is […]
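The control-variate idea in this abstract can be sketched on the CPU. This is a minimal illustration under stated assumptions, not the authors' CUDA code. The geometric-average Asian call has a closed-form Black-Scholes price because the log of the geometric mean is normally distributed. Its simulated payoff is therefore used to correct the arithmetic-average estimate. The sketch substitutes ordinary pseudo-random Monte Carlo (`random.gauss`) for the paper's Quasi Monte Carlo sequences. The function names and the monitoring grid t/n, 2t/n, ..., t are assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist


def geometric_asian_call(s0, k, r, sigma, t, n):
    """Closed-form price of a geometric-average Asian call monitored at t/n, ..., t."""
    dt = t / n
    mean = math.log(s0) + (r - 0.5 * sigma ** 2) * dt * (n + 1) / 2
    var = sigma ** 2 * dt * (n + 1) * (2 * n + 1) / (6 * n)
    sd = math.sqrt(var)
    d1 = (mean - math.log(k) + var) / sd
    d2 = d1 - sd
    cdf = NormalDist().cdf
    return math.exp(-r * t) * (math.exp(mean + 0.5 * var) * cdf(d1) - k * cdf(d2))


def arithmetic_asian_call_cv(s0, k, r, sigma, t, n, paths, seed=0):
    """Pseudo-random MC price of the arithmetic Asian call, plain and with the
    geometric call as control variate: (plain, plain_se, cv, cv_se)."""
    rng = random.Random(seed)
    dt = t / n
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    disc = math.exp(-r * t)
    ys, xs = [], []  # discounted arithmetic / geometric payoffs per path
    for _ in range(paths):
        log_s, arith_sum, log_sum = math.log(s0), 0.0, 0.0
        for _ in range(n):
            log_s += drift + vol * rng.gauss(0.0, 1.0)  # exact GBM step in log space
            arith_sum += math.exp(log_s)
            log_sum += log_s
        ys.append(disc * max(arith_sum / n - k, 0.0))
        xs.append(disc * max(math.exp(log_sum / n) - k, 0.0))
    mean_y, mean_x = statistics.fmean(ys), statistics.fmean(xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (paths - 1)
    beta = cov / statistics.variance(xs)  # estimated optimal coefficient
    exact_x = geometric_asian_call(s0, k, r, sigma, t, n)
    adjusted = [y - beta * (x - exact_x) for x, y in zip(xs, ys)]
    root_n = math.sqrt(paths)
    return (mean_y, statistics.stdev(ys) / root_n,
            statistics.fmean(adjusted), statistics.stdev(adjusted) / root_n)
```

Because the arithmetic and geometric payoffs are highly correlated along each path, the adjusted estimator's standard error should be much smaller than the plain one for the same number of paths. Estimating `beta` from the same sample introduces only a small bias.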
Nov, 23

Challenges and opportunities of obtaining performance from multi-core CPUs and many-core GPUs

Multi-core processors represent a major development in computing technology. For example, Intel Core™ 2 Quad processors, IBM Cell processors, and Nvidia GeForce 9800 GX2, are widely used. However, most applications struggle to make the best use of the power provided by many-core processors. Easy-to-use software tools are hard to find. Furthermore, it’s not clear what […]
Nov, 23

Implementation of a Lattice Boltzmann kernel using the Compute Unified Device Architecture developed by nVIDIA

In this article, a very efficient implementation of a 2D Lattice Boltzmann kernel using the Compute Unified Device Architecture (CUDA) interface developed by nVIDIA is presented. By exploiting the explicit parallelism exposed in the graphics hardware, we obtain a performance gain of more than one order of magnitude compared to standard CPUs. A non-trivial example, the flow through a […]
Nov, 23

Efficient Probabilistic Model Checking on General Purpose Graphics Processors

We present algorithms for parallel probabilistic model checking on general-purpose graphics processing units (GPGPUs). For this purpose we exploit the fact that some of the basic algorithms for probabilistic model checking rely on matrix-vector multiplication. Since such linear algebra operations are implemented very efficiently on GPGPUs, the new parallel algorithms can […]
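To show how a model-checking query reduces to repeated matrix-vector products, here is a minimal sketch. It assumes a discrete-time Markov chain and a bounded reachability query ("probability of reaching a goal state within k steps"); the abstract does not say which queries the paper accelerates. The transition matrix is stored in CSR form. Each row's dot product is independent, so a GPU version would typically assign one thread (or warp) per row. The helper names are hypothetical.

```python
def csr_from_dense(P):
    """Convert a dense row-stochastic matrix to CSR arrays (row_ptr, cols, vals)."""
    row_ptr, cols, vals = [0], [], []
    for row in P:
        for j, p in enumerate(row):
            if p != 0.0:
                cols.append(j)
                vals.append(p)
        row_ptr.append(len(cols))
    return row_ptr, cols, vals


def spmv(row_ptr, cols, vals, x):
    """Sparse matrix-vector product; every row is independent (GPU-friendly)."""
    return [sum(vals[i] * x[cols[i]] for i in range(row_ptr[r], row_ptr[r + 1]))
            for r in range(len(row_ptr) - 1)]


def bounded_reachability(P, goal, k):
    """Per-state probability of reaching a state in `goal` within k steps."""
    csr = csr_from_dense(P)
    x = [1.0 if s in goal else 0.0 for s in range(len(P))]
    for _ in range(k):
        y = spmv(*csr, x)
        # Goal states are treated as absorbing: their probability stays 1.
        x = [1.0 if s in goal else y[s] for s in range(len(P))]
    return x
```

For a two-state chain where state 0 moves to the goal state 1 with probability 0.5 per step, `bounded_reachability([[0.5, 0.5], [0.0, 1.0]], {1}, 3)` returns `[0.875, 1.0]`, matching 1 - 0.5^3.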
Nov, 23

Parallel computation of mutual information on the GPU with application to real-time registration of 3D medical images

Due to processing constraints, automatic image-based registration of medical images has been largely used as a pre-operative tool. We propose a novel method named sort and count for efficient parallelization of mutual information (MI) computation designed for massively multi-processing architectures. Combined with a parallel transformation implementation and an improved optimization algorithm, our method achieves real-time […]
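The abstract names its method "sort and count" without giving details. The sketch below is one hypothetical reading, not the authors' implementation. Each voxel pair is mapped to a combined joint-bin key. The keys are sorted so that equal keys form contiguous runs, and run lengths give the joint histogram. This avoids contended atomic increments, which is a common motivation for sort-based histogramming on massively parallel hardware. Mutual information is then computed from that histogram in the standard way. The assumptions are that intensities are already quantized into `nb` bins and that the helper names are illustrative.

```python
import math


def joint_counts_by_sorting(a_bins, b_bins, nb):
    """Joint histogram via sorting combined keys and counting runs of equal keys."""
    keys = sorted(a * nb + b for a, b in zip(a_bins, b_bins))
    counts = {}
    i = 0
    while i < len(keys):
        j = i
        while j < len(keys) and keys[j] == keys[i]:
            j += 1
        counts[divmod(keys[i], nb)] = j - i  # key -> (bin in A, bin in B)
        i = j
    return counts


def mutual_information(a_bins, b_bins, nb):
    """MI (in nats) between two quantized images of equal size."""
    n = len(a_bins)
    joint = joint_counts_by_sorting(a_bins, b_bins, nb)
    pa, pb = [0.0] * nb, [0.0] * nb
    for (a, b), c in joint.items():  # marginals from the joint histogram
        pa[a] += c / n
        pb[b] += c / n
    mi = 0.0
    for (a, b), c in joint.items():
        p = c / n
        mi += p * math.log(p / (pa[a] * pb[b]))
    return mi
```

Identical images give MI equal to the image entropy (ln 4 for four equally likely bins). An image paired with a constant image gives zero.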
Nov, 23

Mapping High-Fidelity Volume Rendering for Medical Imaging to CPU, GPU and Many-Core Architectures

Medical volumetric imaging requires high fidelity, high performance rendering algorithms. We motivate and analyze new volumetric rendering algorithms that are suited to modern parallel processing architectures. First, we describe the three major categories of volume rendering algorithms and confirm through an imaging scientist-guided evaluation that ray-casting is the most acceptable. We describe a thread- and […]
Nov, 23

Fast perspective volume ray casting method using GPU-based acceleration techniques for translucency rendering in 3D endoluminal CT colonography

Recent advances in graphics processing units (GPUs) have enabled direct volume rendering at interactive rates. However, although perspective volume rendering of opaque isosurfaces can be performed rapidly using conventional GPU-based methods, perspective volume rendering of non-opaque volumes, as in translucency rendering, is still slow. In this paper, we propose an efficient GPU-based acceleration technique of fast […]
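One way to see why translucency rendering is slower than opaque isosurface rendering is the front-to-back compositing loop at the core of ray casting. The sketch below shows that standard loop for a single ray with grayscale samples; it is not the paper's acceleration technique. The `(color, alpha)` sample format and the 0.99 early-termination cutoff are assumptions. With an opaque isosurface, accumulated opacity saturates at the first hit and the ray stops. With translucent material, opacity grows slowly, so many more samples must be fetched and composited per ray.

```python
def composite_front_to_back(samples, cutoff=0.99):
    """Composite (color, alpha) samples ordered from the eye outward.

    Returns (accumulated_color, accumulated_alpha, samples_used)."""
    color, alpha, used = 0.0, 0.0, 0
    for c, a in samples:
        weight = (1.0 - alpha) * a  # contribution not yet occluded by nearer samples
        color += weight * c
        alpha += weight
        used += 1
        if alpha >= cutoff:  # early ray termination: the rest is hidden
            break
    return color, alpha, used
```

For an opaque first sample, `composite_front_to_back([(0.8, 1.0), (0.2, 1.0)])` stops after one sample and returns `(0.8, 1.0, 1)`. Two half-transparent samples `[(1.0, 0.5), (0.0, 0.5)]` are both used and give color 0.5 and opacity 0.75.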
Nov, 23

hiCUDA: a high-level directive-based language for GPU programming

The Compute Unified Device Architecture (CUDA) has become a de facto standard for programming NVIDIA GPUs. However, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host memory and various components of the GPU memory, and of manually optimizing the utilization of the […]
Nov, 23

Acceleration of a QM/MM-QMC simulation using GPU

We accelerated an ab initio molecular QMC calculation by using GPGPU. Only the bottleneck part of the calculation is replaced by a CUDA subroutine and performed on the GPU, yielding 23.5 (9.7) times faster performance in single (double) precision. The energy deviation caused by the single-precision treatment was found to be within the accuracy required in the […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors