Posts
Feb, 27
CUDA-enabled LBM Flow Simulation around Three Equilateral Cylinders using GPU Computing Processor
This study is concerned with the simulation of viscous flow past three equal-diameter circular cylinders in an equilateral-triangular arrangement. The hydrodynamic characteristics of the cylinders are modelled with a 2D lattice Boltzmann kernel built on the Compute Unified Device Architecture (CUDA) interface developed by NVIDIA. Computations using the developed kernel are performed for nine spacing ratios […]
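For readers new to LBM, here is a minimal sketch of the per-node work such a kernel performs. It assumes the common D2Q9 lattice with single-relaxation-time (BGK) collision in lattice units. The excerpt does not say which lattice, collision model or cylinder boundary treatment the study uses, and the names `equilibrium`, `macroscopic` and `bgk_collide` are hypothetical. Because each node's collision depends only on that node's nine distributions, a CUDA version typically assigns one thread per lattice node.

```python
# Hypothetical D2Q9 BGK sketch (lattice units), not the study's kernel.
# Standard D2Q9 velocity set e_i and weights w_i.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def macroscopic(f):
    """Density and velocity recovered as moments of the nine distributions."""
    rho = sum(f)
    ux = sum(fi * ex for fi, (ex, _) in zip(f, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(f, E)) / rho
    return rho, ux, uy


def bgk_collide(f, tau):
    """Relax one node's distributions toward equilibrium with relaxation time tau."""
    rho, ux, uy = macroscopic(f)
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
```

The collision conserves the node's density and momentum by construction. Only the streaming step, which moves each distribution to a neighbouring node, couples nodes together.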
Feb, 27
An Energy Consumption Model for GPU Computing at Instruction Level
With the development of hardware and software, GPUs have come into use for general-purpose computation. The high density of computing resources on the chip brings high performance but also high power consumption. As a result, GPU power consumption has become one of the most important issues for the development of general-purpose computing with […]
Feb, 27
An improved implementation of Preconditioned Conjugate Gradient Method on GPU
An improved implementation of the Preconditioned Conjugate Gradient method on GPU using CUDA (Compute Unified Device Architecture) is proposed. It aims to solve the Poisson equation arising in liquid animation efficiently. We consider the features of the linear system obtained from the Poisson equation and propose an optimization method to solve it. First, […]
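The excerpt stops before describing the paper's optimizations, so the following is only a textbook baseline for orientation. It is a Jacobi-preconditioned conjugate gradient solver applied to a small 1D Poisson matrix (tridiagonal 2, -1), written in plain Python. The helper names `pcg_jacobi`, `poisson_1d` and `dot` are hypothetical. On a GPU, each of the vector updates, the dot products and the matrix-vector product would typically become a CUDA kernel or library call.

```python
# Textbook Jacobi-preconditioned CG sketch; not the paper's GPU implementation.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poisson_1d(v):
    """Matrix-free product with the 1D Poisson matrix tridiag(-1, 2, -1), zero boundaries."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def pcg_jacobi(apply_a, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A, using the diagonal of A as the preconditioner."""
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x, with x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # z = M^-1 r (Jacobi)
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged
            break
        ap = apply_a(p)
        alpha = rz / dot(p, ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

For example, `pcg_jacobi(poisson_1d, [2.0] * 8, [1.0] * 8)` approximates the discrete solution x_i = i(9 - i)/2 for i = 1..8. In exact arithmetic, CG needs at most as many iterations as there are unknowns.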
Feb, 27
Comparing the Power and Performance of Intel’s SCC to State-of-the-Art CPUs and GPUs
Power dissipation and energy consumption are becoming increasingly important architectural design constraints in different types of computers, from embedded systems to large-scale supercomputers. To continue the scaling of performance, it is essential that we build parallel processor chips that make the best use of exponentially increasing numbers of transistors within the power and energy budgets. […]
Feb, 27
Simultaneous floating-point sine and cosine for VLIW integer processors
Graphics and signal processing applications often require sines and cosines to be evaluated at the same floating-point argument, and in such cases a very fast computation of the pair of values is desirable. This paper studies how 32-bit VLIW integer architectures can be exploited to perform this task accurately for IEEE single precision. […]
Feb, 26
Cooperative Heterogeneous Computing for Parallel Processing on CPU/GPU Hybrids
This paper presents a cooperative heterogeneous computing framework that enables the efficient use of the host's CPU cores for CUDA kernels, which are designed to run only on the GPU. The proposed system exploits coarse-grain thread-level parallelism across the CPU and GPU at runtime, without any source recompilation. To this end, three features […]
Feb, 26
Fast Multipole Methods and High Performance Computing
This thesis details my research in two primary fields: fast multipole methods (FMM) and high performance computing with GPUs. Although these are two seemingly disparate courses of study, significant results in implementing the FMM efficiently on a GPU are presented in Chapter 4. In the first chapter, we introduce these two fields of study in a […]
Feb, 26
Power-Efficient Time-Sensitive Mapping in Heterogeneous Systems
Heterogeneous systems that contain multiple types of resources, such as CPUs and GPUs, are becoming increasingly popular thanks to their potential for high performance and energy efficiency. In such systems, mapping data and managing communication for time-sensitive applications while also reducing power and energy consumption is more challenging, since applications may have varied […]
Feb, 26
Vortex particle method and parallel computing
This paper presents numerical results for a three-dimensional simulation of the motion of a vortex ring. The Vortex-In-Cell method was chosen for the simulation and is briefly described in the paper. The numerical results were obtained on a single-processor (x86) architecture. The disadvantage of the single processor […]
Feb, 26
GPU Accelerated Molecular Surface Computing
A method is presented for computing the SES (solvent excluded surface) of a protein molecule in interactive time using GPU (graphics processing unit) acceleration. First, the offset surface of the van der Waals spheres is sampled using an offset distance d that corresponds to the radius of the solvent probe. The SES is then constructed […]
Feb, 24
A novel sorting algorithm for many-core architectures based on adaptive bitonic sort
Adaptive bitonic sort is a well-known merge-based parallel sorting algorithm. It achieves optimal complexity by using a complex tree-like data structure called a bitonic tree. As a result, combining adaptive bitonic sort with other algorithms usually requires converting bitonic trees to arrays and vice versa. This makes adaptive bitonic sort inappropriate in the context […]
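For contrast with the tree-based adaptive variant, here is a minimal sketch of the plain array-based bitonic sorting network. This is a different, simpler algorithm: it performs O(n log² n) comparisons and is neither adaptive bitonic sort nor the paper's new algorithm. The function name `bitonic_sort` is hypothetical, and the sketch assumes the input length is a power of two. Within each (k, j) pass, every compare-exchange touches a distinct pair of elements, which is why GPU versions usually map the inner loop to one thread per element.

```python
# Plain bitonic sorting network (not adaptive bitonic sort); length must be a power of two.

def bitonic_sort(values):
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:                    # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:                 # distance between compared elements
            for i in range(n):       # independent compare-exchanges (parallel on a GPU)
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

For example, `bitonic_sort([5, 1, 4, 2, 8, 7, 3, 6])` returns `[1, 2, 3, 4, 5, 6, 7, 8]`. The network's comparison pattern is fixed regardless of the data, which suits SIMD hardware but does more work than an optimal merge-based sort.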
Feb, 24
Reuse and Refactoring of GPU Kernels to Design Complex Applications
Developers of GPU kernels, such as FFTs and linear solvers, tune their code extensively to obtain optimal performance, making efficient use of the different resources available on the GPU. Complex applications are composed of several such kernel components. The software engineering community has performed extensive research on component-based design to build generic and flexible […]

