Posts
Feb, 27
An improved implementation of Preconditioned Conjugate Gradient Method on GPU
An improved implementation of the Preconditioned Conjugate Gradient method on GPU using CUDA (Compute Unified Device Architecture) is proposed. It aims to solve the Poisson equation arising in liquid animation with high efficiency. We consider the features of the linear system obtained from the Poisson equation and propose an optimization method to solve it. First, […]
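For readers unfamiliar with the method, here is a minimal CPU sketch of preconditioned conjugate gradient on a 1D Poisson system. It assumes a Jacobi (diagonal) preconditioner and the tridiagonal matrix tridiag(-1, 2, -1) with zero boundary values; the excerpt does not say which preconditioner or discretization the paper uses, and the helper names `poisson_1d_matvec` and `pcg` are hypothetical.

```python
def poisson_1d_matvec(x):
    """Apply the 1D Poisson matrix tridiag(-1, 2, -1) (zero Dirichlet boundaries) to x."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pcg(matvec, b, diag, tol=1e-10, max_iter=1000):
    """Conjugate gradient with a Jacobi preconditioner M = diag(A) (an assumption).

    Returns (x, iterations). Starts from x = 0, so the initial residual is b.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)
    if dot(r, r) ** 0.5 < tol:  # right-hand side already (near) zero
        return x, 0
    z = [ri / di for ri, di in zip(r, diag)]  # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        ap = matvec(p)
        alpha = rz / dot(p, ap)               # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, k + 1
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        beta = rz_new / rz                    # keeps search directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    n = 8
    x, iters = pcg(poisson_1d_matvec, [1.0] * n, [2.0] * n)
    print(iters, [round(v, 6) for v in x])
```

With a constant diagonal, the Jacobi preconditioner only rescales the system, so this reduces to plain CG; it is included to show where a stronger preconditioner would plug in. On a GPU, each vector update and the matrix-vector product become data-parallel kernels.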
Feb, 27
Comparing the Power and Performance of Intel’s SCC to State-of-the-Art CPUs and GPUs
Power dissipation and energy consumption are becoming increasingly important architectural design constraints in different types of computers, from embedded systems to large-scale supercomputers. To continue scaling performance, it is essential that we build parallel processor chips that make the best use of exponentially increasing numbers of transistors within the power and energy budgets. […]
Feb, 27
Simultaneous floating-point sine and cosine for VLIW integer processors
Graphics and signal processing applications often require that sines and cosines be evaluated at the same floating-point argument, and in such cases a very fast computation of the pair of values is desirable. This paper studies how 32-bit VLIW integer architectures can be exploited to perform this task accurately for IEEE single precision. […]
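The saving in a simultaneous sine/cosine routine comes from sharing work between the two results. The sketch below shows the idea in ordinary Python double-precision floats: one range reduction and one squared argument feed both polynomials, and the quadrant selects signs and swaps. This is an assumption-laden illustration with truncated Taylor series and naive reduction (accurate only for moderate |x|); it is not the paper's integer-arithmetic VLIW method.

```python
import math

HALF_PI = math.pi / 2


def sincos(x):
    """Return (sin(x), cos(x)) using a single shared range reduction.

    Naive reduction: fine for moderate |x|, loses accuracy for very large |x|.
    """
    # x = k * (pi/2) + r with r in [-pi/4, pi/4]
    k = round(x / HALF_PI)
    r = x - k * HALF_PI
    r2 = r * r  # shared by both polynomials
    # Taylor series of sin and cos, in nested (Horner-like) form
    s = r * (1 - r2 / 6 * (1 - r2 / 20 * (1 - r2 / 42 * (1 - r2 / 72 * (1 - r2 / 110)))))
    c = 1 - r2 / 2 * (1 - r2 / 12 * (1 - r2 / 30 * (1 - r2 / 56 * (1 - r2 / 90 * (1 - r2 / 132)))))
    # Map back to the original quadrant
    q = k % 4
    if q == 0:
        return s, c
    if q == 1:
        return c, -s
    if q == 2:
        return -s, -c
    return -c, s


if __name__ == "__main__":
    print(sincos(1.0), (math.sin(1.0), math.cos(1.0)))
```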
Feb, 26
Cooperative Heterogeneous Computing for Parallel Processing on CPU/GPU Hybrids
This paper presents a cooperative heterogeneous computing framework that lets CUDA kernels, which are designed to run only on the GPU, make efficient use of the computing resources available in the host CPU cores. At runtime, the proposed system exploits coarse-grain thread-level parallelism across the CPU and GPU, without any source recompilation. To this end, three features […]
Feb, 26
Fast Multipole Methods and High Performance Computing
This thesis details my research in two primary fields: fast multipole methods (FMM) and high performance computing with GPUs. Although these are two seemingly disparate courses of study, significant results in implementing the FMM efficiently on a GPU are presented in Chapter 4. In the first chapter, we introduce these two fields of study in a […]
Feb, 26
Power-Efficient Time-Sensitive Mapping in Heterogeneous Systems
Heterogeneous systems that contain multiple types of resources, such as CPUs and GPUs, are becoming increasingly popular thanks to their potential for high performance and energy efficiency. In such systems, data mapping and communication for time-sensitive applications become more challenging when power and energy consumption must also be reduced, since applications may have varied […]
Feb, 26
Vortex particle method and parallel computing
This paper presents numerical results for a three-dimensional simulation of the motion of a vortex ring. The Vortex In Cell method was chosen for the simulation and is briefly described in the paper. The numerical results were obtained on a single-processor (x86) architecture. The disadvantage of the single processor […]
Feb, 26
GPU Accelerated Molecular Surface Computing
A method is presented for computing the SES (solvent excluded surface) of a protein molecule at interactive rates using GPU (graphics processing unit) acceleration. First, the offset surface of the van der Waals spheres is sampled using an offset distance d that corresponds to the radius of the solvent probe. The SES is then constructed […]
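To make the first step concrete, here is a minimal sketch of sampling the offset surface: each atom's van der Waals sphere is grown by the probe radius d, points are placed on it, and points lying inside any other grown sphere are discarded. The Fibonacci-lattice point distribution, the `(x, y, z, r_vdw)` atom tuple, and the function names are hypothetical choices for illustration; the excerpt does not describe the paper's sampling pattern or its GPU implementation.

```python
import math


def fibonacci_sphere(n):
    """Return n roughly uniformly spread unit vectors (Fibonacci lattice)."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    dirs = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        rad = math.sqrt(1 - z * z)
        theta = golden_angle * i
        dirs.append((rad * math.cos(theta), rad * math.sin(theta), z))
    return dirs


def sample_offset_surface(atoms, d, n_per_atom=200):
    """Sample the outer boundary of the union of spheres of radius r_vdw + d.

    atoms: list of (x, y, z, r_vdw) tuples (hypothetical layout).
    d: probe radius used as the offset distance.
    """
    dirs = fibonacci_sphere(n_per_atom)
    samples = []
    for i, (cx, cy, cz, r) in enumerate(atoms):
        big_r = r + d
        for ux, uy, uz in dirs:
            p = (cx + big_r * ux, cy + big_r * uy, cz + big_r * uz)
            buried = False
            for j, (ox, oy, oz, orad) in enumerate(atoms):
                if j == i:
                    continue
                dist2 = (p[0] - ox) ** 2 + (p[1] - oy) ** 2 + (p[2] - oz) ** 2
                if dist2 < (orad + d) ** 2:  # inside another offset sphere
                    buried = True
                    break
            if not buried:
                samples.append(p)
    return samples


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0, 1.5), (2.0, 0.0, 0.0, 1.5)]
    print(len(sample_offset_surface(atoms, 1.4)))
```

Each sample point is tested independently, which is what makes this kind of step a natural fit for one-thread-per-point GPU execution.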
Feb, 24
A novel sorting algorithm for many-core architectures based on adaptive bitonic sort
Adaptive bitonic sort is a well-known merge-based parallel sorting algorithm. It achieves optimal complexity using a complex tree-like data structure called a bitonic tree. Because of this, using adaptive bitonic sort together with other algorithms usually implies converting bitonic trees to arrays and vice versa. This makes adaptive bitonic sort inappropriate in the context […]
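As background, the array-based (non-adaptive) bitonic sorting network is sketched below. Note that this is Batcher's classic network, not adaptive bitonic sort: it needs no bitonic tree but does O(n log² n) comparisons, and it assumes a power-of-two input length. Every compare-exchange in the innermost loop is independent of the others, which is why this form maps directly onto many-core hardware.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with Batcher's bitonic network.

    Returns a new sorted list; raises ValueError for other lengths.
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # distance between compared elements
            # All iterations of this loop are independent (one GPU thread each).
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4]))
```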
Feb, 24
Reuse and Refactoring of GPU Kernels to Design Complex Applications
Developers of GPU kernels, such as FFT, linear solvers, etc., tune their code extensively to obtain optimal performance, making efficient use of the different resources available on the GPU. Complex applications are composed of several such kernel components. The software engineering community has performed extensive research on component-based design to build generic and flexible […]
Feb, 24
Stargazer: Automated Regression-Based GPU Design Space Exploration
Graphics processing units (GPUs) are of increasing interest because they offer massive parallelism for high-throughput computing. While GPUs promise high peak performance, their challenge is a less-familiar programming model with more complex and irregular performance trade-offs than traditional CPUs or CMPs. In particular, modest changes in software or hardware characteristics can lead to large or […]
Feb, 24
Collision Detection of Triangle Meshes using GPU
Collision detection in physics engines often uses primitives such as spheres and boxes, since collisions between these objects are straightforward to compute. More complicated objects can then be modeled as compounds of these simpler primitives. However, to make it easier to construct and simulate complicated objects, triangle meshes are a good alternative […]
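To illustrate why spheres and boxes are the cheap case, the sketch below shows sphere-sphere, axis-aligned box (AABB), and sphere-box overlap tests; each is a few comparisons. The `Sphere`/`AABB` types and function names are hypothetical, touching shapes are counted as colliding, and the boxes are assumed axis-aligned (oriented boxes need more work).

```python
from dataclasses import dataclass


@dataclass
class Sphere:
    center: tuple  # (x, y, z)
    radius: float


@dataclass
class AABB:
    lo: tuple  # minimum corner (x, y, z)
    hi: tuple  # maximum corner (x, y, z)


def spheres_collide(a, b):
    """Overlap if the center distance is at most the sum of the radii."""
    d2 = sum((p - q) ** 2 for p, q in zip(a.center, b.center))
    r = a.radius + b.radius
    return d2 <= r * r  # compare squared values to avoid a sqrt


def boxes_collide(a, b):
    """Axis-aligned boxes overlap iff their intervals overlap on every axis."""
    return all(alo <= bhi and blo <= ahi
               for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi))


def sphere_box_collide(s, box):
    """Clamp the sphere center to the box, then test the closest point."""
    closest = [min(max(c, lo), hi) for c, lo, hi in zip(s.center, box.lo, box.hi)]
    d2 = sum((c - p) ** 2 for c, p in zip(s.center, closest))
    return d2 <= s.radius * s.radius


if __name__ == "__main__":
    print(spheres_collide(Sphere((0, 0, 0), 1.0), Sphere((1.5, 0, 0), 1.0)))
```

A triangle mesh replaces these closed-form checks with many triangle-triangle tests, which is where GPU parallelism becomes attractive.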