Posts
Nov, 20
Using mobile GPU for general-purpose computing – a case study of face recognition on smartphones
As the GPU becomes an integrated component in handheld devices like smartphones, we have been investigating the opportunities and limitations of utilizing the ultra-low-power GPU in a mobile platform as a general-purpose accelerator, similar to its role in desktop and server platforms. The special focus of our investigation has been on the mobile GPU’s role for energy-optimized […]
Nov, 20
Autotuning GEMMs for Fermi
In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. Being the crucial […]
Nov, 20
Hierarchical QR factorization algorithms for multi-core cluster systems
This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms make the present and the foreseeable future of high-performance computing. Our new QR factorization algorithm falls in the category of the tile algorithms, which naturally enable good data locality for the sequential […]
Nov, 20
Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures
We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations efficiently. Our approach is able to achieve the objectives of a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system […]
Nov, 20
Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs
GPUs are excellent accelerators for data parallel applications with regular data access patterns. It is challenging, however, to optimize computations with irregular data access patterns on GPUs. One such computation is the Symmetric Matrix Vector product (SYMV) for dense linear algebra. Optimizing the SYMV kernel is important because it forms the basis of fundamental algorithms […]
Nov, 20
Parallelized Incomplete Poisson Preconditioner in Cloth Simulation
Efficient cloth simulation is an important problem for interactive applications that involve virtual humans, such as computer games. A common aspect of many methods that have been developed to simulate cloth is a linear system of equations, which is commonly solved using conjugate gradient or multi-grid approaches. In this paper, we introduce to the computer […]
Nov, 19
Using the High Productivity Language Chapel to Target GPGPU Architectures
It has been widely shown that GPGPU architectures offer large performance gains compared to their traditional CPU counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and challenges in performance optimization. In this paper, we […]
Nov, 19
Anisotropic mesh coarsening and refinement on GPU architecture
Finite element and finite volume methods on unstructured meshes offer a powerful approach to solving partial differential equations in complex domains. It has diverse application in areas such as industrial and geophysical fluid dynamics, structural mechanics, and radiative transfer. A key strength of the approach is the unstructured meshes' flexibility in conforming to complex geometry […]
Nov, 19
Exploiting concurrent kernel execution on graphic processing units
Graphics processing units (GPUs) have been accepted as a powerful and viable coprocessor solution in the high-performance computing domain. In order to maximize the benefit of GPUs for a multicore platform, a mechanism is needed for CPU threads in a parallel application to share this computing resource for efficient execution. NVIDIA’s Fermi architecture pioneers the feature […]
Nov, 19
Towards Faster Cloth Simulation: Examining the Preconditioned Conjugate Gradient
High quality cloth simulation is based on implicit methods. A variety of methods have been proposed to solve the linear systems of equations, with the conjugate gradient and multi-grid being the most commonly used. In this technical report we examine the preconditioned conjugate gradient method. More precisely, we analyze the quality of different preconditioning schemes […]
Nov, 19
Towards Efficient GPU Sharing on Multicore Processors
Scalable systems employing a mix of GPUs with CPUs are becoming increasingly prevalent in high-performance computing (HPC). The presence of such accelerators introduces significant challenges and complexities to both language developers and end users. This paper provides a close study of efficient coordination mechanisms to handle parallel requests from multiple hosts of control to a […]
Nov, 19
ShoveRand: a model-driven framework to easily generate random numbers on GP-GPU
Stochastic simulations are often sensitive to the randomness source that characterizes the statistical quality of their results. Consequently, we need highly reliable Random Number Generators (RNGs) to feed such applications. Recent developments try to shrink the computation time by using more and more General Purpose Graphics Processing Units (GP-GPUs) to speed up stochastic simulations. Such devices […]