Posts

Nov, 20

Efficient Stack-less BVH Traversal for Ray Tracing

We propose a new, completely iterative traversal algorithm for ray tracing bounding volume hierarchies that is based on storing a parent pointer with each node, and on using simple state logic to infer which node to traverse next. Though our traversal algorithm does re-visit internal nodes, it intersects each visited node only once, and in […]
Nov, 20

Implementing a Finite Difference-Based Real-time Sound Synthesizer using GPUs

In this paper, we describe an implementation of a real-time sound synthesizer using Finite Difference-based simulation of a two-dimensional membrane. Finite Difference (FD) methods can be the basis for physics-based music instrument models that generate realistic audio output. However, such methods are compute-intensive; large simulations cannot run in real time on current CPUs. Many current […]
Nov, 20

Spatial interpolation in massively parallel computing environments

Prediction of environmental phenomena at non-observed locations is a fundamental task in geographic information science. Often, samples are taken at a limited number of sensor locations and spatial and spatio-temporal interpolation is used to generate continuous maps. The computational cost of the underlying algorithms usually grows with the number of data entering the interpolation and […]
Nov, 20

Soft Error Resilient QR Factorization for Hybrid System

As general-purpose graphics processing units (GPGPUs) are increasingly deployed for scientific computing for their raw performance advantages over CPUs, fault tolerance has become more of a concern than it was when they were used exclusively for graphics applications. The pairing of GPUs with CPUs to form a hybrid computing […]
Nov, 20

Using mobile GPU for general-purpose computing – a case study of face recognition on smartphones

As the GPU becomes an integrated component in handheld devices like smartphones, we have been investigating the opportunities and limitations of utilizing the ultra-low-power GPU in a mobile platform as a general-purpose accelerator, similar to its role in desktop and server platforms. The special focus of our investigation has been on the mobile GPU’s role for energy-optimized […]
Nov, 20

Autotuning GEMMs for Fermi

In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. Being the crucial […]
Nov, 20

Hierarchical QR factorization algorithms for multi-core cluster systems

This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms represent the present and the foreseeable future of high-performance computing. Our new QR factorization algorithm falls in the category of the tile algorithms, which naturally enable good data locality for the sequential […]
Nov, 20

Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures

We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations efficiently. Our approach is able to achieve the objectives of a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system […]
Nov, 20

Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs

GPUs are excellent accelerators for data parallel applications with regular data access patterns. It is challenging, however, to optimize computations with irregular data access patterns on GPUs. One such computation is the Symmetric Matrix Vector product (SYMV) for dense linear algebra. Optimizing the SYMV kernel is important because it forms the basis of fundamental algorithms […]
Nov, 20

Parallelized Incomplete Poisson Preconditioner in Cloth Simulation

Efficient cloth simulation is an important problem for interactive applications that involve virtual humans, such as computer games. A common aspect of many methods that have been developed to simulate cloth is a linear system of equations, which is commonly solved using conjugate gradient or multi-grid approaches. In this paper, we introduce to the computer […]
Nov, 19

Using the High Productivity Language Chapel to Target GPGPU Architectures

It has been widely shown that GPGPU architectures offer large performance gains compared to their traditional CPU counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and challenges in performance optimization. In this paper, we […]
Nov, 19

Anisotropic mesh coarsening and refinement on GPU architecture

Finite element and finite volume methods on unstructured meshes offer a powerful approach to solving partial differential equations in complex domains. It has diverse application in areas such as industrial and geophysical fluid dynamics, structural mechanics, and radiative transfer. A key strength of the approach is the unstructured meshes' flexibility in conforming to complex geometry […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
