Posts
Nov, 21
Conflux: Embedding Massively Parallel Semantics in a High-Level Programming Language
Of late, massively parallel devices have become mainstream and are widely used in research and industry. Yet despite recent API advances, programming these devices has proven to be a difficult and error-prone task. We have designed Conflux, an embedded domain-specific language that integrates massively parallel semantics into a high-level programming language. […]
Nov, 21
Graph-based Parallel Analysis of Large Analog Circuits Based on GPU Platforms
In this paper, we propose a new parallel analysis method for large analog circuits using a determinant decision diagram (DDD) based graph technique. The DDD-based symbolic analysis technique enables exact symbolic analysis of very large analog circuits. Once the circuit's small-signal characteristics are represented by DDDs, evaluating the DDDs gives exact numerical values. In this paper, […]
Nov, 21
Challenge benchmarks that must be conquered to sustain the GPU revolution
The shift from GPUs to GPGPUs has brought with it many changes to the GPU architecture (e.g. more caches, more concurrent kernels, better synchronization). As GPUs press further into the general-purpose domain, architects must continue to address the performance of challenging workloads. This paper presents a set of challenge benchmarks and their key performance limitations […]
Nov, 21
PATUS: A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures
Stencil calculations comprise an important class of kernels in many scientific computing applications ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such types of solvers, stencil kernels are often the dominant part of the computation, and an efficient parallel implementation of the kernel is therefore […]
Nov, 20
Efficient Stack-less BVH Traversal for Ray Tracing
We propose a new, completely iterative traversal algorithm for ray tracing bounding volume hierarchies that is based on storing a parent pointer with each node, and on using simple state logic to infer which node to traverse next. Though our traversal algorithm does re-visit internal nodes, it intersects each visited node only once, and in […]
Nov, 20
Implementing a Finite Difference-Based Real-time Sound Synthesizer using GPUs
In this paper, we describe an implementation of a real-time sound synthesizer using Finite Difference-based simulation of a two-dimensional membrane. Finite Difference (FD) methods can be the basis for physics-based music instrument models that generate realistic audio output. However, such methods are compute-intensive; large simulations cannot run in real time on current CPUs. Many current […]
Nov, 20
Spatial interpolation in massively parallel computing environments
Prediction of environmental phenomena at non-observed locations is a fundamental task in geographic information science. Often, samples are taken at a limited number of sensor locations and spatial and spatio-temporal interpolation is used to generate continuous maps. The computational cost of the underlying algorithms usually grows with the number of data entering the interpolation and […]
Nov, 20
Soft Error Resilient QR Factorization for Hybrid System
As general-purpose graphics processing units (GPGPUs) are increasingly deployed for scientific computing because of their raw performance advantages over CPUs, fault tolerance has become more of a concern than it was when they were used exclusively for graphics applications. The pairing of GPUs with CPUs to form a hybrid computing […]
Nov, 20
Using mobile GPU for general-purpose computing – a case study of face recognition on smartphones
As the GPU becomes an integrated component in handheld devices like smartphones, we have been investigating the opportunities and limitations of utilizing the ultra-low-power GPU in a mobile platform as a general-purpose accelerator, similar to its role in desktop and server platforms. The special focus of our investigation has been on the mobile GPU’s role for energy-optimized […]
Nov, 20
Autotuning GEMMs for Fermi
In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. Being the crucial […]
Nov, 20
Hierarchical QR factorization algorithms for multi-core cluster systems
This paper describes a new QR factorization algorithm designed especially for massively parallel platforms that combine parallel distributed multi-core nodes. These platforms represent the present and the foreseeable future of high-performance computing. Our new QR factorization algorithm falls in the category of tile algorithms, which naturally enable good data locality for the sequential […]
Nov, 20
Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures
We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations efficiently. Our approach is able to achieve the objectives of a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system […]