Posts
Dec, 14
A Flexible Kernel for Adaptive Mesh Refinement on GPU
We present a flexible GPU kernel for adaptive on-the-fly refinement of meshes with arbitrary topology. By simply reserving a small amount of GPU memory to store a set of adaptive refinement patterns, on-the-fly refinement is performed by the GPU, without any preprocessing or additional topology data structure. The level of adaptive refinement can be controlled […]
Dec, 14
A GPU based real-time GPS software receiver
Off-the-shelf graphics processing units provide low-cost, massively parallel computing performance, which can be utilized for the implementation of a GPS software receiver. In order to realize a real-time capable system, the crucial stages of the receiver should be optimized to suit the requirements of a parallel processor. Moreover, the receiver should be capable of providing […]
Dec, 14
Scalable Programming Models for Massively Multicore Processors
Including multiple cores on a single chip has become the dominant mechanism for scaling processor performance. Exponential growth in the number of cores on a single processor is expected to lead in a short time to mainstream computers with hundreds of cores. Scalable implementations of parallel algorithms will be necessary in order to achieve improved […]
Dec, 14
Specular Effects on the GPU: State of the Art
This survey reviews algorithms that can render specular effects, i.e. mirror reflections, refractions, and caustics, on the GPU. We establish a taxonomy of methods based on the three main ways of representing the scene and computing ray intersections with the aid of the GPU, including ray tracing in the original geometry, ray tracing in the […]
Dec, 14
OpenMP in Multicore Architectures (tech. report)
OpenMP is an API (application program interface) used to explicitly direct multi-threaded, shared-memory parallelism. With the advent of multi-core processors, there has been renewed interest in parallelizing programs. Multi-core processors can execute threads in parallel while keeping the cost of communication very low. This opens up new domains to […]
Dec, 14
OpenMP on Multicore Architectures
Dual-core processors are already ubiquitous, quad-core processors will spread this year, systems with a larger number of cores exist, and more are planned. Some cores even execute multiple threads. Are these processors just SMP systems on a chip? Is OpenMP ready to be used for these architectures? We take a look at the cache […]
Dec, 14
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
Understanding the most efficient design and utilization of emerging multicore systems is one of the most challenging questions faced by the mainstream and scientific computing industries in several decades. Our work explores multicore stencil (nearest-neighbor) computations, a class of algorithms at the heart of many structured grid codes, including PDE solvers. We develop a […]
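For readers unfamiliar with the term, a stencil (nearest-neighbor) computation updates each grid cell from a fixed pattern of adjacent cells. The following is a minimal illustrative sketch, not the paper's code: it assumes a 2D grid stored as a list of lists and a 5-point Jacobi averaging stencil, and the function name `jacobi_sweep` is hypothetical.

```python
def jacobi_sweep(grid):
    """Return a new grid in which every interior cell is the mean of its
    four nearest neighbors (an assumed 5-point Jacobi stencil)."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary cells are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Read only from the old grid so the sweep is order-independent.
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out
```

Because every output cell depends only on the previous grid, all interior cells can be updated independently, which is what makes this class of algorithms a natural target for multicore optimization and tuning.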
Dec, 14
High-performance and Hardware-aware Computing: Proceedings of the First International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC’08)
The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach.
Dec, 13
Pseudo-random number generators for Monte Carlo simulations on ATI Graphics Processing Units
Basic uniform pseudo-random number generators are implemented on ATI Graphics Processing Units (GPU). The performance results of the realized generators (multiplicative linear congruential (GGL), XOR-shift (XOR128), RANECU, RANMAR, RANLUX and Mersenne Twister (MT19937)) on CPU and GPU are discussed. The speedup obtained over the CPU is a factor of hundreds. RANLUX generator is […]
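As a rough illustration of the two simplest generators named above, here is a CPU-side sketch in plain Python; it is not the paper's GPU implementation. The constants are assumptions: the multiplier 16807 and modulus 2^31 - 1 commonly associated with GGL, and Marsaglia's published shifts (11, 8, 19) and default seeds for the 128-bit XOR-shift. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, as a C uint32 would


def ggl_stream(seed=1):
    """Multiplicative LCG: x <- 16807 * x mod (2**31 - 1) (assumed constants)."""
    m = 2**31 - 1
    x = seed % m
    if x == 0:
        raise ValueError("seed must not be a multiple of 2**31 - 1")
    while True:
        x = (16807 * x) % m
        yield x


def xor128_stream(x=123456789, y=362436069, z=521288629, w=88675123):
    """128-bit XOR-shift with shifts (11, 8, 19); yields 32-bit integers."""
    while True:
        t = (x ^ (x << 11)) & MASK32
        x, y, z = y, z, w          # rotate the four state words
        w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & MASK32
        yield w
```

With seed 1, `ggl_stream` yields 16807 and then 282475249. Dividing an output by the modulus (or by 2^32 for the XOR-shift) gives a uniform value in the unit interval for Monte Carlo use.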
Dec, 13
Embracing Heterogeneity: Parallel Programming for Changing Hardware
Computer systems are undergoing significant change: to improve performance and efficiency, architects are exposing more microarchitectural details directly to programmers. Software that exploits specialized accelerators, such as GPUs, and specialized processor features, such as software-controlled memory, exposes limitations in existing compiler and OS infrastructure. In this paper we propose a pragmatic approach, motivated by our […]
Dec, 13
Parallel N-Body Simulation using GPUs
We present a novel parallel implementation of N-body gravitational simulation. Our algorithm uses graphics hardware to accelerate local computation, and is optimized to account for low bandwidth between the CPU and the graphics card, as well as low bandwidth across the network. The number of bodies that can be simulated with our implementation is limited […]
Dec, 13
Prototyping flexible touch screen devices using collocated haptic-graphic elastic-object deformation on the GPU
Rapid advances in flexible display technologies and the benefits that they provide are promising enough to consider them for futuristic mobile devices. Current prototyping methods lack facilities to simulate such flexible touch screen displays and the interaction with them. In this paper, we present a technique that provides product developers with a tool to interactively simulate […]