Posts
Oct, 3
Heterogeneous Computing with OpenCL
Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous […]
Oct, 3
An OpenCL Fast Fourier Transformation
This paper describes an implementation strategy in preparation for an OpenCL FFT. It highlights the two factors most crucial to high FFT performance on a GPU: memory bandwidth and locality. Theoretical upper bounds for performance in terms of the locality factor are derived. An implementation […]
Oct, 3
Realtime Computation of a VST Audio Effect Plugin on the Graphics Processor
A plugin system for real-time GPGPU calculation of audio effects on the computer's graphics processing unit is presented. The prototype application is the rendering of mono audio material with head-related transfer functions (HRTFs) to create the impression of a sound source located in a certain direction relative to the listener’s head. The […]
Oct, 3
Towards robust automatic detection of vulnerable road users: monocular pedestrian tracking from a moving vehicle
In this paper we present steps towards the automatic detection of vulnerable road users in video. Such a system can be used, for example, as an automatic blind spot camera for trucks. The aim of the system is to automatically warn the driver when the algorithm detects vulnerable road users in the camera images. Such an […]
Oct, 3
An Auto-tuning Solution to Data Streams Clustering in OpenCL
Due to its applicability to numerous types of data, including telephone records, web documents, and click streams, the data stream model has recently attracted attention. For analysis of such data, it is crucial to process the data in a single pass, or a small number of passes, using little memory. This paper provides an OpenCL […]
Oct, 2
A New Class of Parallel Scheduling Algorithms
This book addresses job scheduling problems in parallel computing environments, such as multiprocessor computers, clusters, or distributed computing nodes in networks. It solves them by applying algorithms that use various parallelization technologies, from multiple threads (multithreading) up to distributed processes. Strongly sequential character of the scheduling […]
Oct, 2
Hardware/Software Co-design for Energy-Efficient Seismic Modeling
Reverse Time Migration (RTM) has become the standard for high-quality imaging in the seismic industry. RTM relies on PDE solutions using stencils that are 8th order or larger, which require large-scale HPC clusters to meet the computational demands. However, the rising power consumption of conventional cluster technology has prompted investigation of architectural alternatives that offer […]
Oct, 2
Power Management and Optimization
After many years of focusing on "faster" computers, people have started taking notice of the fact that the race for "speed" has had the unfortunate side effect of increasing the total power consumed, thereby increasing the total cost of ownership of these machines. The heat produced has required expensive cooling facilities. As a result, it […]
Oct, 2
Development of a Chemically Reacting Flow Solver on the Graphic Processing Units
The focus of the current research is to develop a numerical framework on graphics processing units (GPUs) capable of modeling chemically reacting flow. The framework incorporates a high-order finite volume method coupled with an implicit solver for the chemical kinetics. Both the fluid solver and the kinetics solver are designed to take advantage of […]
Oct, 2
An Execution Model and Runtime For Heterogeneous Many-Core Systems
The emergence of heterogeneous and many-core architectures presents a unique opportunity to deliver order-of-magnitude performance increases to high-performance applications by matching certain classes of algorithms to specifically tailored architectures. However, their ubiquitous adoption has been limited by a lack of programming models and management frameworks designed to reduce the high degree of […]
Oct, 2
Power-Efficient Accelerators for High-Performance Applications
Computers, regardless of their function, are always better if they can operate more quickly. The addition of computation resources allows for improved response times, greater functionality and more flexibility. The drawback with improving a computer’s performance, however, is that it often comes at the cost of power and energy consumption. For many platforms, this is […]
Oct, 2
PTask: Operating System Abstractions To Manage GPUs as Compute Devices
We propose a new set of OS abstractions to support GPUs and other accelerator devices as first class computing resources. These new abstractions, collectively called the PTask API, support a data flow programming model. Because a PTask graph consists of OS-managed objects, the kernel has sufficient visibility and control to provide system-wide guarantees like fairness […]