Posts
Aug, 19
Extending abstract GPU APIs to shared memory
Parallel programming is used extensively for general-purpose computations. However, the performance of parallel APIs varies with the problem and the architecture, which creates a need for an abstract way to express parallel problems. This poster presents a new approach through which programmers can access these APIs without having to focus […]
Aug, 19
A framework for lab-based real-time video analysis on distributed camera networks
In the field of video analytics for surveillance, the trend towards the use of multi-camera and high-definition video is increasing. This poses significant technical challenges in terms of video transmission and real-time processing for surveillance analytics, such as people recognition and tracking. Currently available solutions are typically proprietary commercial systems which are costly to […]
Aug, 19
A cluster for CS education in the manycore era
Traditional Beowulf clusters have been homogeneous platforms for distributed-memory MIMD parallelism. However, the shift to multicore architectures has made shared-memory MIMD parallelism increasingly important, and inexpensive manycore GPGPUs have revived SIMD parallelism. This paper presents a case study in designing and building a heterogeneous cluster as a learning platform for tera-scale distributed- and shared-memory MIMD […]
Aug, 19
Benchmarking and modelling of POWER7, Westmere, BG/P, and GPUs: an industry case study
This paper introduces an industry-strength, multi-purpose benchmark: Shamrock. Developed at the Atomic Weapons Establishment (AWE), Shamrock is a two-dimensional (2D) structured hydrocode; one of its aims is to assess the impacts of a change in hardware, and (in conjunction with a larger HPC Benchmark Suite) to provide guidance in procurement of future systems. […]
Aug, 19
Real-time rendering and dynamic updating of 3-d volumetric data
A dense 3-d terrain model obtained using reconstruction methods from aerial images is represented in a probabilistic volumetric framework. The probabilistic representation is chosen to capture the ambiguity inherent in reconstructing surfaces from images. Such a representation handles the ambiguity very well but leads to expensive dense volumetric storage. The area coverage required for […]
Aug, 19
Caracal: dynamic translation of runtime environments for GPUs
Graphics Processing Units (GPUs) have become the platform of choice for accelerating a large range of data-parallel and task-parallel applications. Both AMD and NVIDIA have developed GPU implementations targeted at the high performance computing market. The rapid adoption of GPU computing has been greatly aided by the introduction of high-level programming environments such […]
Aug, 19
Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems
SkePU is a C++ template library that provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP backend. It also supports multi-GPU systems. […]
Aug, 19
Frameworks for multi-core architectures: a comprehensive evaluation using 2D/3D image registration
The development of standard processors has changed in recent years, moving from bigger, more complex, and faster cores to placing several simpler cores on one chip. This has also changed the way programs are written in order to leverage the processing power of multiple cores on the same processor. In the beginning, programmers had to […]
Aug, 18
SkePU: a multi-backend skeleton programming library for multi-GPU systems
We present SkePU, a C++ template library which provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP backend. It also supports multi-GPU […]
Aug, 18
Energy-aware metrics for benchmarking heterogeneous systems
With the advent of heterogeneous computing systems consisting of multi-core CPUs and many-core GPUs, robust methods are needed to facilitate fair benchmark comparisons between different systems. In this paper we present a benchmarking methodology for measuring a number of performance metrics for heterogeneous systems. Methods for comparing performance and energy efficiency are included. Consideration is […]
Aug, 18
ATI Stream Profiler: a tool to optimize an OpenCL kernel on ATI Radeon GPUs
Modern GPUs have been shown to be highly efficient machines for data-parallel applications such as graphics, image, video processing, or physical simulation applications. For example, a single ATI Radeon HD 5870 GPU has a theoretical peak of 2.72 teraflops (10^12 floating-point operations per second) with a video memory bandwidth of 153.6 GB/s. While it is […]
Aug, 18
Physical and graphical effects in OpenCL by example
There are strong indications that the future of interactive graphics involves a more flexible programming model than today’s OpenGL/Direct3D pipelines. That means that graphics developers will need a basic understanding of how to combine emerging parallel-programming techniques with the traditional interactive rendering pipeline. This course provides an introduction to parallel-programming architectures and environments for interactive […]