Posts
Oct, 12
Color Correction Acceleration Using a Color Cube and OpenCL
The article deals with the problem of real-time color correction on modern but not dedicated video hardware, suggesting a new implementation of a fast algorithm for color transformation utilizing 3D look-up tables. We focus on the highly parallel nature of the proposed method and employ the GPU to perform the color calculations side-by-side. The paper is […]
Oct, 12
Evaluating performance and portability of OpenCL programs
Recently, OpenCL, a new open programming standard for GPGPU programming, has become available in addition to CUDA. OpenCL can support various compute devices because its programming framework offers a higher level of abstraction. Since there is a semantic gap between OpenCL and compute devices, the OpenCL C compiler plays an important role in exploiting the potential of compute devices […]
Oct, 11
Real-Time Rigid Body Interactions
Rigid body simulations are useful in many areas, most notably video games and computer animation. However, the requirements for accuracy and performance vary greatly between applications. In this project we combine methods and techniques from different sources to implement a rigid body simulation. The simulation uses a particle representation to approximate objects with the intent […]
Oct, 11
Performance Characterization and Optimization of Atomic Operations on AMD GPUs
Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate execution between concurrent threads, and in turn, assist in constructing complex data structures such as hash tables or implementing GPU-wide barrier synchronization. While the performance of atomic operations has improved substantially on […]
Oct, 11
Performance and Power Analysis of ATI GPU: A Statistical Approach
We present a comprehensive study on the performance and power consumption of a recent ATI GPU. By employing a rigorous statistical model to analyze execution behaviors of representative general-purpose GPU (GPGPU) applications, we conduct insightful investigations on the target GPU architecture. Our results demonstrate that the GPU execution throughput and the power dissipation are dependent […]
Oct, 11
Fast Surface Extraction and Visualization of Medical Images using OpenCL and GPUs
Marching Cubes (MC) is an algorithm that extracts surfaces from volumetric data. It is used extensively in visualization and analysis of medical data from modalities like CT and MR, often after a 3D segmentation of the structures of interest is performed. Traditional implementations of MC on modern CPUs are slow, taking several seconds (even minutes) to […]
Oct, 11
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers between the CPU and GPU over PCIe. Emerging heterogeneous computing architectures that "fuse" the functionality of […]
Oct, 11
PyPs, a programmable pass manager
As hardware platforms are growing in complexity, compiler infrastructures need more flexibility: due to the heterogeneity of these platforms, compiler phases must be combined in unusual and dynamic ways, and several tools may need to be combined to handle specific parts of the compilation process efficiently. The need for flexibility also appears in iterative compilation […]
Oct, 11
High Performance Parallel Design Based on Session Programming
Session programming is a programming model based on the theory of session types, a typing system for the pi-calculus. Session types were developed to model structured interaction between processes, and correctly typed processes have the property of communication safety. Session Java (SJ) is a full implementation of session types in Java. In this project, we […]
Oct, 11
Static Compilation Analysis for Host-Accelerator Communication Optimization
We present an automatic, static program transformation that schedules and generates efficient memory transfers between a computer host and its hardware accelerator, addressing a well-known performance bottleneck. Our automatic approach uses two simple heuristics: to perform transfers to the accelerator as early as possible and to delay transfers back from the accelerator as late as […]
Oct, 11
Using the CPU to Improve Performance in 3D Applications
Many applications in the film and game industries require multiple calculations to be performed on vast data sets. Any of these tools that are required to run in real time, and be used interactively, must be developed with performance in mind. This paper aims to explain how the Central Processing Unit can be utilised effectively […]
Oct, 11
A tutorial overview on the properties of the discrete cosine transform for encoded image and video processing
Discrete trigonometric transforms, such as the discrete cosine transform (DCT) and the discrete sine transform (DST), have been extensively used in signal processing for transform-based coding. The even type-II DCT, used in image and video coding, became especially popular for decorrelating the pixel data and minimizing the spatial redundancy. Although this DCT tends to be […]