Posts

Oct, 11

Performance and Power Analysis of ATI GPU: A Statistical Approach

We present a comprehensive study on the performance and power consumption of a recent ATI GPU. By employing a rigorous statistical model to analyze execution behaviors of representative general-purpose GPU (GPGPU) applications, we conduct insightful investigations on the target GPU architecture. Our results demonstrate that the GPU execution throughput and the power dissipation are dependent […]
Oct, 11

Fast Surface Extraction and Visualization of Medical Images using OpenCL and GPUs

Marching Cubes (MC) is an algorithm that extracts surfaces from volumetric data. It is used extensively in visualization and analysis of medical data from modalities like CT and MR, often after a 3D segmentation of the structures of interest is performed. Traditional implementations of MC on modern CPUs are slow, taking several seconds (even minutes) to […]
Oct, 11

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers between the CPU and GPU over PCIe. Emerging heterogeneous computing architectures that "fuse" the functionality of […]
Oct, 11

PyPs, a programmable pass manager

As hardware platforms are growing in complexity, compiler infrastructures need more flexibility: due to the heterogeneity of these platforms, compiler phases must be combined in unusual and dynamic ways, and several tools may need to be combined to handle specific parts of the compilation process efficiently. The need for flexibility also appears in iterative compilation […]
Oct, 11

High Performance Parallel Design Based on Session Programming

Session programming is a programming model based on the theory of session types, a typing system for the pi-calculus. Session types were developed to model structured interaction between processes, and a correctly typed process has the property of communication safety. Session Java (SJ) is a full implementation of session types in Java. In this project, we […]
Oct, 11

Static Compilation Analysis for Host-Accelerator Communication Optimization

We present an automatic, static program transformation that schedules and generates efficient memory transfers between a computer host and its hardware accelerator, addressing a well-known performance bottleneck. Our automatic approach uses two simple heuristics: to perform transfers to the accelerator as early as possible and to delay transfers back from the accelerator as late as […]
Oct, 11

Using the CPU to Improve Performance in 3D Applications

Many applications in the film and game industries require multiple calculations to be performed on vast data sets. Any of these tools that must run in real time and be used interactively must be developed with performance in mind. The following paper aims to explain how the Central Processing Unit can be utilised effectively […]
Oct, 11

A tutorial overview on the properties of the discrete cosine transform for encoded image and video processing

Discrete trigonometric transforms, such as the discrete cosine transform (DCT) and the discrete sine transform (DST), have been extensively used in signal processing for transform-based coding. The even type-II DCT, used in image and video coding, became especially popular for decorrelating the pixel data and minimizing the spatial redundancy. Albeit this DCT tends to be […]
Oct, 10

Computational intensive Tasks in Multimedia Signal Processing

Driven by the gaming industry and the great emphasis placed on the visual sense, graphics processing units (GPUs) have improved their performance in recent years, even outperforming the computational capacity of single-core CPUs. In fact, multi-core architectures are nowadays common for both CPUs and GPUs in order to exploit parallelism in computing. In this […]
Oct, 10

A GPU-Accelerated Parallel Preconditioner for the Solution of the Boltzmann Transport Equation for Semiconductors

The solution of large systems of linear equations is typically achieved by iterative methods. The rate of convergence of these methods can be substantially improved by the use of preconditioners, which can be either applied in a black-box fashion to the linear system, or exploit properties specific to the underlying problem for maximum efficiency. However, […]
Oct, 10

Anti-parallel Patterns in Fine-grain Data-parallel Programs

Parallel systems and parallel programming are becoming increasingly important. The developer in want of raw speed can no longer expect sequential processors to become faster and needs to turn to parallel platforms and parallel programs to satisfy his need for speed. But writing a parallel program is difficult and writing one with a decent […]
Oct, 10

Benchmarks Based on Anti-Parallel Patterns for the Evaluation of GPUs

We put forward "anti-parallel patterns" to guide the parallel performance analysis process. Anti-parallel patterns or APPs are common parts of parallel programs that cause these programs to have less than ideal performance, where the ideal speedup equals the number of processors. We present benchmarks to model the behavior of APPs on parallel platforms. Each benchmark […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors