5669

Posts

Sep, 23

Integrated Framework for Heterogeneous Embedded Platforms Using OpenCL

The technology community is rapidly moving away from the age of computers and laptops, and is entering the emerging era of hand-held devices. With the rapid development of smart phones, tablets, and pads, there has been widespread adoption of Graphic Processing Units (GPUs) in the embedded space. The hand-held market is now seeing an ever […]
Sep, 23

Embedding OpenCL in C++ for Expressive GPU Programming

We present a high performance GPU programming language, based on OpenCL, that is embedded in C++. Our embedding provides shared data structures, typesafe kernel invocation, and the ability to more naturally interleave CPU and GPU functions, similar to CUDA but with the portability of OpenCL. For expressivity, our language provides an abstraction that releases control […]
Sep, 8

Automatic OpenCL Device Characterization: Guiding Optimized Kernel Design

The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using a single unified programming interface and language. While the standard guarantees portability of functionality for complying applications and platforms, performance portability on such a diverse set of hardware is limited. Devices may vary significantly in memory architecture as well as […]
Sep, 8

Accelerating Clustering Coefficient Calculations on a GPU Using OPENCL

The growth in multicore CPUs and the emergence of powerful manycore GPUs has led to proliferation of parallel applications. Many applications are not straight forward to be parallelized. This paper examines the performance of a parallelized implementation for calculating measurements of Complex Networks. We present an algorithm for calculating complex networks topological feature clustering coefficient, […]
Aug, 18

ATI Stream Profiler: a tool to optimize an OpenCL kernel on ATI Radeon GPUs

Modern GPUs have been shown to be highly efficient machines for data-parallel applications such as graphics, image, video processing, or physical simulation applications. For example, a single ATI Radeon HD 5870 GPU has a theoretical peak of 2.72 teraflops (1012 floating-point operations per second) with a video memory bandwidth of 153.6 GB/s. While it is […]
Aug, 18

Physical and graphical effects in OpenCL by example

There are strong indications that the future of interactive graphics involves a more flexible programming model than today’s OpenGL/Direct3D pipelines. That means that graphics developers will need a basic understanding of how to combine emerging parallel-programming techniques with the traditional interactive rendering pipeline. This course provides an introduction to parallel-programming architectures and environments for interactive […]
Aug, 18

Parallelization of the x264 encoder using OpenCL

With the introduction of H.264, the complexity on video encoders has increased dramatically. As hardware based encoding solutions profit from the strict sequential design and already feature real time capabilities for high definition material, software solutions lack most of the encoding performance. More precisely, the performance of software encoders is limited due to the computation […]
Aug, 18

Simulating Biological-Inspired Spiking Neural Networks with OpenCL

The algorithms used for simulating biologically-inspired spiking neural networks (BIANN) often utilize functions which are computationally complex and have to model a large number of neurons – or even a much larger number of synapses in parallel. To use all available computing resources provided by a standard desktop PC is an opportunity to shorten the […]
Aug, 18

Parallel Batch Training of the Self-Organizing Map Using OpenCL

The Self-Organizing Maps (SOMs) are popular artificial neural networks that are often used for data analyses through clustering and visualisation. SOM’s mathematical model is inherently parallel. However, many implementations have not successfully exploited its parallelism because previous attempts often required cluster-like infrastructures. This article presents the parallel implementation of SOMs, particularly the batch map variant […]
Aug, 18

Maestro: Data Orchestration and Tuning for OpenCL Devices

As heterogeneous computing platforms become more prevalent, the programmer must account for complex memory hierarchies in addition to the difficulties of parallel programming. OpenCL is an open standard for parallel computing that helps alleviate this difficulty by providing a portable set of abstractions for device memory hierarchies. However, OpenCL requires that the programmer explicitly controls […]
Aug, 18

Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL

In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E. and GPUs, showing the wide usefulness of the approach. The evaluation is done with four different benchmarks, Matrix Multiply, BlackScholes, […]
Aug, 18

Analyzing program flow within a many-kernel OpenCL application

Many developers have begun to realize that heterogeneous multi-core and many-core computer systems can provide significant performance opportunities to a range of applications. Typical applications possess multiple components that can be parallelized; developers need to be equipped with proper performance tools to analyze program flow and identify application bottlenecks. In this paper, we analyze and […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: