high performance computing on graphics processing units: hgpu.org

Posts

Sep, 24

A Co-Design Framework with OpenCL Support for Low-Energy Wide SIMD Processor

Energy efficiency is one of the most important metrics in embedded processor design. The use of a wide SIMD architecture is a promising approach to building energy-efficient, high-performance embedded processors. In this paper, we propose a design framework for a configurable wide SIMD architecture that utilizes an explicit datapath to achieve high energy efficiency. The […]

OpenCL

Sep, 24

OpenCL Based Digital Image Projection Acceleration

In this thesis, several implementations of an image back projection algorithm using Open Computing Language (OpenCL) for different types of processors are developed. Image back projection is a method that takes aerial imagery and creates a map-like image containing real-world dimensions, removing the perspective angle of the camera. The processors that ran […]

OpenCL

Sep, 23

International Conference on Robotics, Mechanics and Mechatronics (ICRMM 2016), 2016

Topics (Robotics and Mechanical Engineering): Actuator design, robotic mechanisms and design, robot kinematics and dynamics; Agile Manufacturing; Agriculture, construction, industrial automation, manufacturing process; Automation and control systems, middleware; Biomedical and rehabilitation engineering, welfare robotics and mechatronics; Cellular Manufacturing; Concurrent Engineering; Design for Manufacture and Assembly; Distributed Control Systems; Flexible Manufacturing Systems; FMS Artificial Intelligence; Humanoid […]

Sep, 23

International Conference on Frontiers of Sensors Technologies (ICFST 2016), 2016

Publication: International Journal of Materials, Mechanics and Manufacturing (ISSN: 1793-8198). Abstracting/Indexing: EI (INSPEC, IET), Chemical Abstracts Services (CAS), Engineering & Technology Digital Library, ProQuest, Crossref, Ulrich’s Periodicals Directory, DOAJ, and Electronic Journals Library. Topics: Biosensors; Immunosensors; Fiber Optic Sensors; Optical Sensors; Optical Biosensors; Mechanical Sensors; Magnetic Sensors; Imaging Sensors; Physical Sensors; Physical Biosensors; Cell-based Biosensors […]

Sep, 19

Automatic OpenCL code generation for multi-device heterogeneous architectures

Using multiple accelerators, such as GPUs or Xeon Phis, is attractive for improving the performance of large data-parallel applications and for increasing the size of their workloads. However, writing an application for multiple accelerators remains challenging today, because going from a single accelerator to multiple ones requires dealing with potentially nonuniform domain […]

OpenCL

Sep, 19

Automatic Online Tuning (AutoTune): Fully Extended Analysis

The AutoTune project develops the Periscope Tuning Framework (PTF), including several plugins that target performance improvements as well as reduced energy consumption of applications. One of the main advantages of PTF over other tuning frameworks is its capability to combine tuning and analysis strategies to simplify and speed up the tuning process. To support the […]

OpenCL

Sep, 19

Parallel Decompression of Seismic Data on GPU Using a Lifting Wavelet Algorithm

Subsurface images are widely used by oil companies to find oil reservoirs. The construction of these images involves collecting and processing a huge amount of seismic data. Generally, oil companies use compression algorithms to reduce storage and transmission costs. Currently, the compression process is developed on-site using CPU architectures, whereas the […]

CUDA
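To illustrate what a lifting wavelet step looks like, here is a minimal sketch in Python. The excerpt above does not say which wavelet filter the paper uses, so this assumes the simplest case, an integer Haar transform; the function names `haar_lift_forward` and `haar_lift_inverse` are hypothetical and are not taken from the paper.

```python
def haar_lift_forward(samples):
    """One level of an integer Haar transform written as lifting steps."""
    if len(samples) % 2 != 0:
        raise ValueError("this sketch expects an even number of samples")
    even = samples[0::2]
    odd = samples[1::2]
    # Predict: each odd sample is predicted by its even neighbour; keep the residual.
    detail = [o - e for o, e in zip(odd, even)]
    # Update: shift each even sample towards the floored pairwise average.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_lift_inverse(approx, detail):
    """Undo the update step, then the predict step, and re-interleave."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    samples = [0] * (2 * len(even))
    samples[0::2] = even
    samples[1::2] = odd
    return samples
```

Each output pair depends only on its own input pair, which is the property that makes a step like this a plausible candidate for one-thread-per-pair execution on a GPU. Because the inverse replays the same integer operations in reverse, reconstruction is exact.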

Sep, 19

Autotuning Wavefront Patterns for Heterogeneous Architectures

Manual tuning of applications for heterogeneous parallel systems is tedious and complex. Optimizations are often not portable, and the whole process must be repeated when moving to a new system, or sometimes even to a different problem size. Pattern-based parallel programming models were originally designed to provide programmers with an abstract layer, hiding tedious […]

OpenCL
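For readers unfamiliar with the wavefront pattern named in the title above, the following Python sketch shows the usual idea: in a grid where each cell depends on its north, west, and north-west neighbours, all cells on one anti-diagonal are independent of each other. The helper names and the `combine` callback are hypothetical, and nothing here reflects the tuning strategy of the paper itself.

```python
def wavefront_order(rows, cols):
    """Yield one list of (i, j) cells per anti-diagonal, in dependency order."""
    for d in range(rows + cols - 1):
        i_min = max(0, d - cols + 1)
        i_max = min(rows - 1, d)
        yield [(i, d - i) for i in range(i_min, i_max + 1)]


def fill_grid_wavefront(rows, cols, combine):
    """Fill a grid diagonal by diagonal; out-of-range neighbours count as 0."""
    grid = [[0] * cols for _ in range(rows)]
    for diagonal in wavefront_order(rows, cols):
        # Cells on the same anti-diagonal do not depend on each other,
        # so a parallel implementation could compute them concurrently.
        for i, j in diagonal:
            north = grid[i - 1][j] if i > 0 else 0
            west = grid[i][j - 1] if j > 0 else 0
            north_west = grid[i - 1][j - 1] if i > 0 and j > 0 else 0
            grid[i][j] = combine(north, west, north_west, i, j)
    return grid
```

The amount of available parallelism grows and then shrinks as the diagonal sweeps across the grid, which is one reason tuning choices for this pattern can depend on problem size.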

Sep, 19

An OpenCL design of the Bob Jenkins lookup3 hash function using the Xilinx SDAccel Development Environment

In this report, we present an OpenCL-based design of a hashing function which forms a core component of memcached [1], a distributed in-memory key-value store caching layer widely used to reduce access load between web servers and databases. Our work has been inspired by recent research investigations on dataflow architectures for key-value stores that can […]

OpenCL

Sep, 17

Efficient Kernel Fusion Techniques for Massive Video Data Analysis on GPGPUs

Kernels are executable code segments, and kernel fusion is a technique for combining the segments in a coherent manner to improve execution time. For the first time, we have developed a technique to fuse image processing kernels to be executed on GPGPUs for improving execution time and total throughput (amount of data processed in unit […]

CUDA
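The general idea of kernel fusion described in the entry above can be sketched on the CPU in a few lines of Python. The two per-pixel "kernels" here (a gain followed by a threshold) are hypothetical examples chosen for brevity; they are not the kernels or the fusion technique from the paper.

```python
def scale_kernel(pixels, gain):
    """First stage: multiply every pixel by a gain."""
    return [p * gain for p in pixels]


def threshold_kernel(pixels, cutoff):
    """Second stage: binarize every pixel against a cutoff."""
    return [255 if p >= cutoff else 0 for p in pixels]


def unfused(pixels, gain, cutoff):
    # The intermediate image is written out in full, then read back.
    intermediate = scale_kernel(pixels, gain)
    return threshold_kernel(intermediate, cutoff)


def fused(pixels, gain, cutoff):
    # One pass over the data; the scaled value never leaves the loop body.
    return [255 if p * gain >= cutoff else 0 for p in pixels]
```

Both versions return the same image. On a GPU, the fused form would typically save one kernel launch and one round trip of the intermediate image through global memory, which is where the execution-time benefit usually comes from.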

Sep, 17

SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data-parallel work by taking advantage of its massive number of cores, while the CPU handles non-data-parallel work, such as sequential code or data transfer management. This work distribution can be a poor solution as it […]

OpenCL
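As a rough illustration of running a single kernel across two devices, the Python sketch below splits the work-groups of one data-parallel kernel in proportion to assumed device throughputs. The excerpt above does not describe how SKMD actually partitions work, so the proportional split, the function names, and the `cpu_rate` and `gpu_rate` parameters are all assumptions made for this example.

```python
def split_work_groups(num_groups, cpu_rate, gpu_rate):
    """Return (cpu_groups, gpu_groups) as two contiguous ranges of group ids."""
    total_rate = cpu_rate + gpu_rate
    if num_groups < 0 or cpu_rate < 0 or gpu_rate < 0 or total_rate == 0:
        raise ValueError("need a non-negative group count and a positive total rate")
    cpu_count = int(num_groups * cpu_rate / total_rate)
    return range(0, cpu_count), range(cpu_count, num_groups)


def run_groups(kernel, data, out, group_ids, group_size):
    """Stand-in for one device executing its share of the work-groups."""
    for group in group_ids:
        start = group * group_size
        for idx in range(start, min(start + group_size, len(data))):
            out[idx] = kernel(data[idx])


def run_on_both(kernel, data, group_size, cpu_rate, gpu_rate):
    """Apply one kernel to all data, with the groups divided between two 'devices'."""
    num_groups = (len(data) + group_size - 1) // group_size
    cpu_groups, gpu_groups = split_work_groups(num_groups, cpu_rate, gpu_rate)
    out = [None] * len(data)
    run_groups(kernel, data, out, cpu_groups, group_size)  # hypothetical CPU share
    run_groups(kernel, data, out, gpu_groups, group_size)  # hypothetical GPU share
    return out
```

Because the two ranges cover every work-group exactly once, the merged output matches a single-device run regardless of how the split falls.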

Sep, 17

CLTune: A Generic Auto-Tuner for OpenCL Kernels

This work presents CLTune, an auto-tuner for OpenCL kernels. It evaluates and tunes kernel performance over a generic, user-defined search space of possible parameter-value combinations. Example parameters include the OpenCL workgroup size, vector data-types, tile sizes, and loop unrolling factors. CLTune can be used in the following scenarios: 1) when there are too many tunable […]

OpenCL
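The notion of a search space of parameter-value combinations can be made concrete with a small Python sketch. This is not the CLTune API: the functions `search_space` and `tune`, the `measure` and `is_valid` callbacks, and the parameter names in the usage note are hypothetical, and the sketch uses plain exhaustive search rather than any particular strategy from the paper.

```python
import itertools


def search_space(parameters):
    """Yield every combination of values as a {name: value} dict."""
    names = list(parameters)
    for values in itertools.product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


def tune(parameters, measure, is_valid=lambda config: True):
    """Exhaustively evaluate all valid configurations and return the fastest.

    `measure` is a caller-supplied function that returns a run time for a
    configuration, for example by compiling and timing a kernel.
    """
    best_config, best_time = None, float("inf")
    for config in search_space(parameters):
        if not is_valid(config):
            continue  # skip combinations the device or kernel cannot support
        elapsed = measure(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time
```

A caller might pass something like `{"WORKGROUP_SIZE": [32, 64, 128], "VECTOR_WIDTH": [1, 2, 4], "UNROLL": [1, 2]}` together with a timing function. The number of combinations is the product of the list lengths, so it grows quickly as parameters are added.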