Posts
Sep, 24
A Parallel Framework for Parametric Maximum Flow Problems in Image Segmentation
This paper presents a framework that supports the implementation of parallel solutions for the widespread parametric maximum flow computational routines used in image segmentation algorithms. The framework is based on supergraphs, a special construction combining several image graphs into a larger one, and works on various architectures (multi-core or GPU), either locally or remotely in […]
Sep, 24
Adaptive and Transparent Cache Bypassing for GPUs
In the last decade, GPUs have become widely adopted for general-purpose applications. To capture on-chip locality for these applications, modern GPUs have integrated a multilevel cache hierarchy in an attempt to reduce the amount and latency of the massive and sometimes irregular memory accesses. However, inferior performance is frequently attained due to serious congestion […]
Sep, 24
Overcomplete Dictionary Learning with Jacobi Atom Updates
Dictionary learning for sparse representations is traditionally approached with sequential atom updates, in which an optimized atom is used immediately for the optimization of the next atoms. We propose instead a Jacobi version, in which groups of atoms are updated independently, in parallel. Extensive numerical evidence for sparse image representation shows that the parallel algorithms, […]
Sep, 24
A Co-Design Framework with OpenCL Support for Low-Energy Wide SIMD Processor
Energy efficiency is one of the most important metrics in embedded processor design. The use of a wide SIMD architecture is a promising approach to building energy-efficient, high-performance embedded processors. In this paper, we propose a design framework for a configurable wide SIMD architecture that utilizes an explicit datapath to achieve high energy efficiency. The […]
Sep, 24
OpenCL Based Digital Image Projection Acceleration
In this thesis, several implementations of an image back projection algorithm using Open Computing Language (OpenCL) for different types of processors are developed. Image back projection is a method that takes aerial imagery and creates a map-like image that contains real-world dimensions, removing the perspective angle of the camera. The processors that ran […]
Sep, 23
International Conference on Robotics, Mechanics and Mechatronics (ICRMM 2016), 2016
Topics: Robotics and Mechanical Engineering: Actuator design, robotic mechanisms and design, robot kinematics and dynamics; Agile Manufacturing; Agriculture, construction, industrial automation, manufacturing process; Automation and control systems, middleware; Biomedical and rehabilitation engineering, welfare robotics and mechatronics; Cellular Manufacturing; Concurrent Engineering; Design for Manufacture and Assembly; Distributed Control Systems; Flexible Manufacturing Systems; FMS Artificial Intelligence; Humanoid […]
Sep, 23
International Conference on Frontiers of Sensors Technologies (ICFST 2016), 2016
Publication: International Journal of Materials, Mechanics and Manufacturing (ISSN: 1793-8198). Abstracting/Indexing: EI (INSPEC, IET), Chemical Abstracts Services (CAS), Engineering & Technology Digital Library, ProQuest, Crossref, Ulrich’s Periodicals Directory, DOAJ, and Electronic Journals Library. Topics: Biosensors; Immunosensors; Fiber Optic Sensors; Optical Sensors; Optical Biosensors; Mechanical Sensors; Magnetic Sensors; Imaging Sensors; Physical Sensors; Physical Biosensors; Cell-based Biosensors […]
Sep, 19
Autotuning Wavefront Patterns for Heterogeneous Architectures
Manual tuning of applications for heterogeneous parallel systems is tedious and complex. Optimizations are often not portable, and the whole process must be repeated when moving to a new system, or sometimes even to a different problem size. Pattern-based parallel programming models were originally designed to provide programmers with an abstract layer, hiding tedious […]
Sep, 19
Automatic OpenCL code generation for multi-device heterogeneous architectures
Using multiple accelerators, such as GPUs or Xeon Phis, is attractive for improving the performance of large data-parallel applications and increasing the size of their workloads. However, writing an application for multiple accelerators remains challenging today, because going from a single accelerator to multiple ones requires dealing with potentially nonuniform domain […]
Sep, 19
Automatic Online Tuning (AutoTune): Fully Extended Analysis
The AutoTune project develops the Periscope Tuning Framework (PTF), including several plugins that target performance improvements as well as reduced energy consumption of applications. One of the main advantages of PTF over other tuning frameworks is its capability to combine tuning and analysis strategies to simplify and speed up the tuning process. To support the […]
Sep, 19
Parallel Decompression of Seismic Data on GPU Using a Lifting Wavelet Algorithm
Subsurface images are widely used by oil companies to find oil reservoirs. The construction of these images involves collecting and processing a huge amount of seismic data. Generally, oil companies use compression algorithms to reduce storage and transmission costs. Currently, the compression process is developed on-site using CPU architectures, whereas the […]
Sep, 19
An OpenCL design of the Bob Jenkins lookup3 hash function using the Xilinx SDAccel Development Environment
In this report, we present an OpenCL-based design of a hashing function which forms a core component of memcached [1], a distributed in-memory key-value store caching layer widely used to reduce access load between web servers and databases. Our work has been inspired by recent research investigations on dataflow architectures for key-value stores that can […]