Posts
May, 26
PROJECTION Algorithm for Motif Finding on GPUs
Motif finding is one of the NP-complete problems in Computational Biology. Existing nondeterministic algorithms for motif finding do not guarantee the global optimality of results and are sensitive to initial parameters. To address this problem, the PROJECTION algorithm provides a good initial estimate that can be further refined using local optimization algorithms such as EM, […]
May, 23
Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks
Convolutional neural networks (CNNs) have achieved major breakthroughs in recent years. Their performance in computer vision has matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex non-linear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and […]
May, 23
ImageCL: An Image Processing Language for Performance Portability on Heterogeneous Systems
Modern computer systems typically combine multicore CPUs with accelerators like GPUs for improved performance and energy efficiency. However, these systems suffer from poor performance portability: code tuned for one device must be retuned to achieve high performance on another. Image processing is increasing in importance, with applications ranging from seismology and medicine to Photoshop. Based […]
May, 23
Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups
We propose a new method for training computationally efficient and compact convolutional neural networks (CNNs) using a novel sparse connection structure that resembles a tree root. Our sparse connection structure facilitates a significant reduction in computational cost and number of parameters of state-of-the-art deep CNNs without compromising accuracy. We validate our approach by using it […]
May, 23
Graphics Supercomputing Applied to Brain Image Analysis with NiftyReg
Medical image processing in general and brain image processing in particular are computationally intensive tasks. Fortunately, techniques such as GPU programming can make them more widely accessible. In this article we study NiftyReg, a brain image processing library with a GPU implementation using CUDA, and analyse different possible ways of further optimising the […]
May, 23
A Practical Performance Model for Compute and Memory Bound GPU Kernels
Performance prediction of GPU kernels is generally a tedious procedure with unpredictable results. In this paper, we provide a practical model for estimating the performance of CUDA kernels on GPU hardware in an automated manner. First, we propose the quadrant-split model, an alternative to the roofline visual performance model, which provides insight into the performance limiting […]
May, 21
The Hitchhiker’s Guide to Cross-Platform OpenCL Application Development
One of the benefits of programming in OpenCL is platform portability. That is, an OpenCL program that follows the OpenCL specification should, in principle, execute reliably on any platform that supports OpenCL. To assess the current state of OpenCL portability, we provide an experience report examining two sets of open source benchmarks that we attempted […]
May, 21
Architecture-Adaptive Code Variant Tuning
Code variants represent alternative implementations of a computation, and are common in high-performance libraries and applications to facilitate selecting the most appropriate implementation for a specific execution context (target architecture and input dataset). Automating code variant selection typically relies on machine learning to construct a model during an offline learning phase that can be quickly […]
May, 21
GPU-based Pedestrian Detection for Autonomous Driving
Pedestrian detection has gained a lot of prominence during the last few years. Besides the fact that it is one of the hardest tasks within computer vision, it involves huge computational costs. Obtaining acceptable real-time performance, measured in frames per second (fps), for the most advanced algorithms is nowadays a hard challenge. In this work, […]
May, 21
Performance Evaluation of Parallel Count Sort using GPU Computing with CUDA
OBJECTIVE: Sorting is considered a very important application in many areas of computer science. The parallelization of sorting algorithms using GPU computing on CUDA hardware is increasing rapidly. The objective of using GPU computing is to obtain greater speedup of the algorithms. METHODS: In this paper, we have focused on count […]
May, 21
Employing Directive Based Compression Solutions on Accelerators Global Memory under OpenACC
Programmers invest extensive development effort to optimize a GPU program to achieve peak performance. Achieving this requires efficient usage of global memory and avoiding memory bandwidth underutilization. The OpenACC programming model has been introduced to tackle the complexity of programming accelerators. However, this model's coarse-grained control over a program can make the memory bandwidth utilization […]
May, 17
GPU-Accelerated Feature Tracking
The motivation of this research is to prove that GPUs can provide significant speedup of long-executing image processing algorithms by way of parallelization and massive data throughput. This thesis accelerates the well-known KLT feature tracking algorithm using OpenCL and an NVidia GeForce GTX 780 GPU. KLT is a fast, efficient and accurate feature tracker but […]

