Posts
Apr, 6
Improving GPU Performance Prediction with Data Transfer Modeling
Accelerators such as graphics processors (GPUs) have become increasingly popular for high performance scientific computing. Often, much effort is invested in creating and optimizing GPU code without any guaranteed performance benefit. To reduce this risk, performance models can be used to project a kernel’s GPU performance potential before it is ported. However, raw GPU execution […]
Apr, 6
Real-Time Object-Space Edge Detection using OpenCL
At its most basic, object-space edge detection iterates through all polygonal edges in each mesh to find those edges that satisfy one or more edge tests. Those that do are expanded and rendered, while the remainder are ignored. These 3D edges, and their resulting accuracy and customizability, set object-space methods apart from all other categories […]
Apr, 6
Parallel Implementation of Dynamic Programming Algorithm Using Graphics Processing Unit
In this research, a dynamic programming algorithm (Viterbi) has been implemented on an NVIDIA graphics processing unit using the CUDA model. Graphics processing units are becoming important in supporting central processing units for the acceleration of complex floating-point calculations. The complex computation runs in parallel on the graphics processing unit, as it contains […]
Apr, 4
Adapting Particle Filter Algorithms to Many-Core Architectures
The particle filter is a Bayesian estimation technique based on Monte Carlo simulation. It is ideal for non-linear, non-Gaussian dynamical systems with applications in many areas, such as computer vision, robotics, and econometrics. Practical use has so far been limited because of steep computational requirements. In this study, we investigate how to design a particle […]
Apr, 4
Deploying Graph Algorithms on GPUs: an Adaptive Solution
Thanks to their massive computational power and their SIMT computational model, Graphics Processing Units (GPUs) have been successfully used to accelerate a wide variety of regular applications (linear algebra, stencil computations, image processing and bioinformatics algorithms, among others). However, many established and emerging problems are based on irregular data structures, such as graphs. Examples can […]
Apr, 4
Optimising Purely Functional GPU Programs
Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slowdown is not acceptable on target hardware that is usually chosen to achieve high performance. In this […]
Apr, 4
Real-time Stereo Vision: Optimizing Semi-Global Matching
Semi-Global Matching (SGM) is arguably one of the most popular algorithms for real-time stereo vision. It is already employed in mass-production vehicles today. With applications in intelligent vehicles in mind (and fully autonomous vehicles in the long term), we aim to further improve the accuracy of SGM. In this study, we propose a straightforward extension […]
Apr, 4
C Language Extensions for Hybrid CPU/GPU Programming with StarPU
Modern platforms used for high-performance computing (HPC) include machines with both general-purpose CPUs and "accelerators", often in the form of graphics processing units (GPUs). StarPU is a C library that addresses the problem of programming such hybrid machines by providing users with ways to define "tasks" to be executed on CPUs or GPUs, along with the dependencies among them, and […]
Apr, 3
GPU Accelerated Automated Feature Extraction from Satellite Images
The availability of large volumes of remote sensing data demands a higher degree of automation in feature extraction. Fusing data from multiple sources, such as panchromatic, hyperspectral and LiDAR sensors, enhances the probability of identifying and extracting features such as buildings, vegetation or bodies of water by […]
Apr, 3
Scaling up scientific computations by using map-reduce-like control flow on NUMA architectures
The clock speed of current CPUs and RAM has stopped scaling with Moore’s Law. Yet the scale of applications in science and engineering continues to increase. In order to address this scaling of applications, newer NUMA architectures are emerging. These include parallel disks, hybrid CPU-GPU, and many-core CPUs. Existing CPU-based algorithms, as well as legacy […]
Apr, 3
Astrophysical data mining with GPU. A case study: genetic classification of globular clusters
We present a multi-purpose genetic algorithm, designed and implemented with GPGPU/CUDA parallel computing technology. The model was derived from our serial CPU implementation, named GAME (Genetic Algorithm Model Experiment). It was successfully tested and validated on the detection of candidate Globular Clusters in deep, wide-field, single-band HST images. The GPU version of […]
Apr, 3
The Stencil Processing Unit: GPGPU Done Right
As computing moves to exascale, it will be dominated by energy efficiency. We propose a new GPU-like accelerator called the Stencil Processing Unit (SPU) for implementing dense stencil computations in an energy-efficient manner. We address all levels of the programming stack: architecture, programming API, runtime system and compilation. First, a simple architectural innovation to […]