Posts
Apr, 7
Optimizing Sparse Matrix-Matrix Multiplication for the GPU
Sparse matrix-matrix multiplication (SpMM) is a key operation in numerous areas from information to the physical sciences. Implementing SpMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpMM operation into three, […]
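The abstract's three-stage decomposition is cut off, so the sketch below does not reproduce it. It is a generic, sequential row-by-row (Gustavson-style) product of two sparse matrices in CSR form, with a hash-map accumulator per output row. It is meant only as a correctness baseline for the operation being optimized. The tuple layout `(row_ptr, col_idx, values, n_cols)` and the name `spmm_csr` are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical CSR layout: (row_ptr, col_idx, values, n_cols)
CSR = Tuple[List[int], List[int], List[float], int]


def spmm_csr(a: CSR, b: CSR) -> CSR:
    """Compute C = A * B for two CSR matrices, one output row at a time."""
    a_ptr, a_col, a_val, a_ncols = a
    b_ptr, b_col, b_val, b_ncols = b
    if a_ncols != len(b_ptr) - 1:
        raise ValueError("inner dimensions do not match")

    c_ptr, c_col, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc: Dict[int, float] = {}  # sparse accumulator for row i of C
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_col[jj], a_val[jj]
            # Scale row k of B by A[i, k] and merge it into the accumulator.
            for kk in range(b_ptr[k], b_ptr[k + 1]):
                j = b_col[kk]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[kk]
        # Emit the row with column indices in ascending order.
        for j in sorted(acc):
            c_col.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_col))
    return c_ptr, c_col, c_val, b_ncols
```

For example, with A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]], the result encodes [[0, 4], [15, 8]]. The irregular, data-dependent length of each row's accumulator is one reason a straightforward port of this loop maps poorly to fine-grained GPU parallelism.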
Apr, 7
A new CUDA-based GPU implementation of the two-dimensional Athena code
We present a new version of the Athena code, which solves the magnetohydrodynamic equations in two-dimensional space. This new implementation, which we have named Athena-GPU, uses the CUDA architecture to run the code on a graphics processing unit (GPU). The Athena-GPU code is an unofficial, modified version of the Athena code, which was originally designed for Central […]
Apr, 6
23rd Annual International Conference on Computer Science and Software Engineering, CASCON 2013
CASCON 2013 is the 23rd annual international conference hosted by CAS Research, IBM Canada Software Lab. Using the motto, “Innovation that matters”, this conference provides an exciting forum for exchanging ideas and experience in the ever-expanding and critical fields of software engineering and computing. The theme of this year, “Ecosystem of Engagement”, highlights the confluence […]
Apr, 6
OpenCL C++
With the success of programming models such as Khronos’ OpenCL, heterogeneous computing is going mainstream. However, these models are low-level, even when considered as systems programming models. For example, OpenCL is effectively an extended subset of C99, limited to the type-unsafe procedural abstraction that C has provided for more than 30 years. Computer […]
Apr, 6
A Performance Study of Zero Crossing Rate (ZCR) on Graphics Processors (GPUs) Using CUDA
The ability to harness the power of the graphics processing unit (GPU) enables dramatic increases in computing performance through a parallel computing platform and programming model such as NVIDIA CUDA. Compute Unified Device Architecture (CUDA) is NVIDIA's parallel computing platform and API for general-purpose computing on graphics processing units (GPGPU). The General Purpose […]
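As a reference for the quantity being accelerated, the sketch below computes the zero crossing rate under one common definition: the fraction of adjacent sample pairs whose signs differ. Treating zero as non-negative and splitting the signal into fixed-length frames with a hop size are assumptions here, not details taken from the study. The function names are hypothetical.

```python
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum((prev >= 0) != (cur >= 0) for prev, cur in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def framed_zcr(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """ZCR of each frame; frames are independent, so each could map to its own GPU thread block."""
    return [zero_crossing_rate(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

Because every sample pair and every frame can be evaluated independently, ZCR is a natural fit for data-parallel hardware. The per-frame sum then becomes a parallel reduction.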
Apr, 6
Improving GPU Performance Prediction with Data Transfer Modeling
Accelerators such as graphics processors (GPUs) have become increasingly popular for high performance scientific computing. Often, much effort is invested in creating and optimizing GPU code without any guaranteed performance benefit. To reduce this risk, performance models can be used to project a kernel’s GPU performance potential before it is ported. However, raw GPU execution […]
Apr, 6
Real-Time Object-Space Edge Detection using OpenCL
At its most basic, object-space edge detection iterates through all polygonal edges in each mesh to find those edges that satisfy one or more edge tests. Those that do are expanded and rendered, while the remainder are ignored. These 3D edges, and their resulting accuracy and customizability, set object-space methods apart from all other categories […]
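The per-edge loop described above can be sketched on the CPU with two common edge tests: a boundary test (the edge has only one adjacent triangle) and a silhouette test (one adjacent triangle faces the eye and the other faces away). The choice of these two tests, counter-clockwise winding for outward normals, and all function names are assumptions for illustration. The excerpt does not say which tests the paper uses, and the expansion and rendering steps are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def front_facing(tri: Tuple[int, int, int], verts: Sequence[Vec3], eye: Vec3) -> bool:
    a, b, c = (verts[i] for i in tri)
    normal = cross(sub(b, a), sub(c, a))  # assumes counter-clockwise winding = outward normal
    return dot(normal, sub(eye, a)) > 0


def detect_edges(tris: List[Tuple[int, int, int]], verts: Sequence[Vec3],
                 eye: Vec3) -> List[Tuple[int, int]]:
    # Map each undirected edge to the triangles that share it.
    edge_faces: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for f, (i, j, k) in enumerate(tris):
        for u, v in ((i, j), (j, k), (k, i)):
            edge_faces[(min(u, v), max(u, v))].append(f)

    facing = [front_facing(t, verts, eye) for t in tris]
    selected = []
    for edge, faces in edge_faces.items():  # one independent test per edge
        if len(faces) == 1:
            selected.append(edge)  # boundary edge
        elif len(faces) == 2 and facing[faces[0]] != facing[faces[1]]:
            selected.append(edge)  # silhouette edge
    return sorted(selected)
```

Each edge's test reads only its two adjacent faces, so the loop over edges is the obvious candidate for one work-item per edge in an OpenCL kernel.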
Apr, 6
Parallel Implementation of Dynamic Programming Algorithm Using Graphics Processing Unit
In this research, a dynamic programming algorithm (Viterbi) has been implemented on an NVIDIA graphics processing unit using the CUDA model. Graphics processing units are becoming important in supporting central processing units by accelerating complex floating-point calculations. The complex computation runs in parallel on the graphics processing unit, as it contains […]
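For reference, a sequential Viterbi decoder for a discrete hidden Markov model looks roughly like the sketch below. It works in log space to avoid underflow. This is a generic textbook formulation, not the paper's CUDA implementation. The argument layout (`start_p[s]`, `trans_p[r][s]`, `emit_p[s][o]`) and the function name are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs: Sequence[int], start_p: Sequence[float],
            trans_p: Sequence[Sequence[float]],
            emit_p: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the most likely state path and its log-probability."""
    n = len(start_p)
    # score[s]: best log-probability of any path ending in state s at the current step
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n)]
    backptrs: List[List[int]] = []
    for o in obs[1:]:
        best_prev, new_score = [], []
        # Each state's update depends only on the previous step's scores,
        # so this loop over s is independent across states within one time step.
        for s in range(n):
            r = max(range(n), key=lambda q: score[q] + _log(trans_p[q][s]))
            best_prev.append(r)
            new_score.append(score[r] + _log(trans_p[r][s]) + _log(emit_p[s][o]))
        score = new_score
        backptrs.append(best_prev)

    # Trace the best path backwards through the stored back-pointers.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for best_prev in reversed(backptrs):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, score[last]
```

The time steps must run in order, but the per-state maximization within a step is independent. That inner loop, or the reduction over predecessor states, is where a GPU version can find parallelism.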
Apr, 4
Adapting Particle Filter Algorithms to Many-Core Architectures
The particle filter is a Bayesian estimation technique based on Monte Carlo simulation. It is ideal for non-linear, non-Gaussian dynamical systems, with applications in many areas such as computer vision, robotics, and econometrics. Practical use has so far been limited by steep computational requirements. In this study, we investigate how to design a particle […]
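To make the Monte Carlo structure concrete, here is a minimal bootstrap particle filter for a one-dimensional state, with systematic resampling. The random-walk motion model, Gaussian measurement noise, and parameter names are assumptions chosen for illustration. They are not taken from the study.

```python
import math
import random
from typing import List, Sequence


def bootstrap_particle_filter(observations: Sequence[float], n_particles: int = 1000,
                              process_std: float = 1.0, obs_std: float = 1.0,
                              seed: int = 0) -> List[float]:
    """Return the posterior-mean estimate after each observation."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # assumed N(0, 1) prior
    estimates = []
    for z in observations:
        # 1. Predict: move each particle through the assumed random-walk model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # 2. Update: weight each particle by the Gaussian likelihood of z.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed; fall back to uniform weights
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # 3. Resample (systematic): one random offset, then evenly spaced pointers
        #    walked against the cumulative sum of the weights.
        step = 1.0 / n_particles
        u = rng.random() * step
        resampled, cumulative, i = [], weights[0], 0
        for m in range(n_particles):
            target = u + m * step
            while target > cumulative and i < n_particles - 1:
                i += 1
                cumulative += weights[i]
            resampled.append(particles[i])
        particles = resampled
    return estimates
```

The predict and update steps are independent per particle, so they parallelize directly. Resampling depends on a cumulative sum of the weights (a prefix scan), which makes it the step that needs the most care on many-core hardware.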
Apr, 4
Deploying Graph Algorithms on GPUs: an Adaptive Solution
Thanks to their massive computational power and their SIMT computational model, Graphics Processing Units (GPUs) have been successfully used to accelerate a wide variety of regular applications (linear algebra, stencil computations, image processing and bioinformatics algorithms, among others). However, many established and emerging problems are based on irregular data structures, such as graphs. Examples can […]
Apr, 4
Optimising Purely Functional GPU Programs
Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slowdown is not acceptable on target hardware that is usually chosen to achieve high performance. In this […]
Apr, 4
Real-time Stereo Vision: Optimizing Semi-Global Matching
Semi-Global Matching (SGM) is arguably one of the most popular algorithms for real-time stereo vision. It is already employed in mass-production vehicles today. With applications in intelligent vehicles (and, in the long term, fully autonomous vehicles) in mind, we aim to further improve SGM's accuracy. In this study, we propose a straightforward extension […]