Posts
Feb, 22
Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs
In this paper, we present an approach to estimate GPU applications’ performance upper bound based on algorithm analysis and assembly code level benchmarking. As an example, we analyze the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. We try to answer the question of how much optimization […]
Feb, 22
An efficient scheduling scheme using estimated execution time for heterogeneous computing systems
Computing systems should be designed to exploit parallelism in order to improve performance. In general, a GPU (Graphics Processing Unit) can provide more parallelism than a CPU (Central Processing Unit), resulting in the wide usage of heterogeneous computing systems that utilize both the CPU and the GPU together. In heterogeneous computing systems, the efficiency […]
Feb, 22
A Parallel Active-Set Method for Solving Frictional Contact Problems
Simulating frictional contact is a challenging computational task and there exist a variety of techniques to do so. One such technique, the staggered projections algorithm, requires the solution of two convex quadratic program (QP) subproblems at each iteration. We introduce a method, SCHURPA, which employs a primal-dual active-set strategy to efficiently solve these QPs based […]
Feb, 22
Investigating performance variations of an optimized GPU-ported granulometry algorithm
In this article, we present an optimized GPU implementation of a granulometry algorithm that is widely used in the study of materials. The main contribution to this algorithm is the binarization of the input data, which increases throughput while reducing the memory allocated for the data. Also, the optimized GPU implementation brings an order of […]
Feb, 22
GPU-based Motion Planning under Uncertainties using POMDP
We present a novel GPU-based parallel algorithm to solve continuous-state POMDP problems. We choose the MCVI (Monte Carlo Value Iteration) method as our base algorithm [1], and parallelize this algorithm using a multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms to effectively utilize the massive data parallelism of GPUs. To obtain […]
Feb, 21
Scheduling a Parallel Sparse Direct Solver to Multiple GPUs
We present a sparse direct solver using multilevel task scheduling on a modern heterogeneous compute node consisting of a multi-core host processor and multiple GPU accelerators. Our direct solver is based on the multifrontal method, which is characterized by exploiting dense subproblems (fronts) related in an assembly tree. Critical to high performance of the solver […]
Feb, 21
Chrono: a parallel multi-physics library for rigid-body, flexible-body, and fluid dynamics
The last decade witnessed a manifest shift in the microprocessor industry towards chip designs that promote parallel computing. Until recently the privilege of a select group of large research centers, Teraflop computing is becoming a commodity owing to inexpensive GPU cards and multi- to many-core x86 processors. This paradigm shift towards large-scale parallel computing […]
Feb, 21
Computation of Air-Vortices Based on GPU Technology: Optimizing and Parallelizing a Model for Wake-Vortex Prediction Using OpenCL
This thesis details the refinement and numerical solution of a preexisting model for predicting the strengths and positions of so-called wake-vortices that are generated from the lift of heavy aircraft. The ultimate objective is to implement a numerical scheme for the model that is fast enough to allow for probabilistic methods, such as Monte Carlo simulations, […]
Feb, 21
GamePipe: A Virtualized Cloud Platform Design and Performance Evaluation
Cloud gaming provides game-on-demand (GoD) services over the Internet cloud. The goal is to achieve faster response time and higher QoS. The video game is rendered remotely on the game cloud and decoded on thin client devices such as tablet computers or smartphones. We design a game cloud with a virtualized cluster of CPU/GPU servers […]
Feb, 21
Ray Tracing on GPUs
The ray tracing method aims to produce realistic, high-quality images of a scene described by geometric primitives such as triangles, spheres, etc. The basic idea is quite simple and allows for straightforward implementations of this technique on the computer. At its core is a set of rays, each of which corresponds to one […]
Feb, 20
Complexity Analysis and Algorithm Design for Reorganizing Data to Minimize Non-Coalesced Memory Accesses on GPU
The performance of Graphics Processing Units (GPUs) is sensitive to irregular memory references. Some recent work shows the promise of data reorganization for eliminating non-coalesced memory accesses that are caused by irregular references. However, all previous studies have employed simple, heuristic methods to determine the new data layouts to create. As a result, they either […]
Feb, 20
An abstract object oriented runtime system for heterogeneous parallel architecture
In our paper, we present an abstract object-oriented runtime system that helps to develop scientific applications for new heterogeneous architectures based on multiple nodes of multi-core processors enhanced with accelerator boards. Its architecture, based on abstract concepts, makes it possible to follow hardware technology by extending those concepts with new implementations that model new hardware components, while limiting the […]

