Posts
Dec, 16
Reducing Thread Divergence in GPU-based B&B Applied to the Flow-shop problem
In this paper, we present pioneering work on designing and programming branch-and-bound (B&B) algorithms for GPUs. To the best of our knowledge, no previous contribution has taken up this challenge. We focus on the parallel evaluation of bounds for the Flow-shop scheduling problem. To deal with thread divergence caused by the bounding operation, […]
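A minimal sketch of what one bounding step could look like. It assumes a permutation flow-shop where `p[j][m]` is the processing time of job `j` on machine `m`, and it uses a simple one-machine lower bound, which is not necessarily the bound used in the paper. All function names are hypothetical. The inner loop visits every job and masks out already-scheduled ones arithmetically, so every subproblem runs the same trip count. This is one way to limit the data-dependent control flow that causes divergence when each GPU thread evaluates a different node.

```python
def completion_front(scheduled, p):
    """Completion time on each machine after running `scheduled` in order.

    Assumed layout (hypothetical): p[j][m] = processing time of job j on machine m.
    Recurrence: C[j][m] = max(C[previous job][m], C[j][m-1]) + p[j][m].
    """
    front = [0] * len(p[0])
    for job in scheduled:
        prev = 0  # completion of this job on the previous machine
        for m in range(len(front)):
            prev = max(front[m], prev) + p[job][m]
            front[m] = prev
    return front


def makespan(perm, p):
    """Completion time of the last job on the last machine."""
    return completion_front(perm, p)[-1]


def one_machine_bound(scheduled, p):
    """Lower bound on the makespan of any completion of the partial schedule."""
    n_jobs, n_machines = len(p), len(p[0])
    front = completion_front(scheduled, p)
    in_prefix = [0] * n_jobs
    for job in scheduled:
        in_prefix[job] = 1
    if sum(in_prefix) == n_jobs:
        return front[-1]  # leaf node: the bound is the exact makespan
    big = sum(map(sum, p)) + 1  # finite sentinel larger than any tail
    bound = front[-1]
    for m in range(n_machines):
        remaining_work, min_tail = 0, big
        for j in range(n_jobs):  # same trip count for every subproblem
            free = 1 - in_prefix[j]
            remaining_work += free * p[j][m]
            tail = sum(p[j][m + 1:])  # work job j still needs after machine m
            # Scheduled jobs are pushed out of the min by arithmetic, not a branch.
            min_tail = min(min_tail, tail + in_prefix[j] * big)
        bound = max(bound, front[m] + remaining_work + min_tail)
    return bound
```

For the three-job instance `p = [[3, 2], [1, 4], [2, 1]]`, the root bound `one_machine_bound([], p)` is 7, and the optimal makespan (order `[1, 0, 2]`) is 8.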
Dec, 16
Algorithms acceleration of pattern-matching in multi-core architectures
The aim of this thesis is to create or adapt a programming model that makes multi-core processors accessible to almost every programmer. This objective includes reusing existing code and algorithms, debuggability, and the ability to introduce changes incrementally. We consider multi-cores spanning many architectures, including homogeneous versus heterogeneous and shared-memory versus distributed-memory designs. We […]
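The abstract does not name the pattern-matching algorithms involved, so the following is only a generic illustration that assumes plain substring search. The text's starting positions are split into chunks, and each worker scans up to `len(pattern) - 1` characters past its chunk end, so a match that spans a boundary is still found exactly once. `ThreadPoolExecutor` is used for brevity. In CPython, the GIL means a `ProcessPoolExecutor` or native code would typically be needed for a real multi-core speedup. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_range(text, pattern, start, end):
    """Positions i with start <= i < end where pattern occurs at text[i]."""
    # A match starting at end - 1 finishes at end - 1 + len(pattern), so the
    # search window must extend len(pattern) - 1 characters past the boundary.
    limit = end + len(pattern) - 1
    hits = []
    i = text.find(pattern, start, limit)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1, limit)  # i + 1 keeps overlapping matches
    return hits


def parallel_find_all(text, pattern, n_chunks=4):
    """Split the start positions into n_chunks ranges and search them concurrently."""
    if not pattern or not text:
        return []
    size = -(-len(text) // n_chunks)  # ceiling division
    ranges = [(s, min(s + size, len(text))) for s in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = pool.map(lambda r: find_in_range(text, pattern, *r), ranges)
        return [pos for part in parts for pos in part]
```

Because each chunk owns a disjoint set of start positions, the concatenated results are already sorted and contain no duplicates.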
Dec, 16
High-performance polynomial GCD computations on graphics processors
We propose an algorithm to compute a greatest common divisor (GCD) of univariate polynomials with large integer coefficients on Graphics Processing Units (GPUs). At the highest level, our algorithm relies on modular techniques to decompose the problem into subproblems that can be solved separately. Next, we employ resultant-based or matrix algebra methods to compute a […]
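With a modular decomposition, each prime `p` yields an independent subproblem: compute the GCD of the two polynomials reduced modulo `p`. The code below is a hedged sketch of one such subproblem. It uses the classical Euclidean algorithm over GF(p) instead of the resultant-based or matrix-algebra methods the abstract mentions. It also omits the Chinese-remainder reconstruction across primes and the detection of unlucky primes. Coefficients are stored lowest degree first. The function names are hypothetical, and `pow(x, -1, p)` requires Python 3.8+.

```python
def trim(a):
    """Drop zero leading (highest-degree) coefficients in place."""
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_rem(a, b, p):
    """Remainder of a / b over GF(p); b must have a nonzero leading coefficient mod p."""
    a = trim([c % p for c in a])
    inv_lead = pow(b[-1], -1, p)  # modular inverse, Python 3.8+
    while len(a) >= len(b):
        shift = len(a) - len(b)
        factor = a[-1] * inv_lead % p
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * c) % p
        trim(a)  # the leading term is now zero
    return a


def poly_gcd_mod_p(a, b, p):
    """Monic GCD of a and b over GF(p) by the Euclidean algorithm."""
    a = trim([c % p for c in a])
    b = trim([c % p for c in b])
    while b:
        a, b = b, poly_rem(a, b, p)
    if not a:
        return []  # gcd(0, 0)
    inv = pow(a[-1], -1, p)
    return [c * inv % p for c in a]
```

Modulo 7, `poly_gcd_mod_p([2, -3, 1], [-3, 2, 1], 7)`, which encodes gcd((x-1)(x-2), (x-1)(x+3)), returns `[6, 1]`, i.e. x + 6, which is x - 1 mod 7.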
Dec, 16
Efficient XML Path Filtering Using GPUs
Publish-subscribe (pub-sub) systems represent the state of the art in information dissemination to multiple users. Current XML-based pub-sub systems give users considerable flexibility, allowing them to formulate complex queries on both the content and the structure of streaming messages. Messages that contain one or more matches for a given user profile (query) […]
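To make the filtering task concrete, here is a small CPU-side sketch. It assumes profiles are restricted to linear path expressions built from `/` (child), `//` (descendant) and the `*` wildcard, and that matching is done against every root-to-element path of a document. It illustrates only the structural matching semantics, not the GPU technique of the paper. `parse_profile`, `matches`, `element_paths` and `filter_profiles` are hypothetical names. The standard-library `xml.etree.ElementTree` parser enumerates the element paths.

```python
import re
import xml.etree.ElementTree as ET
from functools import lru_cache


def parse_profile(query):
    """'/a//b/*' -> [('child', 'a'), ('desc', 'b'), ('child', '*')]."""
    return [("desc" if axis == "//" else "child", name)
            for axis, name in re.findall(r"(//|/)([^/]+)", query)]


def matches(steps, path):
    """True if the last element of `path` (a tuple of tags, root first) is selected."""
    @lru_cache(maxsize=None)
    def go(s, i):
        # steps[s:] must consume path[i:] exactly.
        if s == len(steps):
            return i == len(path)
        axis, name = steps[s]
        # Child axis: only the next element; descendant axis: any deeper one.
        last = len(path) if axis == "desc" else min(i + 1, len(path))
        return any((name == "*" or path[k] == name) and go(s + 1, k + 1)
                   for k in range(i, last))
    return go(0, 0)


def element_paths(xml_text):
    """Yield the root-to-element tag path of every element in the document."""
    root = ET.fromstring(xml_text)
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        yield path
        for child in node:
            stack.append((child, path + (child.tag,)))


def filter_profiles(xml_text, profiles):
    """Return the profiles (in input order) that match at least one element."""
    paths = set(element_paths(xml_text))
    result = []
    for q in profiles:
        steps = parse_profile(q)
        if any(matches(steps, p) for p in paths):
            result.append(q)
    return result
```

For the message `<a><b><c/></b><d><c/></d></a>`, the profiles `/a/b/c`, `//c`, `/a//c` and `/*/d` match, while `/a/c` and `//e` do not.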
Dec, 16
A Predictive Model for Solving Small Linear Algebra Problems in GPU Registers
We examine the problem of solving many thousands of small dense linear algebra factorizations simultaneously on Graphics Processing Units (GPUs). We are interested in problems ranging from matrices with several hundred rows and columns down to 4×4 matrices. Problems of this size are common, especially in signal processing. However, they have received very little attention from current […]
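As a minimal illustration of the batched workload, the sketch below assumes each problem is a small square system `A x = b` solved by Gaussian elimination with partial pivoting, and it solves a list of independent systems one after another. On a GPU, each such system could be assigned to its own thread or thread block and, when small enough, kept in registers. This Python loop only shows the per-system arithmetic, not a register-resident kernel or the predictive model the paper describes. The function names are hypothetical.

```python
def solve_small(a, b):
    """Solve a x = b for one small dense system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry in this column up.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def solve_batch(mats, rhss):
    """Solve many independent small systems; each could map to one GPU thread or block."""
    return [solve_small(a, b) for a, b in zip(mats, rhss)]
```

For example, `solve_small([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])` returns approximately `[0.8, 1.4]`.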
Dec, 16
Improving GPU Robustness by Making Use of Faulty Parts
With hundreds of processing units in current state-of-the-art graphics processing units (GPUs), the probability that one or more processing units fail due to permanent faults, during fabrication or post deployment, increases drastically. In our experiments we found that the loss of a single streaming multiprocessor (SM) in an 8-SM GPU resulted in as much as […]
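The measured figure in the abstract is cut off, so no number from the paper is reproduced here. As a purely illustrative toy model, assume equal-length thread blocks that are dispatched in full waves, with a fixed number of resident blocks per SM. Both are simplifying assumptions and are not taken from the paper. Under this model, losing one SM out of eight can cost more than the naive 1/8 of throughput, because the last wave may be only partly full. The function names are hypothetical.

```python
import math


def kernel_time(n_blocks, n_sms, blocks_per_sm=1, block_time=1.0):
    """Toy model: every block takes block_time and blocks run in full waves."""
    if n_sms <= 0:
        raise ValueError("need at least one working SM")
    waves = math.ceil(n_blocks / (n_sms * blocks_per_sm))
    return waves * block_time


def slowdown_after_sm_loss(n_blocks, n_sms=8, blocks_per_sm=1):
    """Ratio of kernel time with one SM disabled to kernel time with all SMs."""
    return (kernel_time(n_blocks, n_sms - 1, blocks_per_sm)
            / kernel_time(n_blocks, n_sms, blocks_per_sm))
```

With one block per SM, the model gives a 2.0x slowdown for 8 blocks, 1.25x for 64 blocks, and 8/7 (about 1.14x) for 56 blocks.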
Dec, 16
Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming
Faced with nearly stagnant clock speed advances, chip manufacturers have turned to parallelism as the source for continuing performance improvements. But even though numerous parallel architectures have already been brought to market, a universally accepted methodology for programming them for general purpose applications has yet to emerge. Existing solutions tend to be hardware-specific, rendering them […]
Dec, 16
Implementation and Evaluation of Scientific Simulations on High Performance Computing Architectures
Computational science is a field of study in which computers are used to solve challenging scientific problems. Real-world or hypothetical scientific problems are converted into mathematical models and solved using numerical analysis techniques with the help of high-performance computing, an approach commonly known as scientific computing. As computer technology advances rapidly, computers are becoming increasingly powerful […]
Dec, 16
Affine Vector Cache for memory bandwidth savings
Preserving memory locality is a major issue in highly-multithreaded architectures such as GPUs. These architectures hide latency by maintaining a large number of threads in flight. As each thread needs to maintain a private working set, all threads collectively put tremendous pressure on on-chip memory arrays, at significant cost in area and power. We show […]
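One way to picture the savings is to assume the cache exploits vectors whose per-lane values follow `base + lane * stride`, such as addresses computed from a thread index. A vector of that form can be stored as two scalars instead of one value per lane. The sketch below only detects and expands that pattern. It does not model the hardware organisation of the paper's cache, and the function names are hypothetical.

```python
def compress_affine(values):
    """Return (base, stride) if values[i] == base + i * stride for every lane, else None."""
    if not values:
        return None
    if len(values) == 1:
        return (values[0], 0)
    base, stride = values[0], values[1] - values[0]
    if all(v == base + i * stride for i, v in enumerate(values)):
        return (base, stride)
    return None  # not affine: must be stored lane by lane


def expand_affine(base, stride, n_lanes):
    """Rebuild the full per-lane vector from its compressed form."""
    return [base + i * stride for i in range(n_lanes)]
```

For a 32-lane warp computing `1000 + 4 * lane`, `compress_affine` returns `(1000, 4)`, so 2 words are stored instead of 32. A uniform vector is the special case with stride 0.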
Dec, 15
Simultaneous Branch and Warp Interweaving for Sustained GPU Performance
Single-Instruction Multiple-Thread (SIMT) micro-architectures implemented in Graphics Processing Units (GPUs) run fine-grained threads in lockstep by grouping them into so-called warps to amortize the cost of instruction fetch, decode and control logic over multiple execution units. As individual threads take divergent execution paths, their processing takes place sequentially, defeating part of the efficiency advantage of […]
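A rough way to quantify this cost is to assume a baseline where each side of an if/else with at least one active lane is issued separately for the whole warp. SIMT efficiency is then the fraction of issued lane-slots that do useful work. This simplified model ignores reconvergence overhead and does not represent the paper's proposed branch and warp interweaving. `simt_efficiency` is a hypothetical helper.

```python
def simt_efficiency(taken, len_if, len_else):
    """Useful lane-slots / issued lane-slots for one if/else executed by a warp.

    taken[lane] is True if that lane follows the if-path. Each path with at
    least one active lane is issued once for the whole warp (serialised).
    """
    warp_size = len(taken)
    n_if = sum(taken)
    n_else = warp_size - n_if
    issued = (len_if if n_if else 0) + (len_else if n_else else 0)
    if issued == 0:
        return 1.0  # nothing to execute
    useful = n_if * len_if + n_else * len_else
    return useful / (warp_size * issued)
```

With two 10-instruction paths, a 16/16 split gives an efficiency of 0.5. A single diverging lane also gives 0.5, because both paths are still issued in full. A uniform warp gives 1.0.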
Dec, 15
Memory-level and Thread-level Parallelism Aware GPU Architecture Performance Analytical Model
GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architectures to improve application performance is even more difficult. Current approaches rely on programmers […]
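Analytical models of this kind are often summarised by two ratios. Memory warp parallelism (MWP) is roughly how many warps' memory requests can overlap during one memory latency. Computation warp parallelism (CWP) is roughly how many warps can compute while one warp waits on memory. The sketch below computes simplified versions of both, assuming per-warp cycle counts as inputs. It omits the bandwidth cap and the full execution-time formulas, so treat the parameter names and the classification rule as an illustration rather than the paper's exact model.

```python
def mwp_cwp(mem_latency, departure_delay, mem_cycles, comp_cycles, n_warps):
    """Simplified MWP/CWP estimate for one SM (bandwidth limit omitted).

    mem_latency:     cycles for one memory request to complete
    departure_delay: cycles between memory requests issued by consecutive warps
    mem_cycles:      memory-waiting cycles per warp
    comp_cycles:     computation cycles per warp (must be > 0)
    n_warps:         active warps on the SM
    """
    mwp = min(mem_latency / departure_delay, n_warps)
    cwp = min((mem_cycles + comp_cycles) / comp_cycles, n_warps)
    # If more warps could compute than memory requests can overlap, memory dominates.
    regime = "memory-bound" if cwp > mwp else "compute-bound"
    return mwp, cwp, regime
```

With a 400-cycle latency, 400 memory and 50 compute cycles per warp, and 24 warps, a 20-cycle departure delay gives MWP 20 and CWP 9 (compute-bound). A 100-cycle delay lowers MWP to 4, and the kernel becomes memory-bound.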