Posts
Feb, 25
SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner
To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, […]
Feb, 25
Graphics Card as a Cheap Supercomputer
Today's powerful graphics cards, which provide stunning real-time visual effects for computer-based entertainment, must accommodate hardware components capable of delivering photo-realistic simulation to the end user. Given the vast computing power of graphics hardware, its producers very often offer a programming interface that makes it possible to use the computational […]
Feb, 25
Future of GPGPU Micro-Architectural Parameters
As graphics processing units (GPUs) are becoming increasingly popular for general purpose workloads (GPGPU), the question arises how such processors will evolve architecturally in the near future. In this work, we identify and discuss tradeoffs for three GPU architecture parameters: active thread count, compute-memory ratio, and cluster and warp sizing. For each parameter, we propose […]
Feb, 23
Multi-GPU Computing for Achieving Speedup in Real-time Aggregate Risk Analysis
Stochastic simulation techniques employed for portfolio risk analysis, often referred to as Aggregate Risk Analysis, can benefit from exploiting state-of-the-art high-performance computing platforms. In this paper, we propose parallel methods to speed up aggregate risk analysis for supporting real-time pricing. To achieve this, an algorithm for analysing aggregate risk is proposed and implemented in C and […]
Feb, 23
Can PCM Benefit GPU? Reconciling Hybrid Memory Design with GPU Massive Parallelism for Energy Efficiency
In recent studies, phase-change memory (PCM) has shown promising energy efficiency for systems with a modest level of parallelism. But it remains an open question whether it can benefit GPU-like massively parallel systems. This work conducts the first systematic investigation into this question. It empirically shows that contrary to the promising results shown before […]
Feb, 23
Efficient Parallel and External Matching
We show that a simple algorithm for computing a matching on a graph runs in a logarithmic number of phases incurring work linear in the input size. The algorithm can be adapted to provide efficient algorithms in several models of computation, such as PRAM, External Memory, MapReduce and distributed memory models. Our CREW PRAM algorithm […]
Feb, 23
Adaptive Hardware-accelerated Terrain Tessellation
In this master's thesis report, a scheme for adaptive hardware tessellation is presented. The scheme uses an offline processing approach where a height map is analyzed in terms of curvature and the result is stored in a resource called a density map. This density map is then bound as a resource to the hardware tessellation stage […]
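As a rough illustration of the offline step described in this excerpt, the sketch below derives a density map from a height map. It is not the thesis's method: the curvature measure (absolute discrete Laplacian), the border clamping, the normalization to [0, 1], and the name `density_map` are all assumptions made for the example.

```python
def density_map(height):
    """Return a per-cell density in [0, 1] from a 2D height map (list of rows).

    Assumption: curvature is approximated by the absolute 4-neighbor
    discrete Laplacian; the excerpt does not say which measure is used.
    """
    rows, cols = len(height), len(height[0])
    curvature = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Clamp neighbor indices at the borders (a simplification).
            up = height[max(r - 1, 0)][c]
            down = height[min(r + 1, rows - 1)][c]
            left = height[r][max(c - 1, 0)]
            right = height[r][min(c + 1, cols - 1)]
            curvature[r][c] = abs(up + down + left + right - 4 * height[r][c])

    peak = max(max(row) for row in curvature)
    if peak == 0:
        # Perfectly flat terrain: no region needs extra tessellation.
        return [[0.0] * cols for _ in range(rows)]
    # Normalize so the most curved cell gets density 1.0.
    return [[value / peak for value in row] for row in curvature]


if __name__ == "__main__":
    spike = [[0, 0, 0],
             [0, 1, 0],
             [0, 0, 0]]
    for row in density_map(spike):
        print(row)
```

In a scheme like the one described, higher density values would presumably map to higher tessellation factors at run time; how that mapping works is not stated in the excerpt.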
Feb, 23
Parallel Computer Vision: Person Data Extraction
Face recognition has been established in many environments these days. It is used in security systems, social media platforms or in digital cameras to support the user. In addition, the rapidly rising number of CPU cores in modern PCs or handhelds lets us do more complex work on a single machine. The central question of […]
Feb, 22
Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs
In this paper, we present an approach to estimate GPU applications’ performance upper bound based on algorithm analysis and assembly code level benchmarking. As an example, we analyze the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. We try to answer the question of how much optimization […]
Feb, 22
An efficient scheduling scheme using estimated execution time for heterogeneous computing systems
Computing systems should be designed to exploit parallelism in order to improve performance. In general, a GPU (Graphics Processing Unit) can provide more parallelism than a CPU (Central Processing Unit), resulting in the wide usage of heterogeneous computing systems that utilize both the CPU and the GPU together. In heterogeneous computing systems, the efficiency […]
Feb, 22
A Parallel Active-Set Method for Solving Frictional Contact Problems
Simulating frictional contact is a challenging computational task and there exist a variety of techniques to do so. One such technique, the staggered projections algorithm, requires the solution of two convex quadratic program (QP) subproblems at each iteration. We introduce a method, SCHURPA, which employs a primal-dual active-set strategy to efficiently solve these QPs based […]
Feb, 22
Investigating performance variations of an optimized GPU-ported granulometry algorithm
In this article, we present an optimized GPU implementation of a granulometry algorithm that is widely used in the study of materials. The main contribution to this algorithm is the binarization of the input data, which increases throughput while reducing the memory space allocated for data. Also, the optimized GPU implementation brings an order of […]
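To make the memory argument in this excerpt concrete, here is a minimal sketch of binarizing grayscale data and packing it at one bit per pixel. It is an assumption-laden illustration, not the article's implementation: the threshold, the most-significant-bit-first packing order, and the names `binarize_and_pack` and `unpack` are all hypothetical.

```python
def binarize_and_pack(pixels, threshold):
    """Threshold grayscale values and pack 8 binary pixels into each byte.

    Assumption: a pixel is "on" when value >= threshold, and bits are
    stored most-significant-bit first within each byte.
    """
    packed = bytearray((len(pixels) + 7) // 8)
    for i, value in enumerate(pixels):
        if value >= threshold:
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)


def unpack(packed, count):
    """Recover the first `count` binary pixels from the packed bytes."""
    return [(packed[i // 8] >> (7 - i % 8)) & 1 for i in range(count)]


if __name__ == "__main__":
    pixels = [0, 200, 130, 10, 255, 128, 127, 90, 250]
    packed = binarize_and_pack(pixels, threshold=128)
    # 9 one-byte pixels now occupy 2 bytes.
    print(len(pixels), "->", len(packed), "bytes")
    print(unpack(packed, len(pixels)))
```

With 8-bit input, this packing cuts storage by roughly a factor of eight, which is one plausible reading of how binarization could reduce memory use and raise throughput; the excerpt itself does not give the details.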