Posts

Feb, 25

Graphics Card as a Cheap Supercomputer

Today's powerful graphics cards, which deliver stunning real-time visual effects for computer-based entertainment, contain hardware components capable of delivering photo-realistic simulation to the end user. Given the vast computing power of this graphics hardware, its manufacturers often offer a programming interface that makes it possible to use the computational […]
Feb, 25

Future of GPGPU Micro-Architectural Parameters

As graphics processing units (GPUs) become increasingly popular for general-purpose workloads (GPGPU), the question arises of how such processors will evolve architecturally in the near future. In this work, we identify and discuss tradeoffs for three GPU architecture parameters: active thread count, compute-memory ratio, and cluster and warp sizing. For each parameter, we propose […]
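To illustrate why the compute-memory ratio matters, here is a minimal sketch using the standard roofline model (a generic performance model, not necessarily the analysis used in this work). A kernel is memory-bound when its arithmetic intensity, in flops per byte of DRAM traffic, falls below the machine's compute-memory ratio. The peak and bandwidth figures in the usage note are hypothetical.

```python
def machine_balance(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Compute-memory ratio: flops the GPU can execute per byte moved from DRAM."""
    return peak_gflops / bandwidth_gbs


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: performance is capped by compute or by memory traffic,
    whichever is lower. `intensity` is the kernel's flops per byte."""
    return min(peak_gflops, intensity * bandwidth_gbs)
```

For a hypothetical GPU with 1000 GFLOPS peak and 200 GB/s of bandwidth, the balance is 5 flops per byte, so a kernel performing 2 flops per byte can reach at most 400 GFLOPS, while one performing 10 flops per byte can reach the full 1000 GFLOPS. Raising the compute-memory ratio therefore only helps kernels whose intensity is already above the old balance point.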
Feb, 23

Multi-GPU Computing for Achieving Speedup in Real-time Aggregate Risk Analysis

Stochastic simulation techniques employed for portfolio risk analysis, often referred to as Aggregate Risk Analysis, can benefit from exploiting state-of-the-art high-performance computing platforms. In this paper, we propose parallel methods to speed up aggregate risk analysis in support of real-time pricing. To achieve this, an algorithm for analysing aggregate risk is proposed and implemented in C and […]
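The excerpt stops before describing the algorithm itself. The following minimal sequential sketch shows the kind of computation involved, under stated assumptions: each simulated trial (year) is a list of event IDs, a hypothetical event loss table maps event IDs to losses, and a single layer applies occurrence terms to each event and aggregate terms to each trial's total. The names and financial terms are illustrative and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def layer(loss: float, retention: float, limit: float) -> float:
    """Portion of a loss above the retention, capped at the limit."""
    return min(max(loss - retention, 0.0), limit)


def trial_loss(events: Sequence[int], elt: Dict[int, float],
               occ_ret: float, occ_lim: float,
               agg_ret: float, agg_lim: float) -> float:
    # Occurrence terms apply to each event; unknown events contribute no loss.
    total = sum(layer(elt.get(e, 0.0), occ_ret, occ_lim) for e in events)
    # Aggregate terms apply to the trial's summed loss.
    return layer(total, agg_ret, agg_lim)


def year_loss_table(trials: Sequence[Sequence[int]], elt: Dict[int, float],
                    occ_ret: float, occ_lim: float,
                    agg_ret: float, agg_lim: float) -> List[float]:
    # Trials are independent, so this loop is the natural unit to spread
    # across GPU threads or across several GPUs.
    return [trial_loss(t, elt, occ_ret, occ_lim, agg_ret, agg_lim) for t in trials]
```

Because no trial depends on another, a parallel version only needs to give each thread its own trial and write one output slot per trial, which avoids synchronization entirely.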
Feb, 23

Can PCM Benefit GPU? Reconciling Hybrid Memory Design with GPU Massive Parallelism for Energy Efficiency

In recent studies, phase changing memory (PCM) has shown promising energy efficiency for systems with a modest level of parallelism. But it remains an open question whether it can benefit GPU-like massively parallel systems. This work conducts the first systematic investigation into this question. It empirically shows that contrary to the promising results shown before […]
Feb, 23

Efficient Parallel and External Matching

We show that a simple algorithm for computing a matching on a graph runs in a logarithmic number of phases incurring work linear in the input size. The algorithm can be adapted to provide efficient algorithms in several models of computation, such as PRAM, External Memory, MapReduce and distributed memory models. Our CREW PRAM algorithm […]
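The excerpt does not spell out the algorithm. One well-known phase-based scheme in this spirit is the local-max rule. Each edge receives a random priority, every edge whose priority beats all edges adjacent to it joins the matching, the matched vertices are deleted, and the process repeats. The sketch below is sequential and treats this rule as the paper's algorithm only as an assumption. The per-edge and per-vertex loops inside each phase are the parts a PRAM or GPU version would run in parallel.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def local_max_matching(edges: List[Edge], seed: int = 0) -> Tuple[List[Edge], int]:
    """Maximal matching via repeated local-max phases; returns (matching, phases)."""
    rng = random.Random(seed)
    remaining = [(u, v) for u, v in edges if u != v]  # ignore self-loops
    matching: List[Edge] = []
    phases = 0
    while remaining:
        phases += 1
        # Random priority per edge; the index breaks ties so priorities are distinct.
        prio = [(rng.random(), i) for i in range(len(remaining))]
        best = {}  # vertex -> index of its highest-priority incident edge
        for i, (u, v) in enumerate(remaining):
            for x in (u, v):
                if x not in best or prio[i] > prio[best[x]]:
                    best[x] = i
        # An edge that is the local maximum at both endpoints joins the matching.
        # Two such edges cannot share a vertex, so the result stays a matching.
        matched = set()
        for i, (u, v) in enumerate(remaining):
            if best[u] == i and best[v] == i:
                matching.append((u, v))
                matched.update((u, v))
        # Drop every edge touching a newly matched vertex.
        remaining = [(u, v) for u, v in remaining
                     if u not in matched and v not in matched]
    return matching, phases
```

The globally highest-priority edge is always a local maximum, so every phase makes progress, and the loop ends only when no edge has two free endpoints, which makes the final matching maximal.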
Feb, 23

Adaptive Hardware-accelerated Terrain Tessellation

In this master's thesis report, a scheme for adaptive hardware tessellation is presented. The scheme uses an offline processing approach in which a height map is analyzed in terms of curvature and the result is stored in a resource called a density map. This density map is then bound as a resource to the hardware tessellation stage […]
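The excerpt does not say how curvature is measured or how density becomes a tessellation factor. As a hypothetical sketch of the offline step, one could use the magnitude of a discrete Laplacian of the height map as a curvature proxy, normalize it to [0, 1], and map it linearly onto a tessellation-factor range. The 1 to 64 range mirrors the maximum tessellation factor of Direct3D 11 hardware tessellation. The function names are illustrative only.

```python
from typing import List

Grid = List[List[float]]


def density_map(height: Grid) -> Grid:
    """Hypothetical density map: |discrete Laplacian| of the height map,
    normalized to [0, 1]. Border texels are left at 0."""
    rows, cols = len(height), len(height[0])
    lap = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            lap[r][c] = abs(height[r - 1][c] + height[r + 1][c]
                            + height[r][c - 1] + height[r][c + 1]
                            - 4.0 * height[r][c])
    peak = max(max(row) for row in lap)
    if peak == 0.0:
        return lap  # flat terrain: no extra detail needed anywhere
    return [[v / peak for v in row] for row in lap]


def tess_factor(density: float, min_factor: float = 1.0, max_factor: float = 64.0) -> float:
    """Linearly map a density in [0, 1] to a tessellation factor."""
    return min_factor + density * (max_factor - min_factor)
```

Doing this once offline means the tessellation stage only has to sample the stored density at run time rather than re-estimating curvature every frame.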
Feb, 23

Parallel Computer Vision: Person Data Extraction

Face recognition is well established in many environments today. It is used in security systems, on social media platforms, and in digital cameras to support the user. In addition, the rapidly rising number of CPU cores in modern PCs and handhelds lets us do more complex work on a single machine. The central question of […]
Feb, 22

Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs

In this paper, we present an approach to estimating the performance upper bound of GPU applications based on algorithm analysis and assembly-code-level benchmarking. As an example, we analyze the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. We try to answer the question of how much optimization […]
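For context, the paper's application-specific bound cannot exceed the raw hardware peak. That peak follows from a simple formula: cores × clock × 2, since one fused multiply-add counts as two flops. The sketch below assumes the reference clocks of the GeForce GTX 580 (GF110: 512 cores, 1544 MHz shader clock) and GTX 680 (GK104: 1536 cores, 1006 MHz base clock). Other boards built on these chips may be clocked differently.

```python
def peak_sp_gflops(cuda_cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical single-precision peak in GFLOPS, assuming every core
    retires one FMA (2 flops) per cycle."""
    return cuda_cores * clock_mhz * flops_per_cycle / 1000.0


gf110 = peak_sp_gflops(512, 1544.0)   # GeForce GTX 580 (Fermi)
gk104 = peak_sp_gflops(1536, 1006.0)  # GeForce GTX 680 (Kepler)
```

This gives roughly 1581 GFLOPS for the GTX 580 and 3090 GFLOPS for the GTX 680. Because SGEMM must also issue loads, address arithmetic, and synchronization, an upper bound derived from the instruction stream, as the paper proposes, is a tighter yardstick than these raw figures.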
Feb, 22

An efficient scheduling scheme using estimated execution time for heterogeneous computing systems

Computing systems should be designed to exploit parallelism in order to improve performance. In general, a GPU (Graphics Processing Unit) can provide more parallelism than a CPU (Central Processing Unit), which has led to the wide use of heterogeneous computing systems that utilize the CPU and the GPU together. In such heterogeneous computing systems, the efficiency […]
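The excerpt is cut off before describing the scheme itself. The following sketch illustrates scheduling with estimated execution times in general, using a generic earliest-finish-time heuristic that is not necessarily the paper's method. Each task goes to whichever device would complete it first, given the work already queued there. The per-task estimates are assumed to be known in advance, for example from profiling, and the device names are hypothetical.

```python
from typing import Dict, List, Tuple


def schedule_by_estimate(estimates: List[Dict[str, float]]) -> Tuple[List[str], Dict[str, float]]:
    """Greedy earliest-finish-time assignment.

    estimates[i] maps a device name ("cpu" or "gpu") to the estimated
    execution time of task i on that device. Returns the chosen device per
    task and the time at which each device becomes free.
    """
    ready = {"cpu": 0.0, "gpu": 0.0}  # when each device finishes its queued work
    assignment: List[str] = []
    for est in estimates:
        # Choose the device on which this task would finish earliest.
        device = min(ready, key=lambda d: ready[d] + est[d])
        ready[device] += est[device]
        assignment.append(device)
    return assignment, ready
```

With three tasks estimated at (CPU 10, GPU 2), (CPU 10, GPU 2), and (CPU 3, GPU 2), the first two go to the GPU. The third goes to the CPU because the GPU queue would not finish it until time 6, while the idle CPU finishes it at time 3. The CPU still contributes even though it is slower on every individual task.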
Feb, 22

A Parallel Active-Set Method for Solving Frictional Contact Problems

Simulating frictional contact is a challenging computational task and there exist a variety of techniques to do so. One such technique, the staggered projections algorithm, requires the solution of two convex quadratic program (QP) subproblems at each iteration. We introduce a method, SCHURPA, which employs a primal-dual active-set strategy to efficiently solve these QPs based […]
Feb, 22

Investigating performance variations of an optimized GPU-ported granulometry algorithm

In this article, we present an optimized GPU implementation of a granulometry algorithm that is widely used in materials science. The main contribution to this algorithm is the binarization of the input data, which increases throughput while reducing the memory space allocated for the data. Also, the optimized GPU implementation brings an order of […]
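The excerpt does not detail the data layout. The following minimal sketch shows why binarization saves memory, assuming a hypothetical fixed threshold and one bit per voxel packed eight to a byte. The packed array is one eighth the size of a one-byte-per-voxel array. Foreground counts, the quantity a granulometry curve tracks after each morphological opening, reduce to population counts over the packed bytes.

```python
def binarize_and_pack(values, threshold):
    """Set bit i when values[i] >= threshold; 8 voxels per byte, LSB first."""
    packed = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v >= threshold:
            packed[i // 8] |= 1 << (i % 8)
    return packed


def count_foreground(packed):
    """Number of set bits, i.e. foreground voxels."""
    return sum(bin(b).count("1") for b in packed)
```

On a GPU, the same layout lets a single 32-bit word hold 32 voxels, so logical operations and population-count instructions process many voxels at once.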
Feb, 22

GPU-based Motion Planning under Uncertainties using POMDP

We present a novel GPU-based parallel algorithm to solve continuous-state POMDP problems. We choose the MCVI (Monte Carlo Value Iteration) method as our base algorithm [1] and parallelize it using a multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms to effectively utilize the massive data parallelism of GPUs. To obtain […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
