
Posts

Dec, 14

Rethinking Runtime Verification on Hundreds of Cores: Challenges and Opportunities

We propose a novel approach for runtime monitoring and verification on computers with a large number of computation cores. The goal of the approach is to minimize the impact of runtime verification on the performance of the application being monitored. We distinguish between two kinds of computational overhead: (i) overhead caused by instrumentation and/or logging, […]
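To make the idea of separating instrumentation from checking concrete, here is a minimal sketch. The instrumented application only enqueues events, and a separate monitor thread evaluates a property. The event tuples, the acquire/release property, and the names `monitor` and `run_application` are hypothetical and not taken from the paper. Python threads share a global interpreter lock, so this shows the structure rather than an actual multi-core speedup.

```python
import queue
import threading

def monitor(events, violations):
    """Runs on its own thread and checks a hypothetical safety property:
    a resource may only be released while it is held."""
    held = set()
    while True:
        event = events.get()
        if event is None:            # sentinel: the application has finished
            break
        kind, resource = event
        if kind == "acquire":
            held.add(resource)
        elif kind == "release":
            if resource not in held:
                violations.append(event)
            held.discard(resource)

def run_application(events):
    # On the application side, instrumentation costs one enqueue per event.
    for event in [("acquire", "a"), ("release", "a"), ("release", "b")]:
        events.put(event)
    events.put(None)

events = queue.Queue()
violations = []
checker = threading.Thread(target=monitor, args=(events, violations))
checker.start()
run_application(events)
checker.join()
print(violations)  # [('release', 'b')]
```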
Dec, 14

Fast Neural Network Training on General Purpose Computers

Neural networks allow the implementation of complicated applications such as stock market predictions on low-end PCs. However, the training of neural networks can take many hours on a PC. In this paper we propose a technique for training complicated neural networks on a commodity GPU (available in a low-end PC) that completes 6 times faster […]
Dec, 14

Toward Accelerating the Matrix Inversion Computation of Symmetric Positive-Definite Matrices on Heterogeneous GPU-Based Systems

The goal of this paper is to implement an efficient matrix inversion of symmetric positive-definite matrices on heterogeneous GPU-based systems. The matrix inversion procedure can be split into three stages: computing the Cholesky factorization, inverting the Cholesky factor and calculating the product of the inverted Cholesky factor with its transpose to get the final inverted […]
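The three stages listed above can be sketched sequentially in pure Python. This only illustrates the arithmetic of each stage: A = L L^T, then L^{-1}, then A^{-1} = (L^{-1})^T L^{-1}. It says nothing about how the paper splits the work between CPU and GPU, and the function names are illustrative.

```python
import math

def cholesky(a):
    """Stage 1: lower-triangular L with A = L L^T (A must be symmetric positive-definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        l[j][j] = math.sqrt(a[j][j] - sum(l[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l

def invert_lower(l):
    """Stage 2: X = L^{-1}, built row by row by forward substitution on L X = I."""
    n = len(l)
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        x[i][i] = 1.0 / l[i][i]
        for j in range(i):
            x[i][j] = -sum(l[i][k] * x[k][j] for k in range(j, i)) / l[i][i]
    return x

def spd_inverse(a):
    """Stage 3: A^{-1} = (L L^T)^{-1} = (L^{-1})^T L^{-1}."""
    x = invert_lower(cholesky(a))
    n = len(a)
    return [[sum(x[k][i] * x[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

print(spd_inverse([[4.0, 2.0], [2.0, 3.0]]))  # approximately [[0.375, -0.25], [-0.25, 0.5]]
```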
Dec, 14

Accelerating Live Graph-Cut-Based Object Tracking Using CUDA

Graph cuts have found many applications that address the problem of energy minimization, which occurs frequently in computer vision and image processing. One of the most common applications is binary image segmentation, or silhouette extraction. Image segmentation is the process of assigning a label to each pixel in an image to determine a list of […]
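Here is a minimal sketch of graph-cut binary segmentation, assuming a 1D row of intensities in [0, 10] with hypothetical data costs: the intensity is paid for labelling a pixel background, and 10 minus the intensity for labelling it foreground. A constant smoothness weight `lam` applies between neighbours. Edmonds-Karp max-flow is used here for brevity; it is not the CUDA graph-cut algorithm from the paper.

```python
from collections import deque

def min_cut(cap, s, t):
    """Edmonds-Karp max-flow on a dense capacity matrix.
    Returns (max_flow, set of nodes on the source side of a minimum cut)."""
    n = len(cap)
    res = [row[:] for row in cap]          # residual capacities
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:       # BFS for a shortest augmenting path
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:                # no augmenting path left: flow is maximal
            break
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                      # push the bottleneck flow along the path
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Nodes still reachable from s in the residual graph form the source side.
    return flow, {v for v in range(n) if parent[v] != -1}

def segment(pixels, lam):
    """Binary segmentation of a 1D row of intensities in [0, 10].
    Label 1 = foreground (source side), 0 = background (sink side)."""
    n = len(pixels)
    s, t = n, n + 1
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for p, v in enumerate(pixels):
        cap[s][p] = v          # cut (paid) if p ends up background
        cap[p][t] = 10 - v     # cut (paid) if p ends up foreground
    for p in range(n - 1):     # smoothness: paid when neighbours disagree
        cap[p][p + 1] = cap[p + 1][p] = lam
    _, source_side = min_cut(cap, s, t)
    return [1 if p in source_side else 0 for p in range(n)]

print(segment([9, 8, 2, 9, 1, 0], lam=2))  # [1, 1, 0, 1, 0, 0]
```

Raising `lam` to 4 makes the isolated dark pixel join its neighbours, giving `[1, 1, 1, 1, 0, 0]`. Disagreeing with two neighbours then costs more than that pixel's own data penalty.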
Dec, 14

Voxelized Minkowski sum computation on the GPU with robust culling

We present a new approach for computing the voxelized Minkowski sum (excluding any enclosed voids) of two polyhedral objects using programmable Graphics Processing Units (GPUs). We first cull out surface primitives that will not contribute to the final boundary of the Minkowski sum, analyzing and adaptively bounding the rounding errors of the culling algorithm to […]
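By definition, the Minkowski sum of A and B is the set of all sums a + b. For voxel sets, this can be computed by brute force as shown below. The sketch assumes both inputs are already voxelized into sets of integer coordinates. It performs none of the culling or void handling described in the abstract, and its cost grows with |A| × |B|.

```python
from itertools import product

def minkowski_sum(a, b):
    """Voxelized Minkowski sum: every pairwise vector sum of voxels from a and b.
    a, b: sets of integer (x, y, z) voxel coordinates."""
    return {
        (ax + bx, ay + by, az + bz)
        for (ax, ay, az), (bx, by, bz) in product(a, b)
    }

# A 2-voxel segment along x summed with a 2-voxel segment along y gives a 2x2 square.
seg_x = {(0, 0, 0), (1, 0, 0)}
seg_y = {(0, 0, 0), (0, 1, 0)}
print(sorted(minkowski_sum(seg_x, seg_y)))
# [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
```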
Dec, 14

Water Surface Animation using Damped Wave Equation and CUDA Acceleration

The damped wave equation is used for simulating water waves. The differential equation is approximated by finite differences. Explicit integration produces water height fields in real time. The CUDA framework is used to perform parallel computations on the GPU. It is shown that the GPU provides considerable speedup in comparison to the CPU.
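One common explicit discretization of the damped wave equation h_tt + γ h_t = c² ∇²h uses central differences in both time and space. The sketch below assumes that scheme, fixed zero-height borders, and illustrative parameter values. The paper's exact stencil, boundary handling, and CUDA kernels are not given in the abstract. Each cell's update depends only on the previous two time levels, which is why such schemes map naturally onto one GPU thread per cell.

```python
def wave_step(h, h_prev, c, gamma, dt, dx):
    """One explicit step of h_tt + gamma * h_t = c^2 (h_xx + h_yy).
    Central differences in time and space give
        h_next = (2h - (1 - d) h_prev + r * lap(h)) / (1 + d),
    with r = (c dt / dx)^2, d = gamma dt / 2, and lap the undivided
    5-point Laplacian. Border heights stay 0."""
    ny, nx = len(h), len(h[0])
    r = (c * dt / dx) ** 2
    d = gamma * dt / 2
    h_next = [[0.0] * nx for _ in range(ny)]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            lap = h[i + 1][j] + h[i - 1][j] + h[i][j + 1] + h[i][j - 1] - 4 * h[i][j]
            h_next[i][j] = (2 * h[i][j] - (1 - d) * h_prev[i][j] + r * lap) / (1 + d)
    return h_next

n = 21
h = [[0.0] * n for _ in range(n)]
h[n // 2][n // 2] = 1.0                 # a single raised cell as the initial splash
h_prev = [row[:] for row in h]          # equal heights: zero initial velocity
for _ in range(200):
    # c * dt / dx = 0.5 stays below the 2D explicit stability limit 1/sqrt(2)
    h, h_prev = wave_step(h, h_prev, c=1.0, gamma=0.5, dt=0.5, dx=1.0), h
peak = max(abs(v) for row in h for v in row)
print(f"peak height after 200 steps: {peak:.2e}")
```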
Dec, 13

Development of an unified FDTD-FEM library for electromagnetic analysis with CPU and GPU computing

We describe a C++ library for electromagnetics based on the Finite-Difference Time-Domain (FDTD) method for transient analysis and the Finite Element Method (FEM) for modal analysis. Both methods share the same core, and both are optimized for CPU and GPU computing. The FEM is applied to solve Laplace’s equation and to analyze the relation between […]
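The core of FDTD is a leapfrog update of staggered electric and magnetic fields. The sketch below is a 1D version in normalized units with a hypothetical Gaussian soft source. It is not the library's API, and it does not cover the FEM side.

```python
import math

def fdtd_1d(n_cells, n_steps, source_cell, courant=0.5):
    """1D FDTD (Yee leapfrog) in normalized units: Ez on integer nodes,
    Hy on half-integer nodes. Ez[0] and Hy[-1] are never updated,
    so both ends act as reflecting walls."""
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    for t in range(n_steps):
        # Update the magnetic field from the spatial difference of Ez.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update the electric field from the spatial difference of Hy.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Soft source: add a Gaussian pulse (peak at step 30) to Ez at one cell.
        ez[source_cell] += math.exp(-(((t - 30) / 10) ** 2))
    return ez

# The pulse splits and travels about 0.5 cells per step in each direction.
ez = fdtd_1d(n_cells=200, n_steps=150, source_cell=100)
print(max(range(100), key=lambda k: abs(ez[k])))  # cell of the left-moving peak
```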
Dec, 13

Automatic library generation for BLAS3 on GPUs

High-performance libraries, the performance-critical building blocks of high-level applications, will become more important on modern processors as these grow more complex and diverse. However, automatic library generators are still immature, forcing library developers to tune libraries manually to meet their performance objectives. We are developing a new script-controlled compilation framework to help domain experts reduce […]
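Library tuning often comes down to generating variants of a kernel and timing them. As a toy illustration, consider a blocked GEMM whose only tunable parameter is the tile size; an empirical search over that parameter could look like the following. The paper's script-controlled framework is not shown here, and `gemm_blocked` and `autotune` are hypothetical names.

```python
import time

def gemm_blocked(a, b, tile):
    """C = A * B computed tile by tile; `tile` is the tunable parameter."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                # Accumulate one tile of A times one tile of B into a tile of C.
                for i in range(i0, min(i0 + tile, n)):
                    ci, ai = c[i], a[i]
                    for k in range(k0, min(k0 + tile, m)):
                        aik, bk = ai[k], b[k]
                        for j in range(j0, min(j0 + tile, p)):
                            ci[j] += aik * bk[j]
    return c

def autotune(a, b, candidates):
    """Time every candidate tile size and return (fastest, {tile: seconds})."""
    timings = {}
    for tile in candidates:
        start = time.perf_counter()
        gemm_blocked(a, b, tile)
        timings[tile] = time.perf_counter() - start
    return min(timings, key=timings.get), timings

n = 48
a = [[float((i * j) % 7) for j in range(n)] for i in range(n)]
b = [[float((i + j) % 5) for j in range(n)] for i in range(n)]
best, timings = autotune(a, b, candidates=[4, 8, 16, 48])
print("fastest tile size on this machine:", best)
```

In pure Python, the timings mostly reflect interpreter overhead rather than cache or GPU behaviour, so the winner is not meaningful in itself. The point is the generate, time, and select loop.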
Dec, 13

RaVioli: a GPU Supported High-Level Pseudo Real-time Video Processing Library

Real-time video processing applications such as intruder detection systems are in demand and under development. However, on general-purpose computers it is difficult to guarantee that sufficient CPU resources will be available. We have proposed a pseudo real-time video processing library, RaVioli, to solve this problem. RaVioli conceals two types of resolutions, […]
Dec, 13

Parallel Implementations of Beamforming Design and Filtering for Microphone Array Applications

One of the main limitations of microphone array algorithms for audio applications has been their high computational cost in real acoustic environments when real-time signal processing is absolutely required. Regarding audio/speech signal processing, beamforming algorithms have been used for the recovery of acoustic signals from their observations when they are corrupted by noise, reverberation and […]
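A classic baseline for the beamforming mentioned above is delay-and-sum. Each microphone signal is shifted by its propagation delay toward the target, and the channels are averaged, so the target adds coherently while uncorrelated noise does not. The sketch assumes known integer sample delays. The paper's beamformer design and filtering may be considerably more elaborate.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer with integer sample delays.
    channels[m][t] is microphone m's signal; delays[m] is how many samples
    later the target source arrives at microphone m than at the reference."""
    n = len(channels[0])
    out = [0.0] * n
    for sig, d in zip(channels, delays):
        for t in range(n):
            if 0 <= t + d < n:       # read the sample that carries source time t
                out[t] += sig[t + d]
    return [v / len(channels) for v in out]

source = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
true_delays = [0, 1, 3]
# Simulate each microphone hearing the source late by its delay.
mics = [[source[t - d] if t >= d else 0.0 for t in range(len(source))] for d in true_delays]

steered = delay_and_sum(mics, true_delays)   # coherent: the pulse is recovered
unsteered = delay_and_sum(mics, [0, 0, 0])   # pulse smeared over three samples
print(max(steered), max(unsteered))
```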
Dec, 13

Developing an OO Model for Generalized Matrix Multiplication: Preliminary Considerations

Recent changes in the computational sciences force a reevaluation of the role of dense matrix multiplication. Among other things, this has resulted in a proposal to consider generalized matrix multiplication based on the theory of algebraic semirings. The aim of this note is to outline an initial object-oriented model of the generalized matrix-multiply-add operation.
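One way to picture an object-oriented model of generalized matrix-multiply-add is to make the semiring a first-class object passed to a single generic kernel. The `Semiring` class and `gemm` function below are hypothetical and are not the model from the note. They show how swapping (+, ×) for (min, +) turns the same loop into a shortest-path step.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Semiring(Generic[T]):
    add: Callable[[T, T], T]   # "addition": associative and commutative
    mul: Callable[[T, T], T]   # "multiplication": distributes over add
    zero: T                    # identity for add
    one: T                     # identity for mul

def gemm(sr, a, b, c):
    """Generalized matrix-multiply-add over semiring sr: returns C (+) (A (x) B)."""
    n, m, p = len(a), len(b), len(b[0])
    out = [row[:] for row in c]
    for i in range(n):
        for j in range(p):
            acc = sr.zero
            for k in range(m):
                acc = sr.add(acc, sr.mul(a[i][k], b[k][j]))
            out[i][j] = sr.add(out[i][j], acc)
    return out

INF = float("inf")
PLUS_TIMES = Semiring(add=lambda x, y: x + y, mul=lambda x, y: x * y, zero=0, one=1)
MIN_PLUS = Semiring(add=min, mul=lambda x, y: x + y, zero=INF, one=0)

print(gemm(PLUS_TIMES, [[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]]))
# [[19, 22], [43, 50]]

# Edge weights of a 3-node graph (INF = no edge). D (x) D over min-plus gives
# shortest path lengths using at most two edges.
D = [[0, 3, INF], [INF, 0, 1], [INF, INF, 0]]
print(gemm(MIN_PLUS, D, D, [[INF] * 3 for _ in range(3)]))
# [[0, 3, 4], [inf, 0, 1], [inf, inf, 0]]
```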
Dec, 13

Collision-Driven Volumetric Deformation on the GPU

We present a novel parallel algorithm to animate the deformation of a soft body in response to collision. Our algorithm incorporates elements of physically-based methods, and at the same time, it allows artistic control of general deformation behavior. Our solver has important benefits for practical use, such as evaluation of animation frames in an arbitrary […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
