Posts

May, 9

Parallel Implementations of a Disparity Estimation Algorithm Based on a Proximal Splitting Method

The Parallel Proximal Algorithm (PPXA+) has been recently introduced as an efficient tool for solving convex optimization problems. It has proved particularly effective in the context of stereo vision, used as the methodological core of a novel disparity estimation technique. In this work, the main methodological issues limiting the efficient parallelization of this technique are […]
May, 8

Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method

In this paper, we mainly report on our experience and strategy in programming graphics processing units (GPUs) as fast parallel floating point coprocessors to accelerate the simulation of travelling shock waves of the 2-D Euler equation by the finite volume method. The GPU code is specialized in CUDA (Compute Unified Device Architecture) for which we […]
May, 8

Non-symmetric magnetohydrostatic equilibria: a multigrid approach

AIMS. Neukirch and Rastatter (1999) re-formulate the linear magnetohydrostatic (MHS) model of Low (1991) into a form that only requires the solution of two scalar elliptic partial differential equations. In this paper, we investigate an efficient numerical procedure for calculating MHS equilibria based on this representation. METHODS. The MHS equations are reduced to two scalar […]
May, 8

Construction and Implementation of a Simple Agent-Based System on GPU-Architectures

Agent-based modelling and simulation is still an upcoming approach for microsimulation. But a large number of agents with advanced dynamics and interactions requires sophisticated algorithms and lots of computational effort. We try to implement a rather simple but special agent-based model on GPU architectures (graphics processing units). This contribution presents the GPU implementations and […]
May, 8

Parallel Chen-Han (PCH) Algorithm for Discrete Geodesics

In many graphics applications, the computation of exact geodesic distance is very important. However, the high computational cost of the existing geodesic algorithms means that they are not practical for large-scale models or time-critical applications. To tackle this challenge, we propose the parallel Chen-Han (or PCH) algorithm, which extends the classic Chen-Han (CH) discrete geodesic […]
May, 8

Somoclu: An Efficient Distributed Library for Self-Organizing Maps

Somoclu is a C++ tool for training self-organizing maps on large data sets using a high-performance cluster. It builds on MPI for distributing the workload across the nodes of the cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful […]
May, 7

Performance impact of dynamic parallelism on different clustering algorithms

In this paper, we aim to quantify the performance gains of dynamic parallelism. The newest version of CUDA, CUDA 5, introduces dynamic parallelism, which allows GPU threads to create new threads without CPU intervention and to adapt to their data. This effectively eliminates the superfluous back and forth communication between the GPU and CPU through nested […]
May, 7

Color and motion-based particle filter target tracking in a network of overlapping cameras with multi-threading and GPGPU

This paper describes an efficient implementation of multiple-target multiple-view tracking in video-surveillance sequences. It takes advantage of the capabilities of multiple core Central Processing Units (CPUs) and of graphical processing units under the Compute Unified Device Architecture (CUDA) framework. The principle of our algorithm is 1) in each video sequence, to perform tracking on all […]
May, 7

Simulating the universe with GPU-accelerated supercomputers: n-body methods, tests, and examples

We demonstrate the acceleration obtained from using GPU/CPU hybrid clusters and supercomputers for N-body simulations of gravity based in part on the author’s new code development. Validation tests are shown for cosmological simulations and for galaxy simulations, along with their respective speedups compared to traditional simulations. Potential new applications for science enabled by this advance […]
May, 7

Critical Links Detection using CUDA

The Critical Links Detection (CLD) Problem consists of finding the smallest set of edges in a graph to be protected so that if a given number of unprotected edges are removed the diameter does not exceed a given value. The diameter of a graph is defined as the length of the All-Pair Shortest-Path (APSP). This […]
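To make the diameter definition concrete, here is a minimal Python sketch, not the paper's CUDA implementation. It assumes the usual reading that the diameter is the longest of the all-pairs shortest paths, and it assumes an undirected, unweighted graph. The function name `graph_diameter` and the example graphs are hypothetical and chosen only for illustration.

```python
import math


def graph_diameter(num_nodes, edges):
    """Return the longest shortest-path length of an undirected, unweighted graph.

    Assumption: nodes are numbered 0..num_nodes-1 and every edge has length 1.
    Returns math.inf when the graph is disconnected.
    """
    # Distance matrix: 0 on the diagonal, infinity for pairs with no known path.
    dist = [[0 if i == j else math.inf for j in range(num_nodes)]
            for i in range(num_nodes)]
    for u, v in edges:
        if u != v:  # ignore self-loops, they never shorten a path
            dist[u][v] = 1
            dist[v][u] = 1

    # Floyd-Warshall: allow each node k in turn as an intermediate stop.
    for k in range(num_nodes):
        for i in range(num_nodes):
            for j in range(num_nodes):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k

    return max(max(row) for row in dist)


# A 4-cycle, and the same graph after one (unprotected) edge is removed.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
path = [(0, 1), (1, 2), (2, 3)]

print(graph_diameter(4, cycle))
print(graph_diameter(4, path))
```

Removing one edge from the 4-cycle raises the diameter from 2 to 3, which is the kind of change the CLD problem tries to bound by protecting edges.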
May, 7

Optimizing CUDA Code By Kernel Fusion – Application on BLAS

Modern GPUs are able to perform significantly more arithmetic operations than transfers of a single word to or from global memory. Hence, many GPU kernels are limited by memory bandwidth and cannot exploit the arithmetic power of GPUs. However, the memory locality can often be improved by kernel fusion when a sequence of kernels is […]
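The effect of fusion on memory traffic can be illustrated with a small CPU-side Python sketch. It is not the paper's method and not CUDA code: the `GlobalMemory` counter, the kernel functions, and the AXPY-style computation z = a*x + w are all assumptions picked for illustration. Each load or store through `GlobalMemory` stands in for one word transferred to or from global memory.

```python
class GlobalMemory:
    """Hypothetical stand-in for GPU global memory that counts word transfers."""

    def __init__(self):
        self.transfers = 0

    def load(self, array, i):
        self.transfers += 1
        return array[i]

    def store(self, array, i, value):
        self.transfers += 1
        array[i] = value


def scale_kernel(mem, a, x, y):
    # First kernel: y = a * x (one load and one store per element).
    for i in range(len(x)):
        mem.store(y, i, a * mem.load(x, i))


def add_kernel(mem, y, w, z):
    # Second kernel: z = y + w (two loads and one store per element).
    for i in range(len(y)):
        mem.store(z, i, mem.load(y, i) + mem.load(w, i))


def fused_kernel(mem, a, x, w, z):
    # Fused kernel: z = a * x + w; the intermediate never touches global memory.
    for i in range(len(x)):
        mem.store(z, i, a * mem.load(x, i) + mem.load(w, i))


def run_unfused(a, x, w):
    mem = GlobalMemory()
    y = [0.0] * len(x)
    z = [0.0] * len(x)
    scale_kernel(mem, a, x, y)
    add_kernel(mem, y, w, z)
    return z, mem.transfers


def run_fused(a, x, w):
    mem = GlobalMemory()
    z = [0.0] * len(x)
    fused_kernel(mem, a, x, w, z)
    return z, mem.transfers


x = [1.0, 2.0, 3.0, 4.0]
w = [10.0, 20.0, 30.0, 40.0]
print(run_unfused(2.0, x, w))
print(run_fused(2.0, x, w))
```

Both versions compute the same result, but in this toy model the unfused pair moves 5 words per element while the fused kernel moves 3, because the intermediate array is never written out and read back.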
May, 6

Accelerating Financial Applications on the GPU

The QuantLib library is a popular library used for many areas of computational finance. In this work, the parallel processing power of the GPU is used to accelerate QuantLib financial applications. Black-Scholes, Monte-Carlo, Bonds, and Repo code paths in QuantLib are accelerated using hand-written CUDA and OpenCL codes specifically targeted for the GPU. Additionally, HMPP […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
