9438

Posts

May, 8

Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method

In this paper, we mainly report on our experience and strategy in programming graphics processing units (GPUs) as fast parallel floating point coprocessors to accelerate the simulation of travelling shock waves of the 2-D Euler equation by the finite volume method. The GPU code is specialized in CUDA (Compute Unified Device Architecture) for which we […]
May, 8

Non-symmetric magnetohydrostatic equilibria: a multigrid approach

AIMS. Neukirch and Rastatter (1999) re-formulate the linear magnetohydrostatic (MHS) model of Low (1991) into a form that only requires the solution of two scalar elliptic partial differential equations. In this paper, we investigate an efficient numerical procedure for calculating MHS equilibria based on this representation. METHODS. The MHS equations are reduced to two scalar […]
May, 8

Construction and Implementation of a Simple Agent-Based System on GPU-Architectures

Agent-based modelling and simulation is still an upcoming approach for microsimulation. But a large number of agents with advanced dynamics and interactions requires sophisticated algorithms and lots of computational effort. We try to implement a rather simple but special agent-based model on GPU-architectures (graphics processing unit). This contribution presents the GPU implementations and […]
May, 8

Parallel Chen-Han (PCH) Algorithm for Discrete Geodesics

In many graphics applications, the computation of exact geodesic distance is very important. However, the high computational cost of the existing geodesic algorithms means that they are not practical for large-scale models or time-critical applications. To tackle this challenge, we propose the parallel Chen-Han (or PCH) algorithm, which extends the classic Chen-Han (CH) discrete geodesic […]
May, 8

Somoclu: An Efficient Distributed Library for Self-Organizing Maps

Somoclu is a C++ tool for training self-organizing maps on large data sets using a high-performance cluster. It builds on MPI for distributing the workload across the nodes of the cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful […]
May, 7

Performance impact of dynamic parallelism on different clustering algorithms

In this paper, we aim to quantify the performance gains of dynamic parallelism. The newest version of CUDA, CUDA 5, introduces dynamic parallelism, which allows GPU threads to create new threads, without CPU intervention, and adapt to their data. This effectively eliminates the superfluous back and forth communication between the GPU and CPU through nested […]
May, 7

Color and motion-based particle filter target tracking in a network of overlapping cameras with multi-threading and GPGPU

This paper describes an efficient implementation of multiple-target multiple-view tracking in video-surveillance sequences. It takes advantage of the capabilities of multiple core Central Processing Units (CPUs) and of graphical processing units under the Compute Unified Device Architecture (CUDA) framework. The principle of our algorithm is 1) in each video sequence, to perform tracking on all […]
May, 7

Simulating the universe with GPU-accelerated supercomputers: n-body methods, tests, and examples

We demonstrate the acceleration obtained from using GPU/CPU hybrid clusters and supercomputers for N-body simulations of gravity based in part on the author’s new code development. Validation tests are shown for cosmological simulations and for galaxy simulations, along with their respective speedups compared to traditional simulations. Potential new applications for science enabled by this advance […]
May, 7

Critical Links Detection using CUDA

The Critical Links Detection (CLD) Problem consists of finding the smallest set of edges in a graph to be protected so that if a given number of unprotected edges are removed the diameter does not exceed a given value. The diameter of a graph is defined as the length of the All-Pair Shortest-Path (APSP). This […]
May, 7

Optimizing CUDA Code By Kernel Fusion – Application on BLAS

Modern GPUs are able to perform significantly more arithmetic operations than transfers of a single word to or from global memory. Hence, many GPU kernels are limited by memory bandwidth and cannot exploit the arithmetic power of GPUs. However, the memory locality can often be improved by kernel fusion when a sequence of kernels is […]
May, 6

Accelerating Financial Applications on the GPU

The QuantLib library is a popular library used for many areas of computational finance. In this work, the parallel processing power of the GPU is used to accelerate QuantLib financial applications. Black-Scholes, Monte-Carlo, Bonds, and Repo code paths in QuantLib are accelerated using hand-written CUDA and OpenCL codes specifically targeted for the GPU. Additionally, HMPP […]
May, 6

Algorithms for Rapid Characterization and Optimization of Aperture and Reflector Antennas

Reflector antennas play a key role in the communication industry, and enhancing the speed of the analysis of reflector antenna systems can provide better responsiveness to the needs of industry as well as promote better understanding of software modeling through faster visualization. A reflector antenna system typically consists of a feed assembly, with a feedhorn […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
