Posts
Nov, 15
An MPI-CUDA implementation of an improved Roe method for two-layer shallow water systems
The numerical solution of two-layer shallow water systems is required to accurately simulate stratified fluids, which are ubiquitous in nature: they appear in atmospheric flows, ocean currents, oil spills, etc. Moreover, implementing the numerical schemes that solve these models in realistic scenarios demands a great deal of computing power. In this paper, we tackle […]
Nov, 15
gSLIC: a real-time implementation of SLIC superpixel segmentation
We introduce a parallel implementation of the Simple Linear Iterative Clustering (SLIC) superpixel segmentation. Our implementation uses the GPU and the NVIDIA CUDA framework. Using a single graphics card, our implementation achieves speedups of 10x to 20x over the sequential implementation. This allows us to use the superpixel segmentation method in real time. Our implementation is compatible with […]
Nov, 15
Natural HPC substrate: Exploitation of mixed multicore CPU and GPUs
Recent GPU developments have attracted much interest in the HPC community. Since each GPU interface requires a dedicated host processor, the unused high-performance non-GPU processors are simply wasted. Because GPUs are energy intensive and more likely to fail than CPUs, we are interested in using all processors to a) boost application performance, and b) […]
Nov, 15
Efficient Graph Comparison and Visualization Using GPU
This paper presents the application of several graph algorithms to the comparison and visualization of real-world networks. To obtain an interactive and robust framework for the analysis of large graphs, we use CUDA implementations of all-pairs shortest paths (APSP) and breadth-first search (BFS) algorithms along with CULA matrix decomposition routines. Such an approach allows for efficient computation of graph feature […]
Nov, 14
A capabilities-aware framework for using computational accelerators in data-intensive computing
Multicore computational accelerators such as GPUs are now commodity components for high-performance computing at scale. While such accelerators have been studied in some detail as stand-alone computational engines, their integration in large-scale distributed systems raises new challenges and trade-offs. In this paper, we present an exploration of resource management alternatives for building asymmetric accelerator-based distributed […]
Nov, 14
Large Scale Plane Wave Pseudopotential Density Functional Theory Calculations on GPU Clusters
In this work, we present our implementation of the density functional theory (DFT) plane wave pseudopotential (PWP) calculations on GPU clusters. This GPU version is developed based on a CPU DFT-PWP code: PEtot, which can calculate up to a thousand atoms on thousands of processors. Our test indicates that the GPU version can have a […]
Nov, 14
Toward improved aeromechanics simulations using recent advancements in scientific computing
The proposed paper will present details on recent advancements in scientific computing in terms of integrating new hardware and software to greatly enhance the computational efficiency of comprehensive rotorcraft analysis. The focus will be on showing the tremendous computational accelerations that are possible (i.e., orders of magnitude speed up) by using software developments in the […]
Nov, 14
Solving Incompressible Two-Phase Flows on Massively Parallel Multi-GPU Clusters
We present a fully multi-GPU-based double-precision solver for the three-dimensional two-phase incompressible Navier-Stokes equations. An in-depth performance analysis shows a realistic speed-up of the order of three by comparing equally priced GPUs and CPUs and more than a doubling in energy efficiency for GPUs. We observe profound strong and weak scaling on a multi-GPU cluster.
Nov, 14
Pseudoscalar Meson in Two Flavors QCD with the Optimal Domain-Wall Fermion
We perform hybrid Monte Carlo (HMC) simulations of two-flavor QCD with the optimal domain-wall fermion (ODWF) on the $16^3 \times 32$ lattice (with lattice spacing $a \sim 0.1$ fm), for eight sea-quark masses corresponding to pion masses in the range 230-580 MeV. We calculate the mass and the decay constant […]
Nov, 14
Multi GPU Implementation of the Simplex Algorithm
The Simplex algorithm is a well-known method to solve linear programming (LP) problems. In this paper, we propose an implementation via CUDA of the Simplex method on a multi GPU architecture. Computational tests have been carried out on randomly generated instances for non-sparse LP problems. The tests show a maximum speedup of 24.5 with […]
Nov, 14
GPU-accelerated power pattern synthesis of aperiodic linear arrays
We deal with the development of a computationally effective approach for the synthesis of equivalently tapered, aperiodic linear arrays, i.e. arrays matching the requirements on the power pattern by acting only on the element positions and excitation phases. The computational effectiveness of the algorithm is reached by the development of a parallel Non Uniform Fast […]
Nov, 14
AVSS2011 demo session: GPU enabled Smart Video Node
This paper presents an All-in-One video analytics system, a compact, multi-channel, real-time, video monitoring, event detection, alarm notification, event recording and browsing solution implemented on low cost hardware, taking advantage of NVIDIA’s GPU CUDA platform. An inventive distribution of video object detection and tracking processing chain between the GPUs and the CPU provides maximum efficiency […]