Posts
May, 8
Somoclu: An Efficient Distributed Library for Self-Organizing Maps
Somoclu is a C++ tool for training self-organizing maps on large data sets using a high-performance cluster. It builds on MPI for distributing the workload across the nodes of the cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful […]
May, 7
Performance impact of dynamic parallelism on different clustering algorithms
In this paper, we aim to quantify the performance gains of dynamic parallelism. The newest version of CUDA, CUDA 5, introduces dynamic parallelism, which allows GPU threads to create new threads without CPU intervention and adapt to their data. This effectively eliminates the superfluous back-and-forth communication between the GPU and CPU through nested […]
May, 7
Color and motion-based particle filter target tracking in a network of overlapping cameras with multi-threading and GPGPU
This paper describes an efficient implementation of multiple-target multiple-view tracking in video-surveillance sequences. It takes advantage of the capabilities of multiple core Central Processing Units (CPUs) and of graphical processing units under the Compute Unified Device Architecture (CUDA) framework. The principle of our algorithm is 1) in each video sequence, to perform tracking on all […]
May, 7
Simulating the universe with GPU-accelerated supercomputers: n-body methods, tests, and examples
We demonstrate the acceleration obtained from using GPU/CPU hybrid clusters and supercomputers for N-body simulations of gravity based in part on the author’s new code development. Validation tests are shown for cosmological simulations and for galaxy simulations, along with their respective speedups compared to traditional simulations. Potential new applications for science enabled by this advance […]
May, 7
Critical Links Detection using CUDA
The Critical Links Detection (CLD) Problem consists of finding the smallest set of edges in a graph to be protected so that, if a given number of unprotected edges are removed, the diameter does not exceed a given value. The diameter of a graph is defined as the length of the All-Pairs Shortest-Path (APSP). This […]
May, 7
Optimizing CUDA Code By Kernel Fusion – Application on BLAS
Modern GPUs are able to perform significantly more arithmetic operations than transfers of a single word to or from global memory. Hence, many GPU kernels are limited by memory bandwidth and cannot exploit the arithmetic power of GPUs. However, memory locality can often be improved by kernel fusion when a sequence of kernels is […]
May, 6
Accelerating Financial Applications on the GPU
QuantLib is a popular library used in many areas of computational finance. In this work, the parallel processing power of the GPU is used to accelerate QuantLib financial applications. Black-Scholes, Monte-Carlo, Bonds, and Repo code paths in QuantLib are accelerated using hand-written CUDA and OpenCL codes specifically targeted for the GPU. Additionally, HMPP […]
May, 6
Algorithms for Rapid Characterization and Optimization of Aperture and Reflector Antennas
Reflector antennas play a key role in the communication industry, and enhancing the speed of the analysis of reflector antenna systems can provide better responsiveness to the needs of industry as well as promote better understanding of software modeling through faster visualization. A reflector antenna system typically consists of a feed assembly, with a feedhorn […]
May, 6
Simulation of Biological Tissue using Mass-Spring-Damper Models
The goal of this project was to evaluate the viability of a mass-spring-damper based model for modeling of biological tissue. A method for automatically generating such a model from data taken from 3D medical imaging equipment including both the generation of point masses and an algorithm for generating the spring-damper links between these points is […]
May, 6
Fast Implementation of Scale Invariant Feature Transform Based on CUDA
Scale-invariant feature transform (SIFT) is an algorithm in computer vision to detect and describe local features in images. Due to its excellent performance, SIFT is widely used in many applications, but its implementation is complicated and time-consuming. To solve this problem, this paper presents a novel acceleration algorithm for SIFT implementation based on […]
May, 6
Fast computation of MadGraph amplitudes on graphics processing unit (GPU)
Continuing our previous studies on QED and QCD processes, we use the graphics processing unit (GPU) for fast calculations of helicity amplitudes for general Standard Model (SM) processes. Additional HEGET codes to handle all SM interactions are introduced, as well as the program MG2CUDA that converts arbitrary MadGraph generated HELAS amplitudes (FORTRAN) into HEGET codes in CUDA. […]
May, 4
Real-time Stochastic Optimization of Complex Energy Systems on High Performance Computers
We present a scalable approach that computes, in operationally compatible time, the energy dispatch under uncertainty for complex energy systems of realistic size. Complex energy systems, such as the US power grid, are affected by increased uncertainty of their target power sources, due, for example, to increasing penetration of wind power coupled with the physical impossibility […]