Posts
Jun, 21
FMM-based vortex method for simulation of isotropic turbulence on GPUs, compared with a spectral method
The Lagrangian vortex method offers an alternative numerical approach for direct numerical simulation of turbulence. Because it uses the fast multipole method (FMM), a hierarchical algorithm for N-body problems with highly scalable parallel implementations, as its numerical engine, it is a potentially good candidate for exascale systems. However, there have been few validation studies of Lagrangian […]
Jun, 21
High-precision molecular dynamics simulation of UO2-PuO2: Anion self-diffusion in UO2
Our series of articles is devoted to high-precision molecular dynamics simulation of mixed actinide-oxide (MOX) fuel in the approximation of rigid ions and pair interactions (RIPI) using high-performance graphics processors (GPUs). In this article we study self-diffusion mechanisms of oxygen anions in uranium dioxide (UO2) with ten recent and widely used sets of interatomic […]
Jun, 20
GPU Accelerated Greedy Algorithms for Compressed Sensing
For appropriate matrix ensembles, greedy algorithms have proven to be an efficient means of solving the combinatorial optimization problem associated with compressed sensing. This paper describes an implementation for graphics processing units (GPUs) of hard thresholding, iterative hard thresholding, normalized iterative hard thresholding, hard thresholding pursuit, and a two-stage thresholding algorithm based on compressive […]
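The algorithms named above share one building block, the hard thresholding operator, which keeps the k largest-magnitude entries of a vector and zeroes the rest. The following is a minimal CPU sketch in plain Python, not the paper's GPU implementation; the function names `hard_threshold` and `iht_step`, the list-of-rows matrix layout, and the fixed `step` parameter are assumptions made for illustration.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    if k <= 0:
        return [0.0] * len(x)
    # Indices ordered by decreasing magnitude; the sort is stable,
    # so ties are resolved in favour of the lower index.
    order = sorted(range(len(x)), key=lambda i: -abs(x[i]))
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def iht_step(A, y, x, k, step=1.0):
    """One iterative hard thresholding update: x <- H_k(x + step * A^T (y - A x)).

    A is a list of rows (hypothetical layout chosen for this sketch).
    """
    # Residual r = y - A x
    residual = [yi - sum(a * xj for a, xj in zip(row, x))
                for row, yi in zip(A, y)]
    # Gradient direction g = A^T r
    grad = [sum(A[r][c] * residual[r] for r in range(len(A)))
            for c in range(len(x))]
    return hard_threshold([xj + step * g for xj, g in zip(x, grad)], k)
```

On a GPU the selection of the k largest entries is typically the hard part to parallelize; the sort used here is only the simplest way to express it.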
Jun, 20
Towards a GPU-based Implementation of Interaction Nets
We present ingpu, a GPU-based evaluator for interaction nets that heavily utilizes their potential for parallel evaluation. We discuss advantages and challenges of the ongoing implementation of ingpu and compare its performance to existing interaction nets evaluators.
Jun, 20
GPU Computing: Image Convolution
Convolution of two functions is an important mathematical operation that has found heavy application in signal processing. In computer graphics and image processing we usually work with discrete functions (e.g. an image) and apply a discrete form of the convolution to remove high frequency noise, sharpen details, detect edges, or otherwise modulate the frequency domain of […]
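As a rough illustration of the discrete convolution described above, here is a minimal CPU sketch in plain Python. It is not the article's GPU code; the function name `convolve2d`, the zero-padding at the borders, and the nested-list image layout are assumptions made for this example.

```python
def convolve2d(image, kernel):
    """Discrete 2D convolution with zero padding; output has the image's shape."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2  # kernel centre
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    # Convolution flips the kernel relative to the image:
                    # kernel[j][i] weights the pixel offset by -(j - cy), -(i - cx).
                    sy, sx = y - (j - cy), x - (i - cx)
                    if 0 <= sy < h and 0 <= sx < w:  # outside pixels count as zero
                        acc += image[sy][sx] * kernel[j][i]
            out[y][x] = acc
    return out


# A 3x3 box blur, a simple low-pass filter that suppresses high-frequency noise.
box_blur = [[1.0 / 9.0] * 3 for _ in range(3)]
```

Each output pixel depends only on a small neighbourhood of the input, so every (y, x) iteration is independent, which is why the operation maps naturally onto one GPU thread per pixel.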
Jun, 20
Parallel Implementation of the Wu-Manber Algorithm Using the OpenCL Framework
One of the most significant issues in computational biology is multiple pattern matching for locating nucleotide and amino acid sequence patterns in biological databases. Sequential implementations of these processes have become inadequate due to an increasing demand for more computational power. Graphics cards offer highly parallel computational power, improving the performance of […]
Jun, 20
An Investigation into Concurrent Expectation Propagation
As statistical machine learning becomes more and more prevalent and models become more complicated and fit to larger amounts of data, approximate inference mechanisms become more and more crucial to their success. Expectation propagation (EP) is one such algorithm for inference in probabilistic graphical models. In this work, we introduce a robustified version of EP […]
Jun, 19
Two Algorithms for Sorting On Heterogeneous Clusters
In the past few years, performance improvements in CPUs and memory technologies have outpaced those of storage systems. When extrapolated to the exascale, this trend places strict limits on the amount of data that can be written to disk for full analysis, resulting in an increased reliance on characterizing in-memory data. Many of these characterizations […]
Jun, 19
Parallel Rendering on Hybrid Multi-GPU Clusters
Achieving efficient scalable parallel rendering for interactive visualization applications on medium-sized graphics clusters remains a challenging problem. Framerates of up to 60 Hz require a carefully designed and fine-tuned parallel rendering implementation that fits all required operations into the 16 ms time budget available for each rendered frame. Furthermore, modern commodity hardware embraces more and more a […]
Jun, 19
Optimizing dataflow applications on heterogeneous environments
The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for the development of parallel applications for hierarchical systems concentrate […]
Jun, 19
Efficient simulations of long wave propagation and runup using a LBM approach on GPGPU hardware
We present an efficient implementation of the Lattice Boltzmann method (LBM) for the numerical simulation of the propagation of long ocean waves (e.g., tsunamis), based on the Nonlinear Shallow Water (NSW) wave equation. The LBM solution of NSW equations is fully nonlinear and it is assumed that the surface elevation is single-valued (hence, waves do […]
Jun, 19
Implementing density functional theory (DFT) methods on many-core GPGPU accelerators
Density Functional Theory (DFT) is one of the most widely used quantum mechanical methods for calculations of the electronic structure of molecules and surfaces, and it achieves an excellent balance of accuracy and computational cost. However, for large molecular systems with a few hundred atoms, the computational costs become very high. Therefore, there is a fast […]