Posts
Jun, 19
Optimizing dataflow applications on heterogeneous environments
The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for the development of parallel applications for hierarchical systems concentrate […]
Jun, 19
Efficient simulations of long wave propagation and runup using a LBM approach on GPGPU hardware
We present an efficient implementation of the Lattice Boltzmann method (LBM) for the numerical simulation of the propagation of long ocean waves (e.g., tsunamis), based on the Nonlinear Shallow Water (NSW) wave equation. The LBM solution of NSW equations is fully nonlinear and it is assumed that the surface elevation is single-valued (hence, waves do […]
Jun, 19
Implementing density functional theory (DFT) methods on many-core GPGPU accelerators
Density Functional Theory (DFT) is one of the most widely used quantum mechanical methods for calculations of the electronic structure of molecules and surfaces, which achieves an excellent balance of accuracy and computational cost. However, for large molecular systems with a few hundred atoms, the computational costs become very high. Therefore, there is a fast […]
Jun, 18
Gdev: First-Class GPU Resource Management in the Operating System
Graphics processing units (GPUs) have become a very powerful platform embracing a concept of heterogeneous many-core computing. However, application domains of GPUs are currently limited to specific systems, largely due to a lack of "first-class" GPU resource management for general-purpose multi-tasking systems. We present Gdev, a new ecosystem of GPU resource management in the operating […]
Jun, 18
An Improved CUDA-Based Implementation of Differential Evolution on GPU
Modern GPUs enable widely affordable personal computers to carry out massively parallel computation tasks. NVIDIA’s CUDA technology provides a wieldy parallel computing platform. Many state-of-the-art algorithms arising from different fields have been redesigned based on CUDA to achieve computational speedup. Differential evolution (DE), as a very promising evolutionary algorithm, is highly suitable for parallelization owing […]
Jun, 18
OpenCL for programming shared memory multicore CPUs
Shared memory multicore processor technology is pervasive in mainstream computing. This new architecture challenges programmers to write code that scales over these many cores to exploit the full computational power of these machines. OpenMP and Intel Threading Building Blocks (TBB) are two of the popular frameworks used to program these architectures. Recently, OpenCL has been […]
Jun, 18
Solving the Vlasov equation for one-dimensional models with long range interactions on a GPU
We present a GPU parallel implementation of the numeric integration of the Vlasov equation in one spatial dimension based on a second order time-split algorithm with a local modified cubic-spline interpolation. We apply our approach to three different systems with long-range interactions: the Hamiltonian Mean Field, Ring and the self-gravitating sheet models. Speedups and accuracy […]
Jun, 18
OpenACC – First Experiences with Real-World Applications
Today’s trend toward using accelerators like GPGPUs in heterogeneous computer systems has given rise to several low-level APIs for accelerator programming. However, programming with these APIs is often tedious and therefore unproductive. To tackle this problem, recent approaches employ directive-based high-level programming for accelerators. In this work, we present our first experiences with OpenACC, an API consisting of […]
Jun, 17
GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units
Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. […]
Jun, 17
Branch and Data Herding: Reducing Control and Memory Divergence for Error-tolerant GPU Applications
Control and memory divergence between threads within the same execution bundle, or warp, have been shown to cause significant performance bottlenecks for GPU applications. In this paper, we exploit the observation that many GPU applications exhibit error tolerance to propose branch and data herding. Branch herding eliminates control divergence by forcing all threads in a […]
Jun, 16
ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU
In this paper, we analyze the special requirements of a dynamic memory allocator that is designed for massively parallel architectures such as Graphics Processing Units (GPUs). We show that traditional strategies, which work well on CPUs, are not well suited for use on GPUs and present the thorough design of ScatterAlloc, which can efficiently […]
Jun, 16
E-MOGA: A General Purpose Platform for Multi Objective Genetic Algorithm running on CUDA
This paper introduces an Enhanced Multi Objective Genetic Algorithm (E-MOGA) running on Compute Unified Device Architecture (CUDA) hardware, as a general purpose tool that can solve conflict optimization problems. The tool demonstrates significant speed gains using affordable, scalable and commercially available hardware. The objectives of this research are: to enhance the general purpose Multi Objective […]