Posts
Feb, 18
A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures
Emergent heterogeneous systems must be optimized for both power and performance at exascale. Massive parallelism combined with complex memory hierarchies forms a barrier to efficient application and architecture design. These challenges are exacerbated with GPUs, as parallelism increases by orders of magnitude and power consumption can easily double. Models have been proposed to isolate power and […]
Feb, 18
Offload Compiler Runtime for the Intel Xeon Phi Coprocessor
The Intel Xeon Phi coprocessor platform has a software stack that enables new programming models. One such model is offload of computation from a host processor to a coprocessor that is a fully-functional Intel Architecture CPU, namely, the Intel Xeon Phi coprocessor. The purpose of that offload is to improve response time and/or throughput. The […]
Feb, 18
Formalizing Address Spaces with application to Cuda, OpenCL, and beyond
Cuda and OpenCL are aimed at programmers developing parallel applications targeting GPUs and embedded micro-processors. These systems often have explicitly managed memories exposed directly through a notion of disjoint address spaces. OpenCL address spaces are based on a similar concept found in Embedded C. A limitation of OpenCL is that a specific pointer must be […]
Feb, 15
Hybrid parallel programming – evaluation of OpenACC
OpenACC is a new specification for a hybrid (CPU + GPU) parallel programming API, in which the programmer uses compiler directives to distribute the computation between the GPU and the CPU. With a similar paradigm to OpenMP, OpenACC presents clear advantages in terms of ease of programming. Regarding performance, however, a comparison between OpenACC and […]
Feb, 15
pROST: A Smoothed Lp-norm Robust Online Subspace Tracking Method for Realtime Background Subtraction in Video
An increasing number of methods for background subtraction use Robust PCA to identify sparse foreground objects. While many algorithms use the L1-norm as a convex relaxation of the ideal sparsifying function, we approach the problem with a smoothed Lp-norm and present pROST, a method for robust online subspace tracking. The algorithm is based on alternating […]
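To illustrate why a smoothed Lp-norm with p < 1 can favor sparsity more strongly than the L1-norm, here is a minimal sketch. The penalty form `(x^2 + mu)^(p/2)` and the parameter names `p` and `mu` are assumptions chosen for illustration; the excerpt above does not state the exact function the authors use.

```python
def l1_norm(xs):
    """Plain L1-norm: sum of absolute values."""
    return sum(abs(x) for x in xs)


def smoothed_lp(xs, p=0.5, mu=1e-3):
    """Assumed smoothed Lp penalty: sum of (x^2 + mu)^(p/2).

    mu > 0 keeps the function differentiable at zero; 0 < p < 1
    makes it non-convex and more sparsity-promoting than L1.
    """
    return sum((x * x + mu) ** (p / 2) for x in xs)


# Two hypothetical residual vectors with the same L1-norm.
sparse = [4.0, 0.0, 0.0, 0.0]   # one large entry (e.g. a foreground pixel)
dense = [1.0, 1.0, 1.0, 1.0]    # error spread over all entries

print(l1_norm(sparse), l1_norm(dense))
print(smoothed_lp(sparse), smoothed_lp(dense))
```

The L1-norm cannot tell the two vectors apart, while the smoothed Lp penalty is lower for the sparse one, which is the behavior one wants when foreground objects are modeled as sparse outliers.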
Feb, 15
MATLAB and Python for GPU Computing
Recent trends in hardware development have led to graphics processing units (GPUs) evolving into highly-parallel, multi-core computing platforms suitable for computational science applications. Recently, GPUs such as the NVIDIA Tesla 20-series (with up to 448 cores) have become available to the High Performance Computing Modernization Program (HPCMP) user community. Traditionally, NVIDIA GPUs are programmed using […]
Feb, 15
Urban Regional Seismic Damage Prediction Based On GPU-CPU Hybrid Computing
In recent years, refined building models have been widely used for urban regional seismic damage prediction, but their application is limited by the computing workload and cost when they are implemented on traditional CPU platforms. However, GPU computing technology, which has developed rapidly in recent years, provides a feasible way to solve this […]
Feb, 15
Massively Parallel Computing in Economics
This paper discusses issues related to parallel computing in Economics. It highlights new methodologies and resources that are available for solving and estimating economic models and emphasizes situations when they are useful and others where they are impractical. Two examples illustrate the different ways parallel methods can be employed to speed computation as well as […]
Feb, 14
Developing Performance-Portable Molecular Dynamics Kernels in OpenCL
This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia’s miniMD benchmark that achieves good levels of performance across a wide range of hardware: CPUs, discrete GPUs and integrated GPUs. We demonstrate that the performance bottlenecks of miniMD’s short-range force calculation […]
Feb, 14
Exploring SIMD for Molecular Dynamics, Using Intel Xeon Processors and Intel Xeon Phi Coprocessors
We analyse gather-scatter performance bottlenecks in molecular dynamics codes and the challenges that they pose for obtaining benefits from SIMD execution. This analysis informs a number of novel code-level and algorithmic improvements to Sandia’s miniMD benchmark, which we demonstrate using three SIMD widths (128-, 256- and 512-bit). The applicability of these optimisations to wider SIMD […]
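The gather-scatter pattern mentioned above can be sketched with a toy one-dimensional force loop over a neighbor list. This is an illustration only, not miniMD's code: the function name `pair_forces_1d`, the linear "force" `x[i] - x[j]`, and the half neighbor list layout (each pair stored once) are all assumptions.

```python
def pair_forces_1d(x, neighbors):
    """Accumulate toy pair forces using a half neighbor list.

    x         : list of particle positions
    neighbors : neighbors[i] lists the indices j paired with particle i
    """
    f = [0.0] * len(x)
    for i, neigh in enumerate(neighbors):
        # Gather: positions are read through indices, so the loads
        # are not contiguous in memory.
        xj = [x[j] for j in neigh]
        for j, xj_val in zip(neigh, xj):
            fij = x[i] - xj_val   # placeholder for a real force kernel
            f[i] += fij
            # Scatter: the update is written back through an index,
            # which is what makes vectorizing this loop awkward.
            f[j] -= fij
    return f


positions = [0.0, 1.0, 3.0]
half_list = [[1, 2], [2], []]
forces = pair_forces_1d(positions, half_list)
print(forces)
```

Because both the reads of `x[j]` and the writes to `f[j]` go through indices, a SIMD version needs gather and scatter instructions (or a data layout change) rather than plain contiguous vector loads and stores.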
Feb, 14
Enhancing Performance of Meshfree Methods by Hybrid Computing
A hybrid computing technique is used in this study to significantly enhance the performance of meshfree methods. These methods are typically slower than finite element methods (FEM), mostly because their stiffness matrices are much denser than those formed by FEM. As a result, both forming stiffness matrices and solving equations are much slower. In this paper, we […]
Feb, 14
The Dual-Path Execution Model for Efficient GPU Control Flow
Current graphics processing units (GPUs) utilize the single instruction multiple thread (SIMT) execution model. With SIMT, a group of logical threads executes such that all threads in the group execute a single common instruction on a particular cycle. To enable control flow to diverge within the group of threads, GPUs partially serialize execution and follow […]
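The baseline behavior described above, where a diverging branch is partially serialized, can be modeled in a few lines. This is a simplified simulation under assumed names (`run_warp_branch`, `passes`); it shows only the conventional one-path-at-a-time handling, not the dual-path model the paper proposes.

```python
def run_warp_branch(values, predicate, then_op, else_op):
    """Simulate one if/else executed by a group of SIMT threads.

    Every lane shares one instruction stream, so the 'then' path and
    the 'else' path run one after the other, each under an active mask.
    Returns the per-lane results and the number of serialized passes.
    """
    mask_then = [predicate(v) for v in values]
    mask_else = [not m for m in mask_then]
    out = list(values)
    passes = 0
    for mask, op in ((mask_then, then_op), (mask_else, else_op)):
        if not any(mask):
            continue              # no lane takes this path: skip it
        passes += 1
        for lane, active in enumerate(mask):
            if active:            # inactive lanes idle during this pass
                out[lane] = op(out[lane])
    return out, passes


def is_even(v):
    return v % 2 == 0


# Divergent group: lanes disagree on the branch, so both paths execute.
divergent, n_div = run_warp_branch([1, 2, 3, 4], is_even,
                                   lambda v: v * 10, lambda v: v + 1)
# Uniform group: all lanes agree, so only one path executes.
uniform, n_uni = run_warp_branch([2, 4, 6, 8], is_even,
                                 lambda v: v * 10, lambda v: v + 1)
print(divergent, n_div)
print(uniform, n_uni)
```

The divergent group needs two passes with part of the lanes idle in each, which is the utilization loss that motivates alternative control-flow schemes.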