Posts
Oct, 6
Accelerating a climate physics model with OpenCL
Open Computing Language (OpenCL) is fast becoming the standard for heterogeneous parallel computing. It is designed to run on CPUs, GPUs, and other accelerator architectures. By implementing a real world application, a solar radiation model component widely used in climate and weather models, we show that OpenCL multi-threaded programming and execution model can dramatically increase […]
Oct, 6
Static GPU threads and an improved scan algorithm
Current GPU programming systems automatically distribute the work on all GPU processors based on a set of fixed assumptions, e.g. that all tasks are independent from each other. We show that automatic distribution limits algorithmic design, and demonstrate that manual work distribution hardly adds any overhead. Our Scan+algorithm is an improved scan relying on manual […]
Oct, 6
GPU-based single-cluster algorithm for the simulation of the Ising model
We present the GPU calculation with the common unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for a quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the […]
Oct, 6
Connected-component identification and cluster update on graphics processing units
Cluster identification tasks occur in a multitude of contexts in physics and engineering such as, for instance, cluster algorithms for simulating spin models, percolation simulations, segmentation problems in image processing, or network analysis. While it has been shown that graphics processing units (GPUs) can result in speedups of two to three orders of magnitude as […]
Oct, 5
Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms
Recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energye-fficient possibility. Such attributes have once again sparked an interest in creating learning algorithms that aspire to reverseengineer many of the abilities of the brain. In this paper we describe a GPGPU-accelerated extension to an intelligent […]
Oct, 5
Democratic Population Decisions Result in Robust Policy-Gradient Learning: A Parametric Study with GPU Simulations
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task and moreover architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU […]
Oct, 5
Performance Analysis and Optimisation of the OP2 Framework on Many-core Architectures
This paper presents a benchmarking, performance analysis and optimisation study of the OP2 "active" library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targeting the application to […]
Oct, 5
GPU accelerated 2-D staggered-grid finite difference seismic modelling
The staggered-grid finite difference (FD) method demands significantly computational capability and is inefficient for seismic wave modelling in 2-D viscoelastic media on a single PC. To improve computation speedup, a graphic processing units (GPUs) accelerated method was proposed, for modern GPUs have now become ubiquitous in desktop computers and offer an excellent cost-to-performance-ratio parallelism. The […]
Oct, 5
Applying software-managed caching and CPU/GPU task scheduling for accelerating dynamic workloads
In this talk we address two problems frequently encountered by GPU developers: optimizing memory access for kernels with complex input-dependent access patterns, and mapping the computations to a GPU or a CPU in composite applications with multiple dependent kernels. Both require dynamic adaptation and tuning of execution policies to allow high performance for a wide […]
Oct, 5
Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs
In this study computations of the two-dimensional Direct Simulation Monte Carlo (DSMC) method using Graphics Processing Units (GPUs) are presented. An all-device (GPU) computational approach is adopted-where the entire computation is performed on the GPU device, leaving the CPU idle-which includes particle moving, indexing, collisions between particles and state sampling. The subsequent application to GPU […]
Oct, 5
A Framework for Automated Performance Tuning and Code Verification on GPU Computing Platforms
Emerging multi-core processor designs create a computing paradigm capable of advancing numerous scientific areas, including medicine, data mining, biology, physics, and earth sciences. However, the trends in multi-core hardware technology have advanced far ahead of the advances in software technology and programmer productivity. For the most part, current scientists only leverage multi-core and GPU (Graphical […]
Oct, 5
High-Order Discontinuous Galerkin Methods by GPU Metaprogramming
Discontinuous Galerkin (DG) methods for the numerical solution of par- tial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. In a recent publication, we have shown that DG methods also adapt readily to execution on modern, […]