
Posts

Oct, 5

Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms

Recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility. Such attributes have once again sparked an interest in creating learning algorithms that aspire to reverse-engineer many of the abilities of the brain. In this paper we describe a GPGPU-accelerated extension to an intelligent […]
Oct, 5

Democratic Population Decisions Result in Robust Policy-Gradient Learning: A Parametric Study with GPU Simulations

High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task and moreover architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU […]
Oct, 5

Performance Analysis and Optimisation of the OP2 Framework on Many-core Architectures

This paper presents a benchmarking, performance analysis and optimisation study of the OP2 "active" library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targeting the application to […]
Oct, 5

GPU accelerated 2-D staggered-grid finite difference seismic modelling

The staggered-grid finite difference (FD) method demands significant computational capability and is inefficient for seismic wave modelling in 2-D viscoelastic media on a single PC. To improve computation speed, a graphics processing unit (GPU) accelerated method was proposed, for modern GPUs have now become ubiquitous in desktop computers and offer an excellent cost-to-performance-ratio parallelism. The […]
Oct, 5

Applying software-managed caching and CPU/GPU task scheduling for accelerating dynamic workloads

In this talk we address two problems frequently encountered by GPU developers: optimizing memory access for kernels with complex input-dependent access patterns, and mapping the computations to a GPU or a CPU in composite applications with multiple dependent kernels. Both require dynamic adaptation and tuning of execution policies to allow high performance for a wide […]
Oct, 5

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs

In this study computations of the two-dimensional Direct Simulation Monte Carlo (DSMC) method using Graphics Processing Units (GPUs) are presented. An all-device (GPU) computational approach is adopted (where the entire computation is performed on the GPU device, leaving the CPU idle), which includes particle moving, indexing, collisions between particles and state sampling. The subsequent application to GPU […]
Oct, 5

A Framework for Automated Performance Tuning and Code Verification on GPU Computing Platforms

Emerging multi-core processor designs create a computing paradigm capable of advancing numerous scientific areas, including medicine, data mining, biology, physics, and earth sciences. However, the trends in multi-core hardware technology have advanced far ahead of the advances in software technology and programmer productivity. For the most part, current scientists only leverage multi-core and GPU (Graphical […]
Oct, 5

High-Order Discontinuous Galerkin Methods by GPU Metaprogramming

Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. In a recent publication, we have shown that DG methods also adapt readily to execution on modern, […]
Oct, 5

Flexible, high performance convolutional neural networks for image classification

We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way. Our deep hierarchical architectures achieve the best published results on benchmarks for object classification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with error rates of 2.53%, […]
Oct, 5

A parallel error diffusion implementation on a GPU

In this paper, we investigate the suitability of the GPU for a parallel implementation of the pinwheel error diffusion. We demonstrate a high-performance GPU implementation by efficiently parallelizing and unrolling the image processing algorithm. Our GPU implementation achieves a 10 – 30x speedup over a two-threaded CPU error diffusion implementation with comparable image quality. We […]
Oct, 4

GPU performance comparison for accelerated radar data processing

Radar is a data-intensive measurement technique often requiring significant processing to make full use of the received signal. However, computing capacity is limited at remote or mobile radar installations thereby limiting radar data products used for real-time decisions. We used graphics processing units (GPUs) to accelerate processing of high resolution phase-coded radar data from the […]
Oct, 4

A Massive Data Parallel Computational Framework on Petascale/Exascale Hybrid Computer Systems

Heterogeneous systems are becoming more common on High Performance Computing (HPC) systems. Even using tools like CUDA [1] and OpenCL [2] it is a non-trivial task to obtain optimal performance on the GPU. Approaches to simplifying this task include Merge [3] (a library based framework for heterogeneous multi-core systems), Zippy [4] (a framework for parallel […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
