5891

Posts

Oct, 5

Applying software-managed caching and CPU/GPU task scheduling for accelerating dynamic workloads

In this talk we address two problems frequently encountered by GPU developers: optimizing memory access for kernels with complex input-dependent access patterns, and mapping the computations to a GPU or a CPU in composite applications with multiple dependent kernels. Both require dynamic adaptation and tuning of execution policies to allow high performance for a wide […]
Oct, 5

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs

In this study computations of the two-dimensional Direct Simulation Monte Carlo (DSMC) method using Graphics Processing Units (GPUs) are presented. An all-device (GPU) computational approach is adopted-where the entire computation is performed on the GPU device, leaving the CPU idle-which includes particle moving, indexing, collisions between particles and state sampling. The subsequent application to GPU […]
Oct, 5

A Framework for Automated Performance Tuning and Code Verification on GPU Computing Platforms

Emerging multi-core processor designs create a computing paradigm capable of advancing numerous scientific areas, including medicine, data mining, biology, physics, and earth sciences. However, the trends in multi-core hardware technology have advanced far ahead of the advances in software technology and programmer productivity. For the most part, current scientists only leverage multi-core and GPU (Graphical […]
Oct, 5

High-Order Discontinuous Galerkin Methods by GPU Metaprogramming

Discontinuous Galerkin (DG) methods for the numerical solution of par- tial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. In a recent publication, we have shown that DG methods also adapt readily to execution on modern, […]
Oct, 5

Flexible, high performance convolutional neural networks for image classification

We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way. Our deep hierarchical architectures achieve the best published results on benchmarks for object classification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with error rates of 2.53%, […]
Oct, 5

A parallel error diffusion implementation on a GPU

In this paper, we investigate the suitability of the GPU for a parallel implementation of the pinwheel error diffusion. We demonstrate a high-performance GPU implementation by efficiently parallelizing and unrolling the image processing algorithm. Our GPU implementation achieves a 10 – 30x speedup over a two-threaded CPU error diffusion implementation with comparable image quality. We […]
Oct, 4

GPU performance comparison for accelerated radar data processing

Radar is a data-intensive measurement technique often requiring significant processing to make full use of the received signal. However, computing capacity is limited at remote or mobile radar installations thereby limiting radar data products used for real-time decisions. We used graphics processing units (GPUs) to accelerate processing of high resolution phase-coded radar data from the […]
Oct, 4

A Massive Data Parallel Computational Framework on Petascale/Exascale Hybrid Computer Systems

Heterogeneous systems are becoming more common on High Performance Computing (HPC) systems. Even using tools like CUDA [1] and OpenCL [2] it is a non-trivial task to obtain optimal performance on the GPU. Approaches to simplifying this task include Merge [3] (a library based framework for heterogeneous multi-core systems), Zippy [4] (a framework for parallel […]
Oct, 4

Architecture-Aware Optimization on a 1600-core Graphics Processor

The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated […]
Oct, 4

Fine-grained Parallel ILU Preconditioners with Fill-ins for Multi-core CPUs and GPUs

Numerical simulation and its huge computational demands require a close coupling between efficient mathematical methods and their hardware-aware implementation on emerging and highly parallel computing platforms. The paradigm shift towards manycore parallelism not only offers a high potential of computing capabilities but also comes up with urgent challenges in designing scalable, portable, and flexible software […]
Oct, 4

GPU Algorithms for Diamond-based Multiresolution Terrain Processing

We present parallel algorithms for processing, extracting and rendering adaptively sampled regular terrain datasets represented as a multiresolution model defined by a super-square-based diamond hierarchy. This model represents a terrain as a nested triangle mesh generated through a series of longest edge bisections and encoded in an implicit hierarchical structure, which clusters triangles into diamonds […]
Oct, 4

Finite element assembly strategies on multi-and many-core architectures

We demonstrate that radically differing implementations of finite element methods are needed on multicore (CPU) and many-core (GPU) architectures, if their respective performance potential is to be realised. Our experimental investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and diverse algorithmic […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: