
Posts

Dec, 12

GPU Programming in a High Level Language: Compiling X10 to CUDA

GPU architectures have emerged as a viable way of considerably improving performance for appropriate applications. Program fragments (kernels) appropriate for GPU execution can be implemented in CUDA or OpenCL and glued into an application via an API. While there is plenty of evidence of performance improvements using this approach, there are many issues with productivity. […]
Dec, 12

A Common GPU n-Dimensional Array for Python and C

Currently there are multiple incompatible array/matrix/n-dimensional base object implementations for GPUs. This hinders the sharing of GPU code and causes duplicate development work. This paper proposes and presents a first version of a common GPU n-dimensional array (tensor) named GpuNdArray that works with both CUDA and OpenCL. It will be usable from Python, C and possibly […]
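The core of such a common array object is back-end-independent metadata. As a minimal sketch (field names and defaults are assumptions, not the paper's actual GpuNdArray layout), a descriptor carries shape, element size, and byte strides, from which contiguity and buffer size follow:

```python
# Hypothetical sketch of the metadata a shared GPU n-d array object must
# carry independently of the CUDA or OpenCL back end: shape, strides, and
# element size. (Illustrative only; not the actual GpuNdArray structure.)
from dataclasses import dataclass

def c_contiguous_strides(shape, itemsize):
    """Row-major (C-order) strides in bytes for a given shape."""
    strides = []
    acc = itemsize
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

@dataclass
class NdArrayDescriptor:
    shape: tuple
    itemsize: int
    strides: tuple = None

    def __post_init__(self):
        if self.strides is None:  # default to C-contiguous layout
            self.strides = c_contiguous_strides(self.shape, self.itemsize)

    @property
    def nbytes(self):
        n = self.itemsize
        for d in self.shape:
            n *= d
        return n

desc = NdArrayDescriptor(shape=(4, 3), itemsize=4)  # e.g. float32
# desc.strides == (12, 4), desc.nbytes == 48
```

Because both CUDA and OpenCL kernels can index raw buffers through such shape/stride metadata, the same descriptor can front either back end.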
Dec, 10

Multi-Science Applications with Single Codebase – GAMER – for Massively Parallel Architectures

The growing need for power-efficient extreme-scale high-performance computing (HPC) coupled with plateauing clock speeds is driving the emergence of massively parallel compute architectures. Tens to many hundreds of cores are increasingly made available as compute units, either as an integral part of the main processor or as coprocessors designed for handling massively parallel workloads. In […]
Dec, 6

Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results

GPUs offer several times the floating-point performance and memory bandwidth of current standard two-socket CPU servers, e.g. an NVIDIA C2070 vs. an Intel Xeon Westmere X5650. The lattice Boltzmann method has been established as a flow solver in recent years and was one of the first flow solvers to be successfully ported, and one that performs […]
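The lattice Boltzmann method is typically memory-bandwidth-bound, so a simple bandwidth model predicts attainable performance. The sketch below uses assumed numbers (not figures from the paper): a D3Q19 update streams 19 distribution values in and 19 out per cell, i.e. 2·19·8 = 304 bytes in double precision.

```python
# Back-of-envelope bandwidth-bound estimate with assumed numbers:
# a D3Q19 lattice Boltzmann update reads and writes 19 PDFs per cell,
# so a double-precision update moves 2 * 19 * 8 = 304 bytes.
def lbm_mlups(bandwidth_gbs, q=19, bytes_per_value=8):
    """Million lattice updates per second achievable at a given GB/s."""
    bytes_per_update = 2 * q * bytes_per_value  # read + write every PDF once
    return bandwidth_gbs * 1e9 / bytes_per_update / 1e6

# Illustrative peak-bandwidth figures (assumed): ~144 GB/s for a C2070,
# ~32 GB/s for a Westmere-class socket.
gpu = lbm_mlups(144.0)
cpu = lbm_mlups(32.0)
```

On this model the GPU's bandwidth advantage translates directly into a several-fold advantage in lattice updates per second, which is why performance engineering for LBM centers on memory-access patterns.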
Dec, 5

A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms

A modern personal computer can now be considered a one-node heterogeneous cluster that simultaneously processes several applications’ tasks. It can be composed of asymmetric Processing Units (PUs), such as the multi-core Central Processing Unit (CPU) and the many-core Graphics Processing Units (GPUs), which have become one of the main co-processors that contributed towards high performance […]
Nov, 22

Dynamic Heterogeneous Scheduling Decisions Using Historical Runtime Data

Heterogeneous systems often employ processing units with a wide spectrum of performance capabilities. Allowing individual applications to make greedy local scheduling decisions leads to imbalance, with underutilization of some devices and excessive contention for others. If we instead allow the system to make global scheduling decisions and assign some applications to a slower device, we […]
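The global-scheduling idea can be sketched in a few lines (class and method names here are assumptions for illustration, not the paper's interface): predict each device's runtime for an application from its recorded history, then assign the job to the device with the earliest predicted finish time, counting work already queued there.

```python
# Minimal sketch of history-based global scheduling (names assumed):
# a slower but idle device can win once the fast device is contended.
from collections import defaultdict

class HistoryScheduler:
    def __init__(self, devices):
        self.history = defaultdict(list)       # (app, device) -> past runtimes
        self.load = {d: 0.0 for d in devices}  # seconds of work already queued

    def record(self, app, device, runtime):
        self.history[(app, device)].append(runtime)

    def predict(self, app, device):
        runs = self.history[(app, device)]
        return sum(runs) / len(runs) if runs else 1.0  # default prior

    def assign(self, app):
        # Earliest predicted *finish* time, not fastest device.
        best = min(self.load, key=lambda d: self.load[d] + self.predict(app, d))
        self.load[best] += self.predict(app, best)
        return best

s = HistoryScheduler(["gpu0", "cpu0"])
s.record("fft", "gpu0", 1.0)   # historically 2x faster on the GPU
s.record("fft", "cpu0", 2.0)
picks = [s.assign("fft") for _ in range(3)]
# the third job spills onto the idle CPU despite the GPU being faster
```

This is exactly the trade-off the abstract describes: a greedy per-application policy would queue everything on the fast GPU, while the global view balances finish times across devices.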
Nov, 19

StreamMR: An Optimized MapReduce Framework for AMD GPUs

MapReduce is a programming model from Google that facilitates parallel processing on a cluster of thousands of commodity computers. The success of MapReduce in cluster environments has motivated several studies of implementing MapReduce on a graphics processing unit (GPU), though generally focusing on NVIDIA GPUs. Our investigation reveals that the design and mapping of […]
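For reference, the programming model itself is tiny: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase folds each group. A pure-Python word-count toy (not StreamMR's GPU implementation) shows the three phases:

```python
# Toy illustration of the MapReduce model: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(docs):
    for doc in docs:
        for word in doc.split():
            yield word, 1          # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)     # group values by key
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    return {k: sum(vs) for k, vs in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
# counts == {"a": 2, "b": 2, "c": 1}
```

The GPU-specific difficulty the paper addresses is precisely the shuffle/output stage: thousands of threads emitting variable-length pairs concurrently, which maps differently onto AMD and NVIDIA memory systems.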
Nov, 18

Auto-tunable GPU BLAS

OpenCL is fast becoming the preferred framework for developing programs for heterogeneous platforms consisting of at least one CPU and one or more accelerators. Since the GPU is readily available in almost all computers, it is the most common accelerator in use. Good libraries are important to reduce development time and to make particular development environments, […]
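Empirical auto-tuning, the technique behind such libraries, reduces to benchmarking candidate parameter values on the target device and keeping the fastest. A minimal sketch (the tuned "kernel" here is a placeholder, not a BLAS routine):

```python
# Sketch of empirical auto-tuning: time each candidate parameter value
# and keep the fastest. The "kernel" is a stand-in, not a BLAS routine.
import time

def blocked_sum(data, block):
    """Stand-in kernel: sum a list in chunks of `block` elements."""
    total = 0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total

def autotune(kernel, data, candidates, repeats=3):
    best, best_t = None, float("inf")
    for block in candidates:
        t0 = time.perf_counter()
        for _ in range(repeats):
            kernel(data, block)
        t = time.perf_counter() - t0
        if t < best_t:
            best, best_t = block, t
    return best

data = list(range(100_000))
best_block = autotune(blocked_sum, data, [16, 64, 256, 1024])
```

A real GPU BLAS tuner searches a much larger space (work-group sizes, tile shapes, unrolling) per device, but the search loop has this same shape.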
Nov, 17

Characterization and Transformation of Unstructured Control Flow in GPU Applications

Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA and OpenCL. Although this technology is widely used, commodity GPUs use different schemes to implement it, and the performance limitations of these different schemes under real workloads […]
Nov, 17

Massive Image Editing on the Cloud

Processing massive imagery in a distributed environment currently requires the effort of a skilled team to efficiently handle communication, synchronization, faults, and data/process distribution. Moreover, these implementations are highly optimized for a specific system or cluster, therefore portability or improved performance due to system improvements is rarely considered. Much like early GPU computing, cluster computing […]
Nov, 17

Adaboost GPU-based Classifier for Direct Volume Rendering

In volume visualization, voxel visibility and materials are determined through interactive editing of a transfer function. In this paper, we present a two-level GPU-based labeling method that computes, at rendering time, a set of labeled structures using the Adaboost machine-learning classifier. In a pre-processing step, Adaboost trains a binary classifier from […]
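AdaBoost itself is compact enough to sketch: repeatedly fit a weak learner (here a decision stump) on weighted samples, weight it by its error, and boost the weights of misclassified samples. This pure-Python version illustrates the algorithm in general, not the paper's specific voxel features or GPU mapping:

```python
# Compact AdaBoost with decision stumps; labels y are in {-1, +1}.
import math

def stump_predict(x, feature, threshold, polarity):
    return polarity if x[feature] >= threshold else -polarity

def train_adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                       # uniform sample weights
    model = []
    for _ in range(rounds):
        best = None                         # (err, feature, threshold, polarity)
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for p in (1, -1):
                    err = sum(w[i] for i in range(n)
                              if stump_predict(X[i], f, t, p) != y[i])
                    if best is None or err < best[0]:
                        best = (err, f, t, p)
        err, f, t, p = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # stump weight
        model.append((alpha, f, t, p))
        # boost weights of misclassified samples, then renormalize
        w = [wi * math.exp(-alpha * y[i] * stump_predict(X[i], f, t, p))
             for i, wi in enumerate(w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def predict(model, x):
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in model)
    return 1 if score >= 0 else -1

X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
```

The weighted sum of stump votes is what the paper evaluates per voxel on the GPU at rendering time.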
Nov, 16

Simulations of Large Particle Systems in Real Time

Simulation of interacting particle systems has been a well-established method for many years. Such systems can span different scales, from microscopic (where particles represent atoms, as in Molecular Dynamics simulations) to macroscopic. In the latter case, growing interest has been directed toward the Smoothed Particle Hydrodynamics approach. Traditionally, over many years, simulation of […]
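In SPH, field quantities are interpolated from neighboring particles through a smoothing kernel of finite support. A common choice is the poly6 kernel of Müller et al. (an assumption here, not necessarily the one this work uses):

```python
# SPH smoothing-kernel sketch: the poly6 kernel and a density estimate.
# rho_i = sum_j m_j * W(|p_i - p_j|, h), with W of finite support radius h.
import math

def poly6(r, h):
    """W(r, h) = 315/(64*pi*h^9) * (h^2 - r^2)^3 for 0 <= r <= h, else 0."""
    if r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h**9) * (h * h - r * r) ** 3

def density_at(p, particles, mass, h):
    return sum(mass * poly6(math.dist(p, q), h) for q in particles)

h = 1.0
rho = density_at((0.0, 0.0), [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0)], 1.0, h)
# the particle at distance 2.0 lies outside the support and contributes 0
```

The finite support radius is what makes SPH parallelize well: each particle only interacts with neighbors within h, which maps naturally onto GPU spatial-binning schemes.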

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
