Posts

May, 2

Dynamic Kernel/Device Mapping Strategies for GPU-assisted HPC Systems

With their high computational throughput and outstanding performance-per-watt figures, graphics processing units (GPUs) are becoming increasingly important for high-performance computing (HPC) systems. Existing GPU execution environments restrict GPU usage to the local host node. This is suitable for standalone computer nodes, but becomes inefficient for HPC systems that consist of a large number of […]
May, 2

Diderot: A Parallel DSL for Image Analysis and Visualization

Research scientists and medical professionals use imaging technology, such as computed tomography (CT) and magnetic resonance imaging (MRI), to measure a wide variety of biological and physical objects. The increasing sophistication of imaging technology creates demand for equally sophisticated computational techniques to analyze and visualize the image data. Analysis and visualization codes are often crafted […]
May, 2

GPU Acceleration for the C++ Standard Template Library

Modern programmers must exploit parallelism for performance gains, possibly through the use of an attached or on-chip GPU. To take advantage of the GPU in C++ programs, the programmer must use either a new language (CUDA or OpenCL) or an external library (Thrust). Rather than requiring that programmers learn new tools, modify existing code, and […]
May, 2

Automatic NUMA Characterization using Cbench

Clusters of seemingly homogeneous compute nodes are increasingly heterogeneous within each node due to replication and distribution of node-level subsystems. This intra-node heterogeneity can adversely affect program execution performance by inflicting additional data-access costs when accessing non-local data. In this work-in-progress paper, we present extensions to the Cbench Scalable Testing Framework for analyzing main memory […]
May, 1

OpenCL and the 13 Dwarfs: A Work in Progress

In the past, evaluating the architectural innovation of parallel computing devices relied on a benchmark suite based on existing programs, e.g., EEMBC or SPEC. However, with the growing ubiquity of parallel computing devices, we argue that it is unclear how best to express parallel computation, and hence, a need exists to identify a higher level […]
May, 1

Graphics Processing Unit Audio Signals Processing in Pure Data and PdCUDA an Implementation with the CUDA Runtime API

The design of graphics processing unit (GPU) audio signals processing extensions to Pure Data (Pd) is discussed with attention to future growth in GPU computing and the complexity of programming a general solution. An implementation named PdCUDA is presented for use of GPU general programming capability for audio signals processing with Pd and the CUDA […]
May, 1

Optimized GPU simulation of continuous-spin glass models

We develop a highly optimized code for simulating the Edwards-Anderson Heisenberg model on graphics processing units (GPUs). Using a number of computational tricks such as tiling, data compression and appropriate memory layouts, the simulation code combining over-relaxation, heat bath and parallel tempering moves achieves a peak performance of 0.29 ns per spin update on realistic […]
May, 1

Random number generators for massively parallel simulations on GPU

High-performance streams of (pseudo) random numbers are crucial for the efficient implementation of countless stochastic algorithms, most importantly Monte Carlo simulations and molecular dynamics simulations with stochastic thermostats. A number of implementations of random number generators have been discussed for GPU platforms before, and some generators are even included in the CUDA supporting libraries. Nevertheless, […]
May, 1

Application of GPUs for the Calculation of Two Point Correlation Functions in Cosmology

In this work, we have explored the advantages and drawbacks of using GPUs instead of CPUs in the calculation of a standard 2-point correlation function algorithm, which is useful for the analysis of the Large Scale Structure of galaxies. Taking into account the huge volume of data foreseen in upcoming surveys, our main goal has been […]
Apr, 28

Solving Stochastic Differential Equations Using General Purpose Graphics Processing Unit

Stochastic Differential Equations are important in many models of various physical or artificial phenomena. To get meaningful results it is desirable to solve the initial value numerical integration problem for a sufficiently large ensemble of realizations. Each element of the ensemble has the same form, thus exposing inherent data-parallelism. We implemented a cross-platform library written […]
Apr, 28

Random Walks based Multi-Image Segmentation: Quasiconvexity Results and GPU-based Solutions

We recast the Cosegmentation problem using Random Walker (RW) segmentation as the core segmentation algorithm, rather than the traditional MRF approach adopted in the literature so far. Our formulation is similar to previous approaches in the sense that it also permits Cosegmentation constraints (which impose consistency between the extracted objects from >= 2 images) using […]
Apr, 28

High-Performance Code Generation for Stencil Computations on GPU Architectures

Stencil computations arise in many scientific computing domains, and often represent time-critical portions of applications. There is significant interest in offloading these computations to high-performance devices such as GPU accelerators, but these architectures offer challenges for developers and compilers alike. Stencil computations in particular require careful attention to off-chip memory access and the balancing of […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
