high performance computing on graphics processing units: hgpu.org

Posts

Sep, 28

Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs

The race for Exascale computing has naturally led the current technologies to converge to multi-CPU/multi-GPU computers, based on thousands of CPUs and GPUs interconnected by PCI-Express buses or interconnection networks. To exploit this high computing power, programmers have to solve the issue of scheduling parallel programs on hybrid architectures. And, since the performance of a […]

CUDA

Sep, 28

A fast Texture-by-numbers synthesis method based on texture optimization

The framework of Texture-by-numbers (TBN) synthesizes images of global-varying patterns with intuitive user control. Previous TBN synthesis methods have difficulties in achieving high-quality synthesis results and efficiency simultaneously. This paper proposes a fast TBN synthesis method based on texture optimization, which uses global optimization to solve the controllable non-homogeneous texture synthesis problem. Our algorithm produces […]

CUDA

Sep, 28

Parallel Execution of Constraint Handling Rules on a Graphical Processing Unit

Graphical Processing Units (GPUs) consist of hundreds of small cores, collectively operating to provide massive computation capabilities. The aim of this work is to utilize this technology to execute Constraint Handling Rules (CHR), which are inherently parallel. A translation scheme is defined to transform a subset of CHR rules to C/C++, then to use a […]

CUDA

Sep, 27

Increasing the performance of AllToAll variant of self-organizing migration algorithm using CUDA

Modern graphics processing units offer general-purpose parallel computing capabilities. Thus they have become a relatively low-cost alternative for applications requiring extensive parallel computations. Evolutionary algorithms are especially well suited to parallel SIMD architectures. This paper deals with a modification of the AllToAll variant of the self-organizing migration algorithm, which has high computational demand for one […]

CUDA

Sep, 27

Deterministic Parallelism

A program is deterministic if it always produces the same output for a given input. Although sequential programs are often deterministic by default, parallel programs are more susceptible to behaving nondeterministically because instructions from different threads can be interleaved unpredictably. Non-determinism complicates the task of developing and maintaining software because it makes reasoning about program […]

CUDA

Sep, 27

GPU-based tuning of quantum-inspired genetic algorithm for a combinatorial optimization problem

This paper concerns efficient parameter tuning (meta-optimization) of a state-of-the-art metaheuristic, the Quantum-Inspired Genetic Algorithm (QIGA), in a GPU-based massively parallel computing environment (NVIDIA CUDA technology). A novel approach to parallel implementation of the algorithm is presented. In a block of threads, each thread transforms a separate quantum individual or a different quantum gene; in each […]

CUDA

Sep, 27

Lattice QCD based on OpenCL

We present an OpenCL-based Lattice QCD application using a heatbath algorithm for the pure gauge case and Wilson fermions in the twisted mass formulation. The implementation is platform independent and can be used on AMD or NVIDIA GPUs, as well as on classical CPUs. On the AMD Radeon HD 5870 our double precision dslash implementation […]

OpenCL

Sep, 27

GPU Acceleration of Image Convolution using Spatially-varying Kernel

Image subtraction in astronomy is a tool for discovering transient objects such as asteroids, extra-solar planets and supernovae. To match point spread functions (PSFs) between images of the same field taken at different times, a convolution technique is used. A computationally intensive spatially-varying kernel is particularly suitable for large-scale images. The underlying algorithm is inherently […]

CUDA

Sep, 26

Improved Row-Grouped CSR Format for Storing of Sparse Matrices on GPU

We present a new format for storing sparse matrices on GPU. We compare it with several other formats, including CUSPARSE, which is today probably the best choice for processing sparse matrices on GPU in CUDA. Contrary to CUSPARSE, which works with the common CSR format, our new format requires conversion. However, multiplication of sparse-matrix and vector […]

CUDA
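For readers unfamiliar with the common CSR format mentioned in the entry above, here is a minimal CPU-side sketch of a CSR sparse matrix-vector product. It illustrates plain CSR only, not the paper's Row-Grouped CSR format and not the CUSPARSE API; the function name `csr_spmv`, the array names, and the example matrix are assumptions made for illustration.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Multiply a CSR-stored sparse matrix by a dense vector x.

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    (All names are hypothetical; this is plain CSR, not Row-Grouped CSR.)
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Walk only the nonzeros that belong to this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Illustrative 3x3 matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

result = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(result)
```

Each iteration of the outer loop is independent of the others, which is why a GPU kernel can, for example, assign one row to one thread. Rows with very different nonzero counts then lead to uneven work per thread, which is one common motivation for alternative storage formats.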

Sep, 26

GPU Shape Grammars

GPU Shape Grammars provide a solution for interactive procedural generation, tuning and visualization of massive environment elements for both video games and production rendering. Our technique generates detailed models without explicit geometry storage. To this end we reformulate the grammar expansion for generation of detailed models at the tessellation control and geometry shader stages. Using […]

Sep, 26

Enabling Development of OpenCL Applications on FPGA platforms

FPGAs can potentially deliver tremendous acceleration in high-performance server and embedded computing applications. Whether used to augment a processor or as a stand-alone device, these reconfigurable architectures are being deployed in a large number of implementations owing to the massive amounts of parallelism offered. At the same time, a significant challenge encountered in their wide-spread […]

OpenCL

Sep, 26

A Parallel Auxiliary Grid AMG Method for GPU

In this paper, we develop a new parallel auxiliary grid algebraic multigrid (AMG) method to leverage the power of graphics processing units (GPUs). In the construction of the hierarchical coarse grid, we use a simple and fixed coarsening procedure based on a region quadtree generated from an auxiliary grid. This allows us to explicitly control […]

CUDA