
Posts

Mar, 21

Efficient GPU implementation of the integral histogram

The integral histogram for images is an efficient preprocessing method for speeding up diverse computer vision algorithms including object detection, appearance-based tracking, recognition and segmentation. Our proposed Graphics Processing Unit (GPU) implementation uses parallel prefix sums on row and column histograms in a cross-weave scan with high GPU utilization and communication-aware data transfer between CPU […]
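The integral histogram stores, for every pixel (x, y) and bin b, the number of pixels with bin b in the rectangle from the origin to (x, y). It can be built with a row-wise prefix sum followed by a column-wise prefix sum, after which the histogram of any axis-aligned rectangle costs four lookups per bin. The sequential Python sketch below illustrates only the data structure. It is not the paper's GPU cross-weave scan, and the function names and the uniform binning rule are hypothetical.

```python
def integral_histogram(image, num_bins, max_value=256):
    """Return H where H[y][x][b] counts pixels with bin b in image[0..y][0..x].

    Assumes integer pixel values in [0, max_value) and uniform bins (hypothetical).
    """
    h, w = len(image), len(image[0])
    # One-hot bin assignment for each pixel.
    H = [[[0] * num_bins for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            H[y][x][image[y][x] * num_bins // max_value] = 1
    # Pass 1: inclusive prefix sum along each row.
    for y in range(h):
        for x in range(1, w):
            for b in range(num_bins):
                H[y][x][b] += H[y][x - 1][b]
    # Pass 2: inclusive prefix sum along each column.
    for y in range(1, h):
        for x in range(w):
            for b in range(num_bins):
                H[y][x][b] += H[y - 1][x][b]
    return H


def region_histogram(H, x0, y0, x1, y1):
    """Histogram of the inclusive rectangle [x0..x1] x [y0..y1] via four lookups per bin."""
    result = []
    for b in range(len(H[0][0])):
        total = H[y1][x1][b]
        if x0 > 0:
            total -= H[y1][x0 - 1][b]
        if y0 > 0:
            total -= H[y0 - 1][x1][b]
        if x0 > 0 and y0 > 0:
            total += H[y0 - 1][x0 - 1][b]
        result.append(total)
    return result


img = [[0, 128, 255], [64, 192, 32], [200, 10, 130]]
H = integral_histogram(img, num_bins=2)
print(region_histogram(H, 1, 1, 2, 2))  # [2, 2]
```

Each pass is a set of independent scans (one per row, then one per column, per bin), which is the parallelism a GPU prefix-sum implementation can exploit.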
Mar, 21

Performance study of filtered back-projection algorithms implemented on GPUs

In recent years the use of graphics processing units (GPUs) in diverse fields of science has increased dramatically. This increase is due not only to the GPUs' tremendous computational power, but also to their relatively low cost compared with clusters. In this work we explore the use of the GPU to reduce the […]
Mar, 21

GPGPU Test Suite Minimisation: Search Based Software Engineering Performance Improvement Using Graphics Cards

It has often been claimed that SBSE uses so-called "embarrassingly parallel" algorithms that will imbue SBSE applications with easy routes to dramatic performance improvements. However, despite recent advances in multicore computation, this claim remains largely theoretical; there are few reports of performance improvements using multicore SBSE. This paper shows how inexpensive General Purpose computing on […]
Mar, 21

Duplicate Detection on GPUs

With the ever-increasing volume of data and the ability to integrate different data sources, data quality problems abound. Duplicate detection, as an integral part of data cleansing, is essential in modern information systems. We present a complete duplicate detection workflow that utilizes the capabilities of modern graphics processing units (GPUs) to increase the efficiency […]
Mar, 21

Stream Join Processing on Heterogeneous Processors

The window-based stream join is an important operator in all data streaming systems. It often has high resource requirements, so many efficient sequential as well as parallel versions of it have been proposed in the literature. Parallel stream join operators have recently gained increasing interest because hardware is becoming more and more parallel. Most of […]
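For readers unfamiliar with the operator, a window-based join keeps a sliding window of recent tuples per stream; each arriving tuple evicts expired tuples, probes the other stream's window for matches, and is then inserted into its own window. The Python sketch below is a sequential symmetric nested-loop equi-join under assumed time-based windows. The stream names, tuple layout, and function name are hypothetical, and it does not represent the heterogeneous-processor design in the paper.

```python
from collections import deque


def window_join(arrivals, window):
    """Symmetric sliding-window equi-join over two streams named "R" and "S".

    arrivals: (stream, timestamp, key, value) tuples in timestamp order.
    window:   a tuple can join with tuples arriving fewer than `window`
              time units after it.
    Returns (key, r_value, s_value) for every matching pair.
    """
    windows = {"R": deque(), "S": deque()}
    results = []
    for stream, ts, key, value in arrivals:
        other = "S" if stream == "R" else "R"
        # Evict expired tuples from both windows (oldest first).
        for w in windows.values():
            while w and w[0][0] <= ts - window:
                w.popleft()
        # Probe the opposite window with a nested loop; the comparisons are
        # independent of one another, so they could be evaluated in parallel.
        for _, o_key, o_value in windows[other]:
            if o_key == key:
                r_val, s_val = (value, o_value) if stream == "R" else (o_value, value)
                results.append((key, r_val, s_val))
        # Insert the new tuple into its own stream's window.
        windows[stream].append((ts, key, value))
    return results


events = [("R", 1, "a", 10), ("S", 3, "a", 20), ("S", 4, "b", 30),
          ("R", 7, "a", 11), ("R", 9, "b", 12)]
print(window_join(events, window=5))  # [('a', 10, 20), ('a', 11, 20)]
```

The last "b" tuple finds no partner because the matching S tuple (timestamp 4) has already expired at timestamp 9.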
Mar, 20

Symbolic Crosschecking of Data-Parallel Floating Point Code

In this thesis we present a symbolic execution-based technique for cross-checking programs accelerated using SIMD or OpenCL against an unaccelerated version, as well as a technique for detecting data races in OpenCL programs. Our techniques are implemented in KLEE-CL, a symbolic execution engine based on KLEE that supports symbolic reasoning on the equivalence between expressions […]
Mar, 20

A CUDA-Based Cooperative Evolutionary Multi-Swarm Optimization Applied to Engineering Problems

This paper presents a variation of Evolutionary Particle Swarm Optimization applied to the master/slave swarm concept, with a data-sharing mechanism to accelerate convergence. The implementation, called Cooperative Evolutionary MultiSwarm Optimization on Graphics Processing Units (CMEPSOGPU), consists of using thousands of threads in various slave swarms on the CUDA parallel architecture, where […]
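To illustrate the master/slave idea, the Python sketch below runs several independent slave swarms and periodically lets a master step collect each slave's best position and broadcast the overall best back to all of them. It assumes plain global-best PSO with commonly used constriction-derived constants rather than the evolutionary (EPSO) variant in the paper, and the sharing schedule, the sphere test function, and all names are hypothetical.

```python
import random


def sphere(x):
    """Test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def multi_swarm_pso(f, dim, n_slaves=3, swarm_size=10, iters=100,
                    share_every=10, lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    w, c1, c2 = 0.7298, 1.49618, 1.49618  # commonly used constriction-derived constants
    slaves = []
    for _ in range(n_slaves):
        pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
        vals = [f(p) for p in pos]
        best = min(range(swarm_size), key=vals.__getitem__)
        slaves.append({"pos": pos, "vel": [[0.0] * dim for _ in range(swarm_size)],
                       "pbest": [p[:] for p in pos], "pbest_val": vals,
                       "gbest": pos[best][:], "gbest_val": vals[best]})
    for t in range(iters):
        # Slave step: ordinary PSO updates; these per-particle updates are
        # what a GPU version would distribute across threads.
        for s in slaves:
            for k in range(swarm_size):
                x, v = s["pos"][k], s["vel"][k]
                for d in range(dim):
                    v[d] = (w * v[d]
                            + c1 * rng.random() * (s["pbest"][k][d] - x[d])
                            + c2 * rng.random() * (s["gbest"][d] - x[d]))
                    x[d] += v[d]
                val = f(x)
                if val < s["pbest_val"][k]:
                    s["pbest"][k], s["pbest_val"][k] = x[:], val
                    if val < s["gbest_val"]:
                        s["gbest"], s["gbest_val"] = x[:], val
        if (t + 1) % share_every == 0:
            # Master step: collect every slave's best and broadcast the overall best.
            master = min(slaves, key=lambda s: s["gbest_val"])
            best_pos, best_val = master["gbest"][:], master["gbest_val"]
            for s in slaves:
                s["gbest"], s["gbest_val"] = best_pos[:], best_val
    master = min(slaves, key=lambda s: s["gbest_val"])
    return master["gbest"], master["gbest_val"]


position, value = multi_swarm_pso(sphere, dim=2)
print(position, value)
```

Sharing only every few iterations keeps the slaves mostly independent, which suits parallel hardware, while still letting a good solution found by one swarm pull the others toward it.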
Mar, 20

Multi-GPU Island-Based Genetic Algorithm

Genetic algorithms are effective at solving many optimization tasks. However, their long execution times prevent their use in many domains. In this paper, we propose a new approach for the parallel implementation of genetic algorithms on graphics processing units (GPUs) using the CUDA programming model. This paper introduces a novel implementation of the genetic […]
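In an island-based genetic algorithm, the population is split into subpopulations (islands) that evolve independently and periodically exchange individuals through migration, which maps naturally onto separate GPUs or thread blocks. The sequential Python sketch below shows that structure on the toy OneMax problem. The objective, tournament selection, one-point crossover, bit-flip mutation, ring migration topology, and every parameter value are assumptions for illustration, not the paper's CUDA implementation.

```python
import random


def onemax(bits):
    """Toy fitness: number of 1 bits (maximum = length)."""
    return sum(bits)


def island_ga(n_islands=4, pop_size=20, length=20, generations=50,
              migrate_every=5, mutation_rate=0.05, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
               for _ in range(n_islands)]

    def tournament(pop):
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if onemax(a) >= onemax(b) else b

    for gen in range(generations):
        # Each island evolves independently (the part that runs in parallel).
        for i, pop in enumerate(islands):
            children = [max(pop, key=onemax)[:]]  # elitism: keep the island's best
            while len(children) < pop_size:
                p1, p2 = tournament(pop), tournament(pop)
                cut = rng.randrange(1, length)  # one-point crossover
                child = p1[:cut] + p2[cut:]
                child = [1 - g if rng.random() < mutation_rate else g for g in child]
                children.append(child)
            islands[i] = children
        if (gen + 1) % migrate_every == 0:
            # Ring migration: island i-1's best replaces the worst individual of island i.
            bests = [max(pop, key=onemax)[:] for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(pop_size), key=lambda k: onemax(pop[k]))
                pop[worst] = bests[(i - 1) % n_islands]
    return max((ind for pop in islands for ind in pop), key=onemax)


best = island_ga()
print(onemax(best), best)
```

Migrating only every few generations limits communication between islands, which is the property that makes the model attractive when islands live on different devices.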
Mar, 20

Time-stepping methods for the simulation of the self-assembly of nano-crystals in Matlab on a GPU

Partial differential equations describing the patterning of thin crystalline films are typically of fourth or sixth order, are quasi- or semilinear, and are mostly defined on simple geometries such as rectangular domains. For the numerical simulation of this kind of problem, spectral methods are an efficient approach. We apply several implicit-explicit schemes to […]
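As a hedged illustration of what an implicit-explicit scheme does in a Fourier spectral setting, the block below shows first-order IMEX (semi-implicit) Euler for a generic equation u_t = 𝓛u + 𝓝(u). The stiff linear operator 𝓛 is treated implicitly through its Fourier symbol λ_k and the nonlinearity 𝓝 explicitly. The splitting and the example symbol are assumptions for illustration, not the specific schemes used in the work.

```latex
\frac{\hat u_k^{n+1} - \hat u_k^{n}}{\Delta t}
  = \lambda_k\,\hat u_k^{n+1} + \widehat{\mathcal{N}(u^{n})}_k
\quad\Longrightarrow\quad
\hat u_k^{n+1} = \frac{\hat u_k^{n} + \Delta t\,\widehat{\mathcal{N}(u^{n})}_k}{1 - \Delta t\,\lambda_k},
\qquad \text{e.g. } \lambda_k = -k^{4} \text{ for } \mathcal{L} = -\partial_x^{4}.
```

For a dissipative operator such as this one, λ_k ≤ 0, so the denominator is at least 1 and the stiff high-frequency modes are damped instead of forcing a tiny time step. Each step then needs only a forward and an inverse FFT plus a pointwise division per mode.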
Mar, 20

General Purpose Computing on Low-Power Embedded GPUs: Has It Come of Age?

In this paper we evaluate the promise held by low-power GPUs for non-graphics workloads that arise in embedded systems. To this end, we map and implement five benchmarks, which find use in very different application domains, on an embedded GPU. Our results show that, apart from accelerated performance, embedded GPUs are also promising because of their […]
Mar, 18

clMAGMA: High Performance Dense Linear Algebra with OpenCL

This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms in OpenCL. In particular, these are linear system solvers and eigenvalue problem solvers. Further, we give an overview of the clMAGMA library, an open source, high performance OpenCL library that incorporates the developments presented, and in general provides to heterogeneous […]
Mar, 18

Volume Raycasting Performance Using DirectCompute

Volume rendering is quite an old concept for representing images, dating back to the 1980s. It is very useful in the medical field for visualizing the results of computed tomography (CT) and magnetic resonance tomography (MRT) in 3D. Apart from these two major applications for volume rendering, there aren’t many other fields of usage […]


HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors
