high performance computing on graphics processing units: hgpu.org

Posts

Nov, 10

A CPU-GPU Hybrid Runtime for the Aeminium Language

Given that CPU clock speeds are stagnating, programmers are resorting to parallelism to improve the performance of their applications. Although such parallelism has usually been attained using multicore architectures, multiple CPUs, and/or clusters of machines, the GPU has since been used as an alternative. GPUs are an interesting resource because they can provide much […]

OpenCL

Nov, 10

Bit-Parallel Multiple Pattern Matching

Text matching with errors is a regular task in computational biology. We present an extension of the bit-parallel Wu-Manber algorithm to combine several searches for a pattern into a collection of fixed-length words. We further present an OpenCL parallelization of a redundant index on massively parallel multicore processors, within a framework of searching for similarities […]

OpenCL
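
For readers unfamiliar with the technique named in the abstract above, here is a minimal Python sketch of the classic single-pattern bit-parallel Wu-Manber search (Shift-And with up to k errors). It is not the paper's extension, which combines several searches into fixed-length words, and it is not an OpenCL implementation. The function name `wu_manber_search` and its parameters are illustrative assumptions.

```python
def wu_manber_search(text, pattern, k):
    """Return the end positions in `text` where `pattern` matches with at most
    k edit errors (insertions, deletions, substitutions).

    Illustrative sketch of the classic bit-parallel Wu-Manber recurrence;
    the name and signature are assumptions, not taken from the paper.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")

    full = (1 << m) - 1       # keeps every state vector m bits wide
    accept = 1 << (m - 1)     # bit that marks a match of the whole pattern

    # masks[c] has bit i set when pattern[i] == c
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # state[d] bit i set: pattern[:i+1] matches a suffix of the text read so
    # far with at most d errors. Initially only deletions are possible.
    state = [((1 << d) - 1) & full for d in range(k + 1)]

    hits = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)

        prev_old = state[0]                        # row d-1 before this char
        state[0] = ((prev_old << 1) | 1) & mask    # exact Shift-And step
        prev_new = state[0]                        # row d-1 after this char

        for d in range(1, k + 1):
            old = state[d]
            state[d] = (
                (((old << 1) | 1) & mask)   # next character matches
                | prev_old                  # extra character in the text
                | (prev_old << 1)           # substituted character
                | (prev_new << 1)           # character missing from the text
                | 1                         # a 1-char prefix fits with 1 error
            ) & full
            prev_old = old
            prev_new = state[d]

        if state[k] & accept:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    print(wu_manber_search("abcdef", "abd", 1))
    print(wu_manber_search("abcabd", "abd", 0))
```

Each text character updates all k + 1 state vectors with a constant number of bitwise operations, which is what makes the approach attractive for word-level and massively parallel hardware.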

Nov, 9

GrAVity: a massively parallel antivirus engine

In the ongoing arms race against malware, antivirus software is at the forefront, as one of the most important defense tools in our arsenal. Antivirus software is flexible enough to be deployed anywhere from regular users' desktops to corporate e-mail proxies and file servers. Unfortunately, the signatures necessary to detect incoming malware number in the tens […]

CUDA

Nov, 9

Parallel Implementation of Otsu’s Binarization Approach on GPU

Fast algorithms are important for efficient image processing systems that handle large sets of calculations. To speed up processing, an algorithm can be implemented in parallel on a Graphics Processing Unit (GPU). The GPU is general-purpose computation hardware; its programmability and low cost make it productive. Binarization is a widely used technique in image analysis and […]

CUDA

Nov, 9

Parallel Implementation of Souvola’s Binarization Approach on GPU

Binarization is a widely used technique in many image processing applications. Fast algorithms are needed for fast and efficient image processing systems. Many image processing and pattern recognition algorithms have recently been implemented on Graphics Processing Units (GPUs) for faster computation. GPUs are more prominent in exploiting parallelism and pipelining than […]

CUDA

Nov, 9

Low Complexity Corner Detector Using CUDA for Multimedia Applications

High speed feature detection is a requirement for many real-time multimedia and computer vision applications. In previous work, the Harris and KLT algorithms were redesigned to increase the performance by reducing the algorithmic complexity, resulting in the Low Complexity Corner Detector algorithm. To attain further speedup, this paper proposes the implementation of this low complexity […]

CUDA

Nov, 9

Performance Tuning for CUDA-Accelerated Neighborhood Denoising Filters

Neighborhood denoising filters are powerful techniques in image processing and can effectively enhance the image quality in CT reconstructions. In this study, taking the bilateral filter and the non-local means filter as two examples, we discuss their implementations and fine-tune them for the targeted GPU architecture. Experimental results show that the straightforward GPU-based neighborhood […]

CUDA

Nov, 9

Effects of GPU and CPU Loads on Performance of CUDA Applications

General purpose computing on GPUs provides a way for certain applications to benefit from a commonly available massively parallel architecture. As such deployment becomes more widespread, multiple GPU applications will have to execute on the same hardware in systems that have only one GPU. The aggregate loads of the GPU and CPU impact the performance […]

CUDA

Nov, 9

Large data visualization on distributed memory multi-GPU clusters

Data sets of immense size are regularly generated on large scale computing resources. Even among more traditional methods for acquisition of volume data, such as MRI and CT scanners, data which is too large to be effectively visualized on standard workstations is now commonplace. One solution to this problem is to employ a ‘visualization cluster,’ […]

OpenGL

Nov, 9

Automatic transformation and optimization of applications on GPUs and GPU clusters

Modern accelerators and multi-core architectures offer significant computing power at a very modest cost. With this trend, an important research issue at the software end is how to make the best use of these computing devices, and how to enable high performance without the users having to put too much effort into learning the architecture […]

CUDA

Nov, 9

GPU-based ray casting of stacked out-of-core height fields

We developed a ray casting-based rendering system for the visualization of geological subsurface models consisting of multiple highly detailed height fields. Based on a shared out-of-core data management system, we virtualize the access to the height fields, allowing us to treat the individual surfaces at different local levels of detail. The visualization of an entire […]

OpenCL, OpenGL

Nov, 9

Parallel Implementation of Niblack’s Binarization Approach on CUDA

Image processing and pattern recognition algorithms take longer to execute on a single-core processor. Graphics Processing Units (GPUs) are increasingly popular nowadays due to their speed, programmability, low cost, and large number of built-in execution cores. Most researchers have started working to use GPUs as a processing unit with a single core […]

CUDA