Posts

Aug, 18

Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL

In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures (SMP, Cell/B.E., and GPUs), showing the broad usefulness of the approach. The evaluation is done with four different benchmarks, Matrix Multiply, BlackScholes, […]
Aug, 18

Analyzing program flow within a many-kernel OpenCL application

Many developers have begun to realize that heterogeneous multi-core and many-core computer systems can provide significant performance opportunities to a range of applications. Typical applications possess multiple components that can be parallelized; developers need to be equipped with proper performance tools to analyze program flow and identify application bottlenecks. In this paper, we analyze and […]
Aug, 17

Near real-time Fast Bilateral Stereo on the GPU

State-of-the-art local stereo correspondence algorithms that adapt their supports to image content can infer very accurate disparity maps, often comparable to those of algorithms based on global disparity optimization methods. However, despite their effectiveness, accurate local approaches based on this methodology are also computationally expensive and several simplifications aimed at reducing their computational […]
Aug, 17

Fast boosting trees for classification, pose detection, and boundary detection on a GPU

Discriminative classifiers are often the computational bottleneck in medical imaging applications such as foreground/background classification, 3D pose detection, and boundary delineation. To overcome this bottleneck, we propose a fast technique based on boosting tree classifiers adapted for GPU computation. Unlike standard tree-based algorithms, our method does not use any recursive calls, which makes it GPU-friendly. […]
Aug, 17

GPU-based reconstruction and display for 4D ultrasound data

Due to the computational effort required by 4D ultrasound imaging, such systems depend on low-complexity techniques like nearest neighbor interpolation, which affects volume quality. Moreover, more accurate techniques like normalized convolution, backward trilinear interpolation, and forward spherical and ellipsoidal Gaussian kernels are avoided in real-time imaging because of the tight reconstruction time. The goal […]
Aug, 17

nGFSIM: A GPU-based fault simulator for 1-to-n detection and its applications

We present nGFSIM, a GPU-based fault simulator for stuck-at faults which can report the fault coverage of one- to n-detection for any specified integer n using only a single run of fault simulation. nGFSIM, which exploits the massive parallelism in the GPU architecture and optimizes the memory access and usage, enables accelerated fault simulation without the […]
Aug, 17

GPU accelerated FDTD solver and its application in MRI

The finite difference time domain (FDTD) method is a popular technique for computational electromagnetics (CEM). The large computational power often required, however, has been a limiting factor for its applications. In this paper, we present a graphics processing unit (GPU)-based parallel FDTD solver and its successful application to the investigation of a novel B1 […]
Aug, 17

Realtime Ray Tracing on GPU with BVH-based Packet Traversal

Recent GPU ray tracers can already achieve performance competitive with that of their CPU counterparts. Nevertheless, these systems cannot yet fully exploit the capabilities of modern GPUs and can only handle medium-sized, static scenes. In this paper we present a BVH-based GPU ray tracer with a parallel packet traversal algorithm using a shared stack. […]
Aug, 17

CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU

Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high performance computing for general-purpose applications. Owing to its tremendous computing capability, the GPU has emerged as a co-processor of the CPU to achieve a high overall throughput. The CUDA programming model provides programmers with adequate C-like APIs to better exploit the parallel power of […]
Aug, 17

High Throughput Variable Size Non-square Gabor Engine with Feature Pooling Based on GPU

The increasing application of the Gabor feature space in various computer vision tasks, together with its high computational demand, encourages the use of parallel computing technologies. In this work we have designed a high-throughput GPU-based Gabor kernel that mimics the function of the initial biological visual cortex layers, namely ‘Simple’ and ‘Complex’ cells. The kernel is basically a Gabor […]
Aug, 17

Robotic approach to multi-beam optical tweezers with Computer Generated Hologram

Multi-beam optical tweezers are an important technique for manipulating multiple small objects. Computer Generated Hologram (CGH) is one such technique, and it can trap more than 200 objects in three dimensions. For dexterous micromanipulation, it is useful to apply robotics to optical tweezers. In this research, we designed the optical system and control system of […]
Aug, 17

Regular Expression Matching and Operational Semantics

Many programming languages and tools, ranging from grep to the Java String library, contain regular expression matchers. Rather than first translating a regular expression into a deterministic finite automaton, such implementations typically match the regular expression on the fly. Thus they can be seen as virtual machines interpreting the regular expression much as if it […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors