Posts

Feb, 12

Recursive MIS Computation for Streaming BDPT on the GPU

Bidirectional Path Tracing (BDPT) is a robust unbiased rendering algorithm that samples paths by connecting eye and light paths. By optimally combining different sampling strategies using Multiple Importance Sampling (MIS), BDPT efficiently renders scenes with complex light effects. However, BDPT does not map well on a streaming architecture such as the GPU; Stochastic path lengths […]
Feb, 12

Level Sets and Voronoi based Feature Extraction from any Imagery

Polygon features are of interest in many GEOProcessing applications like shoreline mapping, boundary delineation, change detection, etc. This paper presents a new GPU-based methodology to automate feature extraction by combining level-set or mean-shift-based segmentation with Voronoi skeletonization, which guarantees that the extracted features are topologically correct. The features thus extracted as […]
Feb, 12

FPGA accelerated 3D reconstruction using compressive sensing

The radiation dose associated with computerized tomography (CT) is significant. Optimization-based iterative reconstruction approaches, e.g., compressive sensing, provide ways to reduce the radiation exposure without sacrificing image quality. However, the computational requirement of such algorithms is much higher than that of the conventional Filtered Back Projection (FBP) reconstruction algorithm. This paper describes an FPGA implementation of […]
Feb, 12

Fast Polynomial Approximation Acceleration on the GPU

This article presents the possibility of parallelizing the calculation of polynomial approximations with large data inputs on the GPU using the NVIDIA CUDA architecture. The parallel implementation on the GPU is compared to a single-threaded CPU implementation. Despite the enormous computing power of today’s graphics cards, there is still a problem with the speed of data transfer to […]
Feb, 12

Face Detection CUDA Accelerating

Face detection is useful and important in many disciplines. Because our future work will rely on face detection, we wanted to determine whether it is advantageous to use CUDA technology for detecting faces. First, we implemented the Viola and Jones algorithm in a basic single-threaded CPU version. Then the […]
Feb, 11

SPIRE, a Sequential to Parallel Intermediate Representation Extension

SPIRE is a new, generic, parallel extension for the intermediate representations used in compilation frameworks of sequential languages; it intends to easily leverage their existing infrastructure to address both control and data parallel languages. Since the efficiency and power of the transformations and optimizations performed by compilers are closely related to the presence of a […]
Feb, 11

Task Parallelism and Synchronization: An Overview of Explicit Parallel Programming Languages

Programming parallel machines as effectively as sequential ones would ideally require a language that provides high-level programming constructs in order to avoid the programming errors frequent when expressing parallelism. Since task parallelism is often considered more error-prone than data parallelism, we survey six popular and efficient parallel programming languages that tackle this difficult issue: Cilk, […]
Feb, 11

High-throughput protein crystallization on the World Community Grid and the GPU

We have developed CPU and GPU versions of an automated image analysis and classification system for protein crystallization trial images from the Hauptman Woodward Institute’s High-Throughput Screening lab. The analysis step computes 12,375 numerical features per image. Using these features, we have trained a classifier that distinguishes 11 different crystallization outcomes, recognizing 80% of all […]
Feb, 11

Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems

The recent use of graphics processing units (GPUs) in several top supercomputers demonstrates their ability to consistently deliver positive results in high-performance computing (HPC). GPU support for significant amounts of parallelism would seem to make them strong candidates for non-HPC applications as well. Server workloads are inherently parallel; however, at first glance they may not […]
Feb, 11

Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA

We evaluate a novel implementation of a Self-Organizing Map (SOM) on a Graphics Processing Unit (GPU) cluster. Using various combinations of OpenCL, CUDA, and two different graphics cards, we demonstrate the scalability of the SOM implementation on one to eight GPUs. Results indicate that while the algorithm scales well with the number of training samples […]
Feb, 10

Automatic Performance Optimization in ViennaCL for GPUs

Highly parallel computing architectures such as graphics processing units (GPUs) pose several new challenges for scientific computing that were absent on single-core CPUs. However, a transition from existing serial code to parallel code for GPUs often requires a considerable amount of effort. The Vienna Computing Library (ViennaCL) presented in the beginning of this […]
Feb, 10

Customizing Instruction Set Extensible Reconfigurable Processors using GPUs

Many reconfigurable processors allow their instruction sets to be tailored according to the performance requirements of target applications. They have gained immense popularity in recent years because of this flexibility of adding custom instructions. However, most design automation algorithms for instruction set customization (like enumerating and selecting the optimal set of custom instructions) are computationally […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
