Posts
Feb, 13
Work Stealing Inside GPUs
Graphics Processing Units (GPUs) have become a valuable support for High Performance Computing (HPC) applications. However, despite the many improvements in General Purpose GPU (GPGPU) computing, there is still a need for a generic programming model that can adapt to the many forms of parallelism an application can express. The CUDA programming model is widely used on the […]
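The excerpt is cut off before it describes the paper's scheme, but the technique named in the title can be illustrated. The sketch below is a sequential, round-based Python simulation, not GPU code and not the authors' design. Each worker owns a double-ended queue, takes its own newest task from one end, and, when idle, steals the oldest task from another worker's opposite end. The function name `run_work_stealing` and the round-robin victim choice are assumptions made for illustration; tasks here are independent and do not spawn subtasks.

```python
from collections import deque


def run_work_stealing(initial_tasks, num_workers):
    """Simulate a work-stealing scheduler in lock-step rounds.

    Hypothetical illustration: each worker owns a deque, pops its own
    work from the right end (LIFO) and, when its deque is empty, steals
    from the left end (FIFO) of a victim's deque.
    Returns a list of (worker, task) pairs in execution order.
    """
    deques = [deque() for _ in range(num_workers)]
    # Deliberately imbalanced start: every task begins on worker 0.
    deques[0].extend(initial_tasks)
    executed = []
    while any(deques):
        for w in range(num_workers):
            if deques[w]:
                task = deques[w].pop()  # owner takes its newest task
            else:
                task = None
                # Try victims in round-robin order starting after w.
                for offset in range(1, num_workers):
                    victim = (w + offset) % num_workers
                    if deques[victim]:
                        task = deques[victim].popleft()  # thief takes the oldest task
                        break
                if task is None:
                    continue  # nothing left to steal this round
            executed.append((w, task))
    return executed


result = run_work_stealing(range(8), num_workers=4)
print(result)
```

Starting with all eight tasks on worker 0 and four workers, the idle workers steal from worker 0 in the first round, so every worker ends up executing tasks. Taking opposite ends of the deque is the usual design choice: the owner keeps cache-warm recent work while thieves take older, typically larger pieces, and the two sides rarely contend for the same element.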
Feb, 13
LAMMPScuda – a new GPU accelerated Molecular Dynamics Simulations Package and its Application to Ion-Conducting Glasses
Today, computer simulations form an integral part of many research and development efforts. The scope of what can be modeled has increased dramatically, as computing performance improved over the last two decades. But with serial-execution performance of CPUs leveling off, future performance increases for computational physics, material design, and biology must come from higher parallelization. […]
Feb, 13
Analytic Anti-Aliasing of Linear Functions on Polytopes
This paper presents an analytic formulation for anti-aliased sampling of 2D polygons and 3D polyhedra. Our framework allows the exact evaluation of the convolution integral with a linear function defined on the polytopes. The filter is a spherically symmetric polynomial of any order, supporting approximations to refined variants such as the Mitchell-Netravali filter family. This […]
Feb, 12
Recursive MIS Computation for Streaming BDPT on the GPU
Bidirectional Path Tracing (BDPT) is a robust, unbiased rendering algorithm that samples paths by connecting eye and light paths. By optimally combining different sampling strategies using Multiple Importance Sampling (MIS), BDPT efficiently renders scenes with complex light effects. However, BDPT does not map well onto a streaming architecture such as the GPU; stochastic path lengths […]
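The MIS weighting mentioned above can be sketched with the balance and power heuristics of Veach and Guibas, which are common choices in BDPT. The excerpt does not say which heuristic the paper uses, and its recursive computation is not reproduced here. For a single path, `pdfs[i]` is assumed to be the density with which sampling strategy i would have produced that same path; the function names are hypothetical.

```python
def balance_weights(pdfs, counts=None):
    """Balance-heuristic MIS weights: w_i = n_i * p_i / sum_k n_k * p_k.

    pdfs[i]: density of the path under sampling strategy i
    counts[i]: number of samples taken with strategy i (default 1 each)
    """
    if counts is None:
        counts = [1] * len(pdfs)
    weighted = [n * p for n, p in zip(counts, pdfs)]
    total = sum(weighted)
    return [w / total for w in weighted]


def power_weights(pdfs, beta=2.0):
    """Power-heuristic MIS weights, one sample per strategy:
    w_i = p_i**beta / sum_k p_k**beta.
    """
    powered = [p ** beta for p in pdfs]
    total = sum(powered)
    return [p / total for p in powered]


def mis_contribution(f_value, pdfs, strategy, weight_fn=balance_weights):
    """Weighted estimate f(x) * w_s(x) / p_s(x) for a path sampled with `strategy`."""
    weights = weight_fn(pdfs)
    return f_value * weights[strategy] / pdfs[strategy]


print(balance_weights([1.0, 3.0]))
print(mis_contribution(2.0, [1.0, 3.0], strategy=0))
print(mis_contribution(2.0, [1.0, 3.0], strategy=1))
```

With the balance heuristic, the weighted contribution reduces to f divided by the sum of the densities, so it is the same whichever strategy produced the path: 0.5 for f = 2 and densities 1 and 3. Because the weights for one path sum to 1, the combined estimator stays unbiased.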
Feb, 12
Level Sets and Voronoi based Feature Extraction from any Imagery
Polygon features are of interest in many geoprocessing applications, such as shoreline mapping, boundary delineation, and change detection. This paper presents a new GPU-based methodology that automates feature extraction by combining level-set or mean-shift-based segmentation with Voronoi skeletonization, and that guarantees the extracted features are topologically correct. The features thus extracted as […]
Feb, 12
FPGA accelerated 3D reconstruction using compressive sensing
The radiation dose associated with computerized tomography (CT) is significant. Optimization-based iterative reconstruction approaches, such as compressive sensing, provide ways to reduce radiation exposure without sacrificing image quality. However, the computational requirements of such algorithms are much higher than those of the conventional Filtered Back Projection (FBP) reconstruction algorithm. This paper describes an FPGA implementation of […]
Feb, 12
Fast Polynomial Approximation Acceleration on the GPU
This article explores parallelizing the computation of polynomial approximations over large data inputs on the GPU using the NVIDIA CUDA architecture. The parallel GPU implementation is compared with a single-threaded CPU implementation. Despite the enormous computing power of today’s graphics cards, there is still a problem with the speed of data transfer to […]
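The excerpt does not say which approximation method the paper uses. Assuming a least-squares polynomial fit, the data-heavy part is a set of independent sums over all input points, which plausibly makes it a good match for GPU reductions; the small normal-equation system that remains can then be solved cheaply. This plain-Python sketch (the name `fit_polynomial` is hypothetical) shows that structure serially.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c[0..degree] so that y ~= sum(c[k] * x**k).
    """
    m = degree + 1
    # Power sums s[j] = sum(x**j) and moments t[k] = sum(y * x**k).
    # Each is an independent reduction over the data points: the part
    # that scales with input size and parallelizes well.
    s = [sum(x ** j for x in xs) for j in range(2 * degree + 1)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(m)]

    # Augmented normal-equation matrix: A[i][j] = s[i + j], rhs t[i].
    a = [[s[i + j] for j in range(m)] + [t[i]] for i in range(m)]

    # Gaussian elimination with partial pivoting on the small m x m system.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= factor * a[col][c]

    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        acc = a[i][m] - sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = acc / a[i][i]
    return coeffs


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
print(fit_polynomial(xs, ys, degree=2))
```

For points sampled exactly from 1 + 2x + 3x², the recovered coefficients are approximately [1, 2, 3]. Forming the normal equations squares the condition number of the problem, so for high degrees or wide x ranges a QR-based solver is the more robust choice.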
Feb, 12
Face Detection CUDA Accelerating
Face detection is useful and important in many different disciplines. Because face detection will also be used in our future work, we wanted to determine whether it is advantageous to use CUDA to detect faces. First, we implemented the Viola-Jones algorithm as a basic single-threaded CPU version. Then the […]
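The Viola-Jones detector evaluates rectangle (Haar-like) features in constant time using an integral image, also called a summed-area table. The sketch below is a minimal serial Python version of that building block, not the authors' CPU or CUDA code; the function names and the left-minus-right sign convention for the two-rectangle feature are assumptions.

```python
def integral_image(img):
    """Summed-area table padded with a leading zero row and column.

    ii[y][x] = sum of img[r][c] for all r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]  # running sum of row y up to column x
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 3, 3))
print(rect_sum(ii, 1, 1, 2, 2))
print(two_rect_feature(ii, 0, 0, 2, 3))
```

Any rectangle sum then costs four table lookups regardless of its size, which is why the detector can evaluate many feature positions and scales cheaply.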
Feb, 11
SPIRE, a Sequential to Parallel Intermediate Representation Extension
SPIRE is a new, generic, parallel extension for the intermediate representations used in compilation frameworks of sequential languages; it is intended to leverage their existing infrastructure easily to address both control- and data-parallel languages. Since the efficiency and power of the transformations and optimizations performed by compilers are closely related to the presence of a […]
Feb, 11
Task Parallelism and Synchronization: An Overview of Explicit Parallel Programming Languages
Programming parallel machines as effectively as sequential ones would ideally require a language that provides high-level programming constructs to avoid the programming errors that are frequent when expressing parallelism. Since task parallelism is often considered more error-prone than data parallelism, we survey six popular and efficient parallel programming languages that tackle this difficult issue: Cilk, […]
Feb, 11
High-throughput protein crystallization on the World Community Grid and the GPU
We have developed CPU and GPU versions of an automated image analysis and classification system for protein crystallization trial images from the Hauptman Woodward Institute’s High-Throughput Screening lab. The analysis step computes 12,375 numerical features per image. Using these features, we have trained a classifier that distinguishes 11 different crystallization outcomes, recognizing 80% of all […]
Feb, 11
Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems
The recent use of graphics processing units (GPUs) in several top supercomputers demonstrates their ability to consistently deliver positive results in high-performance computing (HPC). GPU support for significant amounts of parallelism would seem to make them strong candidates for non-HPC applications as well. Server workloads are inherently parallel; however, at first glance they may not […]