Posts
Feb, 11
SPIRE, a Sequential to Parallel Intermediate Representation Extension
SPIRE is a new, generic parallel extension for the intermediate representations used in the compilation frameworks of sequential languages; it aims to leverage their existing infrastructure to address both control- and data-parallel languages. Since the efficiency and power of the transformations and optimizations performed by compilers are closely related to the presence of a […]
Feb, 11
Task Parallelism and Synchronization: An Overview of Explicit Parallel Programming Languages
Programming parallel machines as effectively as sequential ones would ideally require a language that provides high-level programming constructs in order to avoid the programming errors that frequently arise when expressing parallelism. Since task parallelism is often considered more error-prone than data parallelism, we survey six popular and efficient parallel programming languages that tackle this difficult issue: Cilk, […]
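The spawn/sync fork-join style that Cilk popularized can be sketched in a few lines. This is a minimal Python illustration (not from the surveyed paper); the function names and the depth cutoff are illustrative choices:

```python
from concurrent.futures import ThreadPoolExecutor

def fib_serial(n):
    """Plain sequential recursion, used below the parallelism cutoff."""
    return n if n < 2 else fib_serial(n - 1) + fib_serial(n - 2)

def fib(n, pool, depth=0):
    """Fork-join task parallelism in the style of Cilk's spawn/sync:
    submit the first recursive call as a task ("spawn"), compute the
    second inline, then wait on the task's result ("sync")."""
    if n < 2:
        return n
    if depth >= 3:                # cut off near the leaves to limit task count
        return fib_serial(n)
    left = pool.submit(fib, n - 1, pool, depth + 1)   # spawn
    right = fib(n - 2, pool, depth + 1)               # work in the parent
    return left.result() + right                      # sync

# At most 1 + 2 + 4 = 7 tasks are spawned with a cutoff of 3, so a pool
# of 8 workers can never deadlock on nested result() waits.
with ThreadPoolExecutor(max_workers=8) as pool:
    print(fib(10, pool))
```

The depth cutoff mirrors the grain-size control that real task-parallel runtimes apply automatically: without it, the cost of creating tasks would swamp the work near the leaves.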
Feb, 11
High-throughput protein crystallization on the World Community Grid and the GPU
We have developed CPU and GPU versions of an automated image analysis and classification system for protein crystallization trial images from the Hauptman-Woodward Institute’s High-Throughput Screening lab. The analysis step computes 12,375 numerical features per image. Using these features, we have trained a classifier that distinguishes 11 different crystallization outcomes, recognizing 80% of all […]
Feb, 11
Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems
The recent use of graphics processing units (GPUs) in several top supercomputers demonstrates their ability to consistently deliver positive results in high-performance computing (HPC). GPU support for significant amounts of parallelism would seem to make them strong candidates for non-HPC applications as well. Server workloads are inherently parallel; however, at first glance they may not […]
Feb, 11
Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA
We evaluate a novel implementation of a Self-Organizing Map (SOM) on a Graphics Processing Unit (GPU) cluster. Using various combinations of OpenCL, CUDA, and two different graphics cards, we demonstrate the scalability of the SOM implementation on one to eight GPUs. Results indicate that while the algorithm scales well with the number of training samples […]
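The per-sample SOM training step that such GPU implementations parallelize can be sketched as follows. This is a minimal NumPy illustration of the standard algorithm, not the paper's implementation; the decay schedule and parameter names are assumptions:

```python
import numpy as np

def som_train_step(weights, sample, t, n_steps, lr0=0.5, sigma0=None):
    """One sequential SOM update: find the best-matching unit (BMU),
    then pull map units near the BMU toward the input sample.
    weights has shape (rows, cols, dim); sample has shape (dim,)."""
    rows, cols, dim = weights.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Exponentially decaying learning rate and neighborhood radius.
    lr = lr0 * np.exp(-t / n_steps)
    sigma = sigma0 * np.exp(-t / n_steps)
    # BMU search: the distance computation over all units is the
    # data-parallel part that maps naturally onto a GPU.
    dists = np.linalg.norm(weights - sample, axis=2)
    bmu = np.unravel_index(np.argmin(dists), (rows, cols))
    # Gaussian neighborhood function on the 2-D map grid.
    gy, gx = np.mgrid[0:rows, 0:cols]
    grid_d2 = (gy - bmu[0]) ** 2 + (gx - bmu[1]) ** 2
    h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
    # Update every unit, weighted by its distance to the BMU.
    weights += lr * h[:, :, None] * (sample - weights)
    return weights, bmu
```

Both the BMU search (a reduction over all map units) and the weight update (independent per unit) are embarrassingly parallel, which is why the algorithm scales with map size on GPUs.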
Feb, 10
Automatic Performance Optimization in ViennaCL for GPUs
Highly parallel computing architectures such as graphics processing units (GPUs) pose several new challenges for scientific computing that were absent on single-core CPUs. However, a transition from existing serial code to parallel code for GPUs often requires a considerable amount of effort. The Vienna Computing Library (ViennaCL) presented in the beginning of this […]
Feb, 10
Customizing Instruction Set Extensible Reconfigurable Processors using GPUs
Many reconfigurable processors allow their instruction sets to be tailored according to the performance requirements of target applications. They have gained immense popularity in recent years because of this flexibility of adding custom instructions. However, most design automation algorithms for instruction set customization (like enumerating and selecting the optimal set of custom instructions) are computationally […]
Feb, 10
Ensemble K-means on multi-core architectures
Ensemble methods use multiple models generated from a data set to improve accuracy and ensure faster convergence. The use of multiple models makes ensemble problems computationally intensive. In this paper, we explore the parallelization of ensemble problems on modern parallel hardware such as multi-core CPUs and GPUs. We use the K-means clustering algorithm as a case […]
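The structure being parallelized can be sketched with plain NumPy. This is a generic illustration of ensemble K-means (independent restarts, keep the best model), not the paper's implementation; the model-selection criterion and parameter names are assumptions:

```python
import numpy as np

def kmeans_iteration(points, centroids):
    """One K-means iteration. The assignment step is independent per
    point — the embarrassingly parallel kernel that maps to GPUs."""
    # Assignment: nearest centroid for every point, fully vectorized.
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update: mean of the points assigned to each centroid
    # (keep the old centroid if its cluster went empty).
    new_centroids = np.array([
        points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
        for k in range(len(centroids))
    ])
    return new_centroids, labels

def ensemble_kmeans(points, k, n_models=4, n_iters=10, seed=0):
    """Run several independently seeded K-means models (the ensemble) —
    each model is itself parallel work — and keep the one with the
    lowest within-cluster sum of squared errors (SSE)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_models):
        c = points[rng.choice(len(points), k, replace=False)]
        for _ in range(n_iters):
            c, labels = kmeans_iteration(points, c)
        sse = ((points - c[labels]) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, c, labels)
    return best[1], best[2]
```

The two levels of parallelism are visible here: across the independent models of the ensemble, and within each model across points in the assignment step.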
Feb, 10
Implementing Molecular Dynamics on Hybrid High Performance Computers – Particle-Particle Particle-Mesh
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more […]
Feb, 10
Real-Time SAH BVH Construction for Ray Tracing Dynamic Scenes
This work develops effective algorithms for building full SAH BVH trees on the GPU in real time. It assumes that all scene objects are represented by a set of triangles (a so-called "triangle soup"), while arbitrary changes to the geometry are allowed […]
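The SAH cost that such builders minimize at every split can be written down directly. This is the standard Surface Area Heuristic formula as a minimal illustration, not the paper's code; the traversal and intersection cost constants are placeholder values:

```python
import numpy as np

def aabb_surface_area(lo, hi):
    """Surface area of an axis-aligned bounding box given its corners."""
    d = np.maximum(hi - lo, 0.0)
    return 2.0 * (d[0] * d[1] + d[1] * d[2] + d[0] * d[2])

def sah_cost(parent_lo, parent_hi, left, right, c_trav=1.0, c_isect=1.0):
    """SAH cost of splitting a node into `left` and `right` primitive
    sets (each an array of AABBs with shape (n, 2, 3)):
        C = C_trav + (A_L / A) * N_L * C_isect + (A_R / A) * N_R * C_isect
    where A is the parent's surface area, so each child's cost is
    weighted by the probability a random ray hitting the parent hits it."""
    area_p = aabb_surface_area(parent_lo, parent_hi)

    def group_cost(boxes):
        if len(boxes) == 0:
            return 0.0
        lo = boxes[:, 0, :].min(axis=0)   # tight bounds of the group
        hi = boxes[:, 1, :].max(axis=0)
        return aabb_surface_area(lo, hi) / area_p * len(boxes) * c_isect

    return c_trav + group_cost(left) + group_cost(right)
```

A full builder evaluates this cost for many candidate split planes per node and keeps the cheapest; since each candidate is independent, the evaluation parallelizes well on the GPU.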
Feb, 9
Accelerating H.264 Advanced Video Coding with GPU/CUDA Technology
With the rise of streaming media on the Internet and the YouTube revolution, the demand for online videos is costing companies a significant amount of bandwidth. To alleviate the resources needed for streaming media, video compression removes redundant information and minimizes the loss in quality experienced by a human audience. In response to the need […]
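A large share of an H.264 encoder's work is block motion estimation, whose inner loop can be sketched simply. This is a generic full-search SAD matcher for illustration, not the paper's CUDA implementation; the block size and search range are arbitrary choices:

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return np.abs(block_a.astype(int) - block_b.astype(int)).sum()

def full_search(cur_block, ref_frame, x, y, search_range=4):
    """Full-search motion estimation: find the displacement (dy, dx)
    within +/- search_range of (y, x) in the reference frame that
    minimizes SAD against the current block. Every candidate
    displacement is independent — the data parallelism GPU encoders
    exploit by assigning candidates (or blocks) to threads."""
    n = cur_block.shape[0]
    h, w = ref_frame.shape
    best, best_sad = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy and yy + n <= h and 0 <= xx and xx + n <= w:
                s = sad(cur_block, ref_frame[yy:yy + n, xx:xx + n])
                if best_sad is None or s < best_sad:
                    best_sad, best = s, (dy, dx)
    return best, best_sad
```

The winning displacement becomes the block's motion vector, so only the (usually small) residual after motion compensation needs to be transformed and entropy-coded.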
Feb, 9
Parallel Semi-Implicit Time Integrators
In this paper, we further develop a family of parallel time integrators known as Revisionist Integral Deferred Correction methods (RIDC) to allow for the semi-implicit solution of time-dependent PDEs. Additionally, we show that our semi-implicit RIDC algorithm can harness the computational potential of multiple general-purpose graphics processing units (GPGPUs) by utilizing existing CUBLAS […]