Posts
Feb, 15
High-Performance Graphics 2014
High-Performance Graphics is the leading international forum for performance-oriented graphics and imaging systems research, including innovative algorithms, efficient implementations, languages, compilers, parallelism, hardware and architectures for high-performance graphics. High-Performance Graphics was founded in 2009 to synthesize and broaden two important and well-respected conferences in computer graphics: Graphics Hardware and Interactive Ray Tracing. The conference […]
Feb, 15
Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths
Finding the shortest paths from a single source to all other vertices is a fundamental method used in a variety of higher-level graph algorithms. We present three parallel-friendly and work-efficient methods to solve this Single-Source Shortest Paths (SSSP) problem: Workfront Sweep, Near-Far and Bucketing. These methods choose different approaches to balance the tradeoff between saving […]
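To make the SSSP problem statement above concrete, here is a minimal sequential sketch of frontier-based edge relaxation. It is not one of the paper's three GPU methods; it only illustrates the general idea of repeatedly processing the set of vertices whose distance just improved. The function name `sssp_frontier`, the adjacency-list layout, and the assumption of non-negative edge weights are all choices made for this illustration.

```python
import math


def sssp_frontier(graph, source):
    """Single-source shortest paths by frontier-based relaxation.

    graph: dict mapping every vertex to a list of (neighbor, weight) pairs.
    Assumes non-negative weights (hypothetical input format for this sketch).
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    frontier = {source}  # vertices whose distance changed in the last pass
    while frontier:
        next_frontier = set()
        for u in frontier:
            for v, w in graph[u]:
                candidate = dist[u] + w
                if candidate < dist[v]:
                    # Found a shorter path to v; revisit v in the next pass.
                    dist[v] = candidate
                    next_frontier.add(v)
        frontier = next_frontier
    return dist


example = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
result = sssp_frontier(example, 0)
```

A vertex can enter the frontier more than once, so this simple version may redo work. That redundant work is the kind of cost a work-efficient method tries to limit.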
Feb, 15
GPGPUs: How to Combine High Computational Power with High Reliability
GPGPUs are increasingly used in several domains, from gaming to different kinds of computationally intensive applications. In some cases, their reliability is becoming a serious issue and several research activities are focusing on its evaluation. This paper aims to give an overview of some major results in the area. First, it shows and analyzes the results of some […]
Feb, 15
Computation of Galois field expressions for quaternary logic functions on GPUs
Galois field (GF) expressions are polynomials used as representations of multiple-valued logic (MVL) functions. For this purpose, MVL functions are considered as functions defined over a finite (Galois) field of order p – GF(p). The problem of computing these functional expressions has an important role in areas such as digital signal processing and logic design. […]
Feb, 15
Optimizing exact computation of Betweenness Centrality for CUDA
Betweenness centrality is an important metric in the study of network analysis. This report discusses the problem of exact computation of the betweenness centrality index in network analysis. BC is an important metric in small world network analysis which is expensive to compute. A new strategy is presented to parallelize the best known serial algorithm for […]
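For readers unfamiliar with the metric, the following is a minimal serial sketch of exact betweenness centrality on an unweighted graph, using Brandes' algorithm. Whether Brandes' algorithm is the serial algorithm the report parallelizes is an assumption, since the excerpt is truncated. The function name and the adjacency-list input format are hypothetical, and scores are left unnormalized.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact, unnormalized betweenness centrality (Brandes-style).

    adj: dict mapping every vertex to a list of neighbors (unweighted graph).
    On an undirected graph each vertex pair is counted in both directions.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) to each vertex.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = betweenness_centrality(path)
```

The outer loop runs one breadth-first search per source vertex, which is why exact computation is expensive on large networks and why the per-source searches are a natural target for parallelization.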
Feb, 15
Accelerator Aware MPI Micro-benchmarking using CUDA, OpenACC and OpenCL
Recently, MPI implementations have been extended to support accelerator devices: Intel Many Integrated Core (MIC) and NVIDIA GPUs. This has been accomplished by changes to different levels of the software stacks and MPI implementations. In order to evaluate performance and scalability of accelerator aware MPI libraries, we developed portable micro-benchmarks to identify factors that influence […]
Feb, 14
Effective Multi-Modal Retrieval based on Stacked Auto-Encoders
Multi-modal retrieval is emerging as a new search paradigm that enables seamless information retrieval from various types of media. For example, users can simply snap a movie poster to search for relevant reviews and trailers. To solve the problem, a set of mapping functions is learned to project high-dimensional features extracted from data of different media […]
Feb, 14
High-Performance Zonal Histogramming on Large-Scale Geospatial Rasters Using GPUs and GPU-Accelerated Clusters
Hardware accelerators are playing increasingly important roles in achieving desired performance from desktop to cluster computing. While General Purpose computing on Graphics Processing Units (GPGPU) technologies have been widely applied to computing intensive applications, there is relatively little work on using GPUs and GPU-accelerated clusters for data intensive computing that typically involves significant irregular data […]
Feb, 14
High-Performance Spatial Query Processing on Big Taxi Trip Data using GPGPUs
City-wide GPS recorded taxi trip data contains rich information for traffic and travel analysis to facilitate transportation planning and urban studies. However, traditional data management techniques are largely incapable of processing big taxi trip data at the scale of hundreds of millions. In this study, we aim at utilizing the General Purpose computing on Graphics […]
Feb, 14
Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned
In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved both the evaluation of programming models including OpenCL, POSIX threads and OpenMP and typical optimization strategies like parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the […]
Feb, 14
Multi-Kepler GPU vs. Multi-Intel MIC for spin systems simulations
We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, both in a single system and in a cluster configuration, for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the […]
Feb, 12
Multi-tier Dynamic Vectorization for Translating GPU Optimizations into CPU Performance
Developing high performance GPU code is labor intensive. Ideally, developers could recoup high GPU development costs by generating high-performance programs for CPUs and other architectures from the same source code. However, current OpenCL compilers for non-GPUs do not fully exploit optimizations in well-tuned GPU codes. To address this problem, we develop an OpenCL implementation that […]