Posts
Aug, 2
Design of an FPGA-Based FDTD Accelerator Using OpenCL
High-performance computing systems with dedicated hardware on FPGAs can achieve more power-efficient computation than CPUs and GPUs. However, hardware design on FPGAs takes more time than software design on CPUs and GPUs. In this paper, we design an FDTD hardware accelerator using the OpenCL compiler for FPGAs. Since it is possible to […]
Aug, 2
An Analysis of OpenACC Programming Model: Image Processing Algorithms as a Case Study
Graphics processing units and similar accelerators have been intensively used in general purpose computations for several years. In the last decade, GPU architecture and organization changed dramatically to support an ever-increasing demand for computing power. Along with changes in hardware, novel programming models have been proposed, such as NVIDIA’s Compute Unified Device Architecture (CUDA) and […]
Aug, 2
Extracting Maximal Exact Matches on GPU
The revolution in high-throughput sequencing technologies accelerated the discovery and extraction of various genomic sequences. However, the massive size of the generated datasets raises several computational problems. For example, aligning the sequences or finding similar regions in them, which is one of the crucial steps in many bioinformatics pipelines, is a time-consuming task. […]
Aug, 2
Integrated Arrival and Departure Schedule Optimization Under Uncertainty
In terminal airspace, integrating arrivals and departures with shared waypoints offers the potential to improve operational efficiency by allowing direct routes when possible. Incorporating stochastic evaluation as a post-analysis process of deterministic optimization, and imposing a safety buffer in deterministic optimization, are two ways to learn and alleviate the impact of uncertainty and to avoid […]
Aug, 2
Accelerated Matrix Element Method with Parallel Computing
The matrix element method utilizes ab initio calculations of probability densities as powerful discriminants for processes of interest in experimental particle physics. The method has already been used successfully at previous and current collider experiments. However, the computational complexity of this method for final states with many particles and degrees of freedom sets it at […]
Aug, 1
CUDA Accelerated Entropy Constrained Vector Quantization and Multiple K-Means
The performance and scalability of multi-trial sampled K-means are studied as a stepping stone towards a Graphics Processing Unit implementation of Entropy Constrained Vector Quantization for interactive data compression. Basic parallelization strategies and data layout impacts are explored with K-means. The K-means implementation is extended to Entropy Constrained Vector Quantization, and additional tuning specific to the anticipated […]
Aug, 1
Scalable and High Performance Betweenness Centrality on the GPU
Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is betweenness centrality, which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high […]
Aug, 1
A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPU-GPU Systems
Aiming to fully exploit the computing power of all CPUs and all GPUs on hybrid CPU-GPU systems to solve dense linear algebra problems, we design a class of heterogeneous tile algorithms that maximize the degree of parallelism, minimize the communication volume, and accommodate the heterogeneity between CPUs and GPUs. The new […]
Aug, 1
Optimizing performance per watt on GPUs in High Performance Computing: temperature, frequency and voltage effects
The magnitude of the real-time digital signal processing challenge attached to large radio astronomical antenna arrays motivates the use of high-performance computing (HPC) systems. The need for high power efficiency (performance per watt) at remote observatory sites parallels that in HPC broadly, where efficiency is an emerging critical metric. We investigate how the performance per […]
Aug, 1
Discriminative Convolutional Sum-Product Networks on GPU
Sum-Product Networks (SPNs) are a deep architecture recently proposed for image classification and modeling. In contrast to loopy graphical models commonly used in computer vision, exact inference and learning in SPNs are tractable. As long as consistency and completeness are ensured, an SPN allows efficient calculation of the partition function and all marginals of graphical […]
Jul, 30
Automatic Parallelization of Tiled Stencil Loop Nests on GPUs
This thesis attempts to design and implement a compiler framework based on the polyhedral model. The compiler automatically parallelizes loop nests, especially stencil kernels, into efficient GPU code using loop tiling transformations described by the polyhedral model. To enhance parallel performance, we introduce three practically efficient techniques to process different types of loop nests. The […]
Jul, 30
Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems
The rapid development of diverse computer architectures and hardware accelerators means that designers of parallel systems face new problems resulting from heterogeneity. Our implementation of a parallel system called KernelHive allows applications to run efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the […]