high performance computing on graphics processing units: hgpu.org

Posts

Aug, 2

Integrated Arrival and Departure Schedule Optimization Under Uncertainty

In terminal airspace, integrating arrivals and departures with shared waypoints offers the potential to improve operational efficiency by allowing direct routes when possible. Incorporating stochastic evaluation as a post-analysis process of deterministic optimization, and imposing a safety buffer in deterministic optimization, are two ways to learn about and alleviate the impact of uncertainty and to avoid […]

CUDA

Aug, 2

Accelerated Matrix Element Method with Parallel Computing

The matrix element method utilizes ab initio calculations of probability densities as powerful discriminants for processes of interest in experimental particle physics. The method has already been used successfully at previous and current collider experiments. However, the computational complexity of this method for final states with many particles and degrees of freedom sets it at […]

OpenCL

Aug, 1

CUDA Accelerated Entropy Constrained Vector Quantization and Multiple K-Means

Multi-trial sampled K-means performance and scalability are studied as a stepping stone towards a Graphics Processing Unit implementation of Entropy Constrained Vector Quantization for interactive data compression. Basic parallelization strategies and data layout impacts are explored with K-means. The K-means implementation is extended to Entropy Constrained Vector Quantization, and additional tuning specific to the anticipated […]

CUDA

Aug, 1

Scalable and High Performance Betweenness Centrality on the GPU

Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is betweenness centrality, which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high […]

CUDA

Aug, 1

A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPU-GPU Systems

Aiming to fully exploit the computing power of all CPUs and all GPUs on hybrid CPU-GPU systems to solve dense linear algebra problems, we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, minimize the communication volume, and accommodate the heterogeneity between CPUs and GPUs. The new […]

CUDA

Aug, 1

Optimizing performance per watt on GPUs in High Performance Computing: temperature, frequency and voltage effects

The magnitude of the real-time digital signal processing challenge attached to large radio astronomical antenna arrays motivates the use of high performance computing (HPC) systems. The need for high power efficiency (performance per watt) at remote observatory sites parallels that in HPC broadly, where efficiency is an emerging critical metric. We investigate how the performance per […]

CUDA

Aug, 1

Discriminative Convolutional Sum-Product Networks on GPU

Sum-Product Networks (SPNs) are a deep architecture recently proposed for image classification and modeling. In contrast to loopy graphical models commonly used in computer vision, exact inference and learning in SPNs are tractable. As long as consistency and completeness are ensured, an SPN allows efficient calculation of the partition function and all marginals of graphical […]

CUDA

Jul, 30

Automatic Parallelization of Tiled Stencil Loop Nests on GPUs

This thesis attempts to design and implement a compiler framework based on the polyhedral model. The compiler automatically parallelizes loop nests, especially stencil kernels, into efficient GPU code using loop tiling transformations described by the polyhedral model. To enhance parallel performance, we introduce three practically efficient techniques to process different types of loop nests. The […]

CUDA

Jul, 30

Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems

The rapid development of diverse computer architectures and hardware accelerators means that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows applications to run efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the […]

OpenCL

Jul, 30

Scaling Multifluid Compressible Fluid Dynamics to 700,000 cores, 1.5 Pflop/s, and a Trillion Grid Cells

We are using the Blue Waters system at NCSA to study compressible, turbulent mixing of gases in the deep interiors of stars and also in the context of inertial confinement fusion (ICF). In December 2012, during the Blue Waters friendly user access period, we carried out a simulation of an ICF test problem on a […]

Jul, 30

Research on Parallel DVH Statistic Based on CUDA

The Dose Volume Histogram (DVH) is necessary for evaluating radiotherapy planning. With the increase in patient CT slices and the development of intensity-modulated radiation therapy (IMRT) technology, the statistical processing of DVH requires a large number of cubic interpolation calculations, and sequential single-threaded DVH code on the CPU cannot meet the real-time requirement. The paper presents […]

CUDA

Jul, 30

A CUDA-Based Real Parameter Optimization Benchmark

Benchmarking is key for developing and comparing optimization algorithms. In this paper, a CUDA-based real parameter optimization benchmark (cuROB) is introduced. Test functions of diverse properties are included within cuROB and implemented efficiently with CUDA. A speedup of one order of magnitude can be achieved in comparison with the CPU-based benchmark of CEC’14.

CUDA