Posts
Feb, 4
Lattice Based Volumetric Global Illumination
We describe a novel volumetric global illumination framework based on the face-centered cubic (FCC) lattice. An FCC lattice has important advantages over a Cartesian lattice. It has higher packing density in the frequency domain, which translates to better sampling efficiency. Furthermore, it has the maximal possible kissing number (equivalent to the number of nearest neighbors […]
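As a small illustration of the nearest-neighbor claim in the excerpt above, the following sketch counts the nearest neighbors of the origin in a Cartesian lattice and in an FCC lattice. It is not taken from the paper: the FCC lattice is modeled, as an assumption for illustration, by the integer points whose coordinates sum to an even number, and the helper name `nearest_neighbors` is hypothetical.

```python
from itertools import product


def nearest_neighbors(points):
    """Return the nonzero lattice points closest to the origin."""
    nonzero = [p for p in points if any(p)]
    d2_min = min(sum(c * c for c in p) for p in nonzero)
    return [p for p in nonzero if sum(c * c for c in p) == d2_min]


# A small cube of integer points around the origin.
grid = list(product(range(-2, 3), repeat=3))

# Cartesian lattice: every integer point.
cartesian = grid

# FCC lattice (assumed model): integer points with an even coordinate sum.
fcc = [p for p in grid if sum(p) % 2 == 0]

cc_neighbors = nearest_neighbors(cartesian)
fcc_neighbors = nearest_neighbors(fcc)

print(len(cc_neighbors), len(fcc_neighbors))
```

Under this model each Cartesian site has 6 nearest neighbors, while each FCC site has 12.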
Feb, 4
QUDA programming for staggered quarks
We have been extending the QUDA GPU code developed at Boston University to include the case of improved staggered quarks. Improved staggered quarks such as asqtad and HISQ require both first and third nearest neighbor terms in the Dirac operator. We call the corresponding links fatlinks and longlinks. The fatlinks are not unitary, and staggered […]
Feb, 4
Phoenix: A Runtime Environment for High Performance Computing on Chip Multiprocessors
Execution of applications on upcoming high-performance computing (HPC) systems introduces a variety of new challenges and amplifies many existing ones. These systems will be composed of a large number of "fat" nodes, where each node consists of multiple processors on a chip with symmetric multithreading capabilities, interconnected via high-performance networks. Traditional system software for parallel […]
Feb, 4
QP: A Heterogeneous Multi-Accelerator Cluster
We present a heterogeneous multi-accelerator cluster developed and deployed at NCSA. The cluster consists of 16 AMD dual-core CPU compute nodes each with four NVIDIA GPUs and one Xilinx FPGA. Cluster nodes are interconnected with both InfiniBand and Ethernet networks. The software stack consists of standard cluster tools with the addition of accelerator-specific software packages […]
Feb, 3
On testing GPU memory for hard and soft errors
NVIDIA GPUs are becoming increasingly popular in scientific computation as a way to accelerate the execution of computationally demanding codes. The graphics memory used in GPUs is not protected against soft errors that may be caused by cosmic radiation and thus is a source of concern for the scientific computing community. In this short paper […]
Feb, 3
Quantifying the Impact of GPUs on Performance and Energy Efficiency in HPC Clusters
We present an inexpensive hardware system for monitoring power usage of individual CPU hosts and externally attached GPUs in HPC clusters and the software stack for integrating the power usage data streamed in real-time by the power monitoring hardware with the cluster management software tools. We introduce a measure for quantifying the overall improvement in […]
Feb, 3
MILC on GPUs
The MIMD Lattice Computation (MILC) code, a Quantum Chromodynamics (QCD) application used to simulate four-dimensional SU(3) lattice gauge theory, is one of the largest compute cycle users at many supercomputing centers. Previously we have investigated how one of the MILC applications can be accelerated on the Cell Broadband Engine. We currently investigate how this code can […]
Feb, 3
3I: A tool for visualizing and processing in parallel 2D & 3D images
We present a tool for intensive processing of digital images based on graphics processing units (GPUs) and multi-core CPUs. The tool incorporates innovative filters for the denoising and estimation of missing information in three-dimensional digital images. Both processes are integrated into a pipeline that repeatedly evaluates the image until a given convergence. Finally, 3D images […]
Feb, 3
3D Registration Based on Normalized Mutual Information: Performance of CPU vs. GPU Implementation
Medical image registration is time-consuming but can be sped up by employing parallel processing on the GPU. Normalized mutual information (NMI) is a well-performing similarity measure for multi-modal registration. We present CUDA based solutions for computing NMI on the GPU and compare the results obtained by rigidly registering multi-modal data sets with a CPU […]
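For readers unfamiliar with the similarity measure named in the excerpt above, here is a minimal CPU sketch of NMI. The excerpt does not say which definition the authors use; this sketch assumes the common form NMI = (H(A) + H(B)) / H(A, B), computed from the joint histogram of two equal-length intensity sequences. The function names `entropy` and `nmi` are hypothetical and are not the paper's CUDA implementation.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a histogram given as a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def nmi(a, b):
    """Assumed definition: NMI = (H(A) + H(B)) / H(A, B)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of samples")
    h_a = entropy(Counter(a))           # marginal entropy of image A
    h_b = entropy(Counter(b))           # marginal entropy of image B
    h_ab = entropy(Counter(zip(a, b)))  # joint entropy from the joint histogram
    if h_ab == 0:
        raise ValueError("NMI is undefined when both images are constant")
    return (h_a + h_b) / h_ab


# Intensities that map one-to-one across modalities give the maximum value.
aligned = nmi([0, 0, 1, 1], [5, 5, 9, 9])
# Statistically independent intensities give the minimum value.
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(aligned, independent)
```

With this definition the measure ranges from 1 (independent images) to 2 (one image fully determines the other), which is why a registration loop would seek the transform that maximizes it.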
Feb, 3
3D Information Extraction Based on GPU
Our project starts from a practical application of stereo vision (matching) on a robot arm: building a vision system that gives the arm the ability to detect the 3D information of objects on a plane. The kernel of the vision system is stereo matching. Stereo matching (correspondence) problem […]
Feb, 3
3D GPU Architecture using Cache Stacking: Performance, Cost, Power and Thermal analysis
Graphics Processing Units (GPUs) offer tremendous computational and processing power. The architecture requires high communication bandwidth and lower latency between computation units and caches. 3D die-stacking technology is a promising approach to meet such requirements. To the best of our knowledge no other study has investigated the implementation of 3D technology in GPUs. In this […]
Feb, 3
3D finite element numerical integration on GPUs
The algorithmic and computational aspects of 3D finite element numerical integration on GPUs are investigated in the paper. Special emphasis is placed on selecting the proper parallelization strategies depending on the properties of the FEM problems solved and the approximations used. The close interplay between the available computational resources of GPUs and the possible implementation strategies […]