Posts

Feb, 4

Lattice Based Volumetric Global Illumination

We describe a novel volumetric global illumination framework based on the face-centered cubic (FCC) lattice. An FCC lattice has important advantages over a Cartesian lattice. It has higher packing density in the frequency domain, which translates to better sampling efficiency. Furthermore, it has the maximal possible kissing number (equivalent to the number of nearest neighbors […]
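The neighbor count mentioned in this excerpt can be checked with a small sketch. It assumes one common coordinate convention that is not taken from the paper: FCC sites are the integer points whose coordinates sum to an even number. All function names here are hypothetical.

```python
from itertools import product


def is_fcc_site(point):
    """Assumed convention: an integer point is an FCC site if its coordinates sum to an even number."""
    return sum(point) % 2 == 0


def fcc_neighbor_offsets():
    """Offsets from an FCC site to its nearest neighbors.

    Under the assumed convention these are the vectors with exactly two
    entries equal to +1 or -1 and one entry equal to 0.
    """
    return [
        offset
        for offset in product((-1, 0, 1), repeat=3)
        if sum(abs(c) for c in offset) == 2
    ]


def cartesian_neighbor_offsets():
    """Offsets to the nearest neighbors on a plain Cartesian (simple cubic) lattice."""
    return [
        offset
        for offset in product((-1, 0, 1), repeat=3)
        if sum(abs(c) for c in offset) == 1
    ]


if __name__ == "__main__":
    print(len(fcc_neighbor_offsets()), "nearest neighbors on the FCC lattice")
    print(len(cartesian_neighbor_offsets()), "nearest neighbors on the Cartesian lattice")
```

Every offset returned by `fcc_neighbor_offsets` has an even coordinate sum, so stepping from one site by any of them lands on another FCC site.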
Feb, 4

QUDA programming for staggered quarks

We have been extending the QUDA GPU code developed at Boston University to include the case of improved staggered quarks. Improved staggered quarks such as asqtad and HISQ require both first and third nearest neighbor terms in the Dirac operator. We call the corresponding links fatlinks and longlinks. The fatlinks are not unitary, and staggered […]
Feb, 4

Phoenix: A Runtime Environment for High Performance Computing on Chip Multiprocessors

Execution of applications on upcoming high-performance computing (HPC) systems introduces a variety of new challenges and amplifies many existing ones. These systems will be composed of a large number of "fat" nodes, where each node consists of multiple processors on a chip with symmetric multithreading capabilities, interconnected via high-performance networks. Traditional system software for parallel […]
Feb, 4

QP: A Heterogeneous Multi-Accelerator Cluster

We present a heterogeneous multi-accelerator cluster developed and deployed at NCSA. The cluster consists of 16 AMD dual-core CPU compute nodes each with four NVIDIA GPUs and one Xilinx FPGA. Cluster nodes are interconnected with both InfiniBand and Ethernet networks. The software stack consists of standard cluster tools with the addition of accelerator-specific software packages […]
Feb, 3

On testing GPU memory for hard and soft errors

NVIDIA GPUs are becoming increasingly popular in scientific computation as a way to accelerate the execution of computationally demanding codes. The graphics memory used in GPUs is not protected against soft errors that may be caused by cosmic radiation and thus is a source of concern for the scientific computing community. In this short paper […]
Feb, 3

Quantifying the Impact of GPUs on Performance and Energy Efficiency in HPC Clusters

We present an inexpensive hardware system for monitoring power usage of individual CPU hosts and externally attached GPUs in HPC clusters and the software stack for integrating the power usage data streamed in real-time by the power monitoring hardware with the cluster management software tools. We introduce a measure for quantifying the overall improvement in […]
Feb, 3

MILC on GPUs

The MIMD Lattice Computation (MILC) code, a Quantum Chromodynamics (QCD) application used to simulate four-dimensional SU(3) lattice gauge theory, is one of the largest compute cycle users at many supercomputing centers. Previously we investigated how one of the MILC applications can be accelerated on the Cell Broadband Engine. We are currently investigating how this code can […]
Feb, 3

3I: A tool for visualizing and processing in parallel 2D & 3D images

We present a tool for intensive processing of digital images based on graphics processing units (GPUs) and multi-core CPUs. The tool incorporates innovative filters for denoising and for estimating missing information in three-dimensional digital images. Both processes are integrated into a pipeline that repeatedly evaluates the image until a given convergence criterion is met. Finally, 3D images […]
Feb, 3

3D Registration Based on Normalized Mutual Information: Performance of CPU vs. GPU Implementation

Medical image registration is time-consuming but can be sped up by employing parallel processing on the GPU. Normalized mutual information (NMI) is a well-performing similarity measure for multi-modal registration. We present CUDA-based solutions for computing NMI on the GPU and compare the results obtained by rigidly registering multi-modal data sets with a CPU […]
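For readers unfamiliar with the similarity measure named in this excerpt, here is a minimal CPU sketch in plain Python. It is not the paper's CUDA implementation, and the excerpt does not show which NMI definition the authors use; the sketch assumes the common form NMI = (H(A) + H(B)) / H(A, B), computed from intensity histograms. The function names and the flat-list image representation are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_mutual_information(a, b):
    """NMI of two equally sized images given as flat sequences of intensity bins.

    Assumed definition: (H(A) + H(B)) / H(A, B), which is 1 for
    statistically independent images and 2 for identical ones.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and of equal size")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)          # marginal entropy of A
    h_b = entropy(Counter(b).values(), n)          # marginal entropy of B
    h_ab = entropy(Counter(zip(a, b)).values(), n)  # joint entropy
    if h_ab == 0.0:
        # Both images are constant, so the ratio is undefined.
        raise ValueError("joint entropy is zero; NMI is undefined")
    return (h_a + h_b) / h_ab


if __name__ == "__main__":
    fixed = [0, 0, 1, 1]
    print(normalized_mutual_information(fixed, fixed))
    print(normalized_mutual_information(fixed, [0, 1, 0, 1]))
```

In a registration loop, a value like this would be re-evaluated for each candidate transform, which is the repeated histogram work that makes a GPU implementation attractive.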
Feb, 3

3D Information Extraction Based on GPU

Our project starts from a specific practical application of stereo vision (matching) on a robot arm: building a vision system that gives the arm the capability to detect the 3D information of objects on a plane. The kernel of the vision system is stereo matching. Stereo matching (correspondence) problem […]
Feb, 3

3D GPU Architecture using Cache Stacking: Performance, Cost, Power and Thermal analysis

Graphics Processing Units (GPUs) offer tremendous computational and processing power. The architecture requires high communication bandwidth and lower latency between computation units and caches. 3D die-stacking technology is a promising approach to meet such requirements. To the best of our knowledge no other study has investigated the implementation of 3D technology in GPUs. In this […]
Feb, 3

3D finite element numerical integration on GPUs

This paper investigates the algorithmic and computational aspects of 3D finite element numerical integration on GPUs. Special emphasis is placed on selecting the proper parallelization strategies depending on the properties of the FEM problems solved and the approximations used. The close interplay between the available computational resources of GPUs and the possible implementation strategies […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
