Posts
Feb, 2
Software Reliability Enhancements for GPU Applications
As the role of highly parallel accelerators becomes more important in high performance computing, so does the need to ensure their reliable operation. In applications where precision and correctness are a necessity, bit-level reliable operation is required. While there exist mechanisms for error detection and correction, their cost-effective implementation in massively parallel accelerators is still an […]
Feb, 2
Portable Performance on Heterogeneous Architectures
Trends in both consumer and high performance computing are bringing not only more cores, but also increased heterogeneity among the computational resources within a single machine. In many machines, one of the greatest computational resources is now the graphics coprocessor (GPU), not just the primary CPU. But GPU programming and memory models differ dramatically from […]
Feb, 2
Heterogeneous GPU and CPU acceleration of a finite volume compressible flow solver for multiblock structured grids
The main objective of this project is to investigate the application of heterogeneous acceleration to a finite volume compressible flow solver for multiblock structured grids. Provided as Fortran source code, the ROTORMBMGS computational fluid dynamics program currently uses domain decomposition and message passing to distribute computation across multiple computers. Winning awards for scaling performance, there is […]
Feb, 2
Improving GPGPU Concurrency with Elastic Kernels
Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs actually do not scale to utilize all available resources, with over 30% of resources going unused on average for programs of the Parboil2 […]
Feb, 2
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures
Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and accelerators, like GPUs. Programming such nodes is typically based on a combination of OpenMP and CUDA/OpenCL codes; scheduling relies on a static partitioning and cost model. We present the XKaapi runtime system for data-flow task programming on multi-CPU and multi-GPU architectures, which supports […]
Feb, 1
Embedding OpenCL in GHC Haskell
OpenCL defines a computation model for data-parallel code, supporting compilation to a variety of platforms, including both conventional x86 CPUs and commodity graphics hardware. OpenCL consists of both a programming language for writing data-parallel code, called kernels, and an API, written in C, for interacting with the OpenCL platform and invoking OpenCL kernels. We […]
Feb, 1
Efficient Exploitation of Heterogeneous Platforms for Vertebra Detection in X-Ray Images
Back problems are often related to an abnormal condition of the spine. In this context, conventional X-Ray radiography is the most common modality used in emergency rooms since it is relatively inexpensive and fast. In this paper, we are interested in a method for detecting and extracting vertebrae on X-Ray images. In a medical context, […]
Jan, 31
Validation of the PyGBe code for Poisson-Boltzmann equation with boundary element methods
The PyGBe code solves the linearized Poisson-Boltzmann equation using a boundary-integral formulation. We use a boundary element method with a collocation approach, and solve it via a Krylov-subspace method. To do this efficiently, the matrix-vector multiplications in the Krylov iterations are accelerated with a treecode, achieving O(N log N) complexity. The code presents a Python […]
Jan, 31
Studies Concerning the ATLAS IBL Calibration Architecture
With the commissioning of the Insertable B-Layer (IBL) in 2013 at the ATLAS experiment, 12 million additional pixels will be added to the current Pixel Detector. While the idea of employing pairs of VME-based Read-Out Driver (ROD) and Back of Crate (BOC) cards in the read-out chain remains unchanged, modifications regarding the IBL calibration procedure […]
Jan, 31
OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance
Emerging GPGPU architectures, along with programming models like CUDA and OpenCL, offer a cost-effective platform for many applications by providing high thread level parallelism at lower energy budgets. Unfortunately, for many general-purpose applications, available hardware resources of a GPGPU are not efficiently utilized, leading to lost opportunity in improving performance. A major cause of this […]
Jan, 31
Teaching cardiac electrophysiology modeling to undergraduate students: laboratory exercises and GPU programming for the study of arrhythmias and spiral wave dynamics
As part of a 3-week intersession workshop funded by a National Science Foundation Expeditions in Computing award, 15 undergraduate students from the City University of New York collaborated on a study aimed at characterizing the voltage dynamics and arrhythmogenic behavior of cardiac cells for a broad range of physiologically relevant conditions using an in silico […]
Jan, 31
The Physics of Singular Dislocation Structures in Continuum Dislocation Dynamics
Dislocations play an important role in the deformation behavior of metals. They interact not only via long-range elastic stress but also through short-range interactions: they annihilate, tangle, get stuck, and become unstuck. These interactions between dislocations lead to interesting dislocation wall formation at the mesoscale. A recently developed continuum dislocation dynamics model that shows dislocation […]