Posts
Oct, 9
UPC on MIC: Early Experiences with Native and Symmetric Modes
The Intel Many Integrated Core (MIC) architecture is steadily being adopted in clusters owing to its high compute throughput and power efficiency. The current-generation MIC coprocessor, Xeon Phi, provides a highly multi-threaded environment with support for multiple programming models. While regular programming models such as MPI/OpenMP have started utilizing systems with MIC coprocessors, it is […]
Oct, 9
Performance Analysis of a Large Memory Application on Multiple Architectures
The Graph500 Breadth-First Search benchmark has emerged as a well-documented PGAS-style application that both scales to large data set sizes and has documented implementations on multiple platforms over multiple years. This paper analyzes the reported performance and extracts insight into the leading performance limitations in such systems and how they scale with system […]
Oct, 9
High-Order Algorithms for Compressible Reacting Flow with Complex Chemistry
In this paper we describe a numerical algorithm for integrating the multicomponent, reacting, compressible Navier-Stokes equations, targeted for direct numerical simulation of combustion phenomena. The algorithm addresses two shortcomings of previous methods. First, it incorporates an eighth-order narrow stencil approximation of diffusive terms that reduces the communication compared to existing methods and removes the need […]
Oct, 8
Enabling the use of Heterogeneous Computing for Bioinformatics
The huge amount of information encoded in DNA sequences, together with growing interest in new discoveries, has spurred efforts to accelerate the DNA sequencing and alignment processes. The use of heterogeneous systems, which combine different types of computational units, has seen a new light in high performance computing in recent years; however, expertise […]
Oct, 8
Stressing the BER simulation of LDPC codes in the error floor region using GPU clusters
Low-Density Parity-Check (LDPC) codes are known for having excellent Bit Error Rate (BER) performance, even in the presence of quite low Signal-to-Noise Ratios (SNR). But the development of this type of error-correcting code poses severe challenges, since the design of new codes is based on heuristics such as girth and sparsity that do not always provide […]
Oct, 8
Parallel and Distributed Implementations of Multiple and Two-Dimensional Pattern Matching Algorithms
String matching is a fundamental problem in the area of scientific computing. Given two one-dimensional strings as input, the so-called "input string" and the so-called "pattern", the string matching problem involves locating all the positions in the input string where the pattern appears. As there has been […]
Oct, 8
libcloudph++ 0.1: single-moment bulk, double-moment bulk, and particle-based warm-rain microphysics library in C++
This paper introduces a library of algorithms for representing cloud microphysics in numerical models written in C++, hence the name libcloudph++. In the initial release, the library covers three warm-rain schemes: the single- and double-moment bulk schemes, and the particle-based scheme with Monte-Carlo coalescence. The three schemes are intended for modelling frameworks of different dimensionality […]
Oct, 8
Porting Large HPC Applications to GPU Clusters: The Codes GENE and VERTEX
We have developed GPU versions of two major high-performance-computing (HPC) applications originating from two different scientific domains. GENE is a plasma microturbulence code which is employed for simulations of nuclear fusion plasmas. VERTEX is a neutrino-radiation hydrodynamics code for "first-principles" simulations of core-collapse supernova explosions. The codes are considered state of the art in their […]
Oct, 7
Advanced 2D Rasterization on Modern CPUs
The graphics processing unit (GPU) has become part of our everyday life through desktop computers and portable devices (tablets, mobile phones, etc.). Because of the dedicated hardware, visualization has been significantly accelerated, and today’s software uses only the GPU for rasterization. Besides the graphical devices, the central processing unit (CPU) has also made remarkable progress. […]
Oct, 7
Performance evaluation of CUDA programming for machining simulation
5-axis milling simulations in CAM software are mainly used to detect collisions between the tool and the part. They are very limited in terms of surface topography investigations to validate machining strategies as well as machining parameters such as chordal deviation, scallop height and tool feed. Z-buffer or N-Buffer machining simulations provide more precise simulations […]
Oct, 7
GPU Accelerated Conjunction Assessment with Applications to Formation Flight and Space Debris Tracking
The primary purpose of conjunction assessment (CA) is to prevent the collision of objects in space. Typical collision scenarios involve satellites with space debris or a formation of satellites with each other. Users performing orbit propagation and CA on very large scales must judiciously moderate force model fidelity and/or acutely limit the number of objects […]
Oct, 7
Vectorized OpenCL implementation of numerical integration for higher order finite elements
In our work we analyze computational aspects of the problem of numerical integration in finite element calculations and consider an OpenCL implementation of related algorithms for processors with wide vector registers. As a platform for testing the implementation we choose the PowerXCell processor, being an example of the Cell Broadband Engine (CellBE) architecture. Although the […]