Posts
Jul, 26
SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters
In this paper, we propose SnuCL, an OpenCL framework for heterogeneous CPU/GPU clusters. We show that the original OpenCL semantics fit the heterogeneous cluster programming environment naturally, and that the framework achieves high performance and ease of programming. The target cluster architecture consists of a single designated host node and many compute nodes. They are […]
Jul, 26
Efficient Implementation of the CPR Formulation for the Navier-Stokes Equations on GPUs
The correction procedure via reconstruction (CPR) formulation for the Euler and Navier-Stokes equations is implemented on an NVIDIA graphics processing unit (GPU) using CUDA C with both explicit and implicit time-stepping schemes for 2D unstructured triangular grids. For the implicit time integration, a first-order time approximation with Newton iteration and Gauss elimination is used […]
Jul, 26
Bidimensional Median Filter for Parallel Computing Architectures
The median filter is a non-linear filter used to remove salt-and-pepper noise from images. Each pixel of the image is replaced by the median of its surrounding elements; the median value is calculated by sorting the data. The complexity of the sorting algorithms used in median filters is O(n^2) or O(n), […]
Jul, 26
Parallelization of Data Intensive Code Using Computer Unified Device Architecture (CUDA)
Parallel processing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently. Parallelism has been employed for many years, mainly in high-performance computing. As power consumption by computers has become a concern in […]
Jul, 26
Homunculus Warping: Conveying importance using self-intersection-free non-homogeneous mesh deformation
Size matters. Human perception most naturally relates relative extent, area or volume to importance, nearness and weight. Conversely, conveying the importance of something by depicting it at a different size is a classic artistic principle, in particular when importance varies across a domain. One striking example is the neuronal homunculus: a human figure where the size […]
Jul, 25
Fast End-to-End Multi-Conjugate AO Simulations Using Graphical Processing Units and the MAOS Simulation Code
The Multi-threaded Adaptive Optics Simulator (MAOS) was developed at TMT to efficiently simulate various kinds of AO systems. In particular, it can finish a time step of a full end-to-end simulation of an ELT-size multi-conjugate AO system in 1 second on 8 contemporary CPU cores. We recently ported it to run on graphical processing units […]
Jul, 25
Accelerating Noninvasive Transmural Electrophysiological Imaging with CUDA
The human heart is a vital muscle of the body. Abnormalities in the heart can disrupt its normal operation. One such abnormality, which affects the middle layer of the heart wall (myocardium), is myocardial scarring. As with any tissue in the body, damage to healthy tissue will trigger scar tissue to form. Normally this […]
Jul, 25
Remote GPU-Accelerated Online Pre-processing of Raster Maps for Terrain Rendering
We present a distributed architecture for accelerated pre-processing of remote sensing data for immediate terrain visualization. Interactive 3D visualization approaches for large terrain datasets employ level of detail techniques that require a multi-resolution data representation. The high computational cost of constructing these representations is often not viewed as a major drawback, as it is considered […]
Jul, 25
Optimising Cosmological N-body Simulations in GPU Clusters
Cosmological simulations play an important role in understanding the evolution of our universe. Since experiments on the formation of galaxies cannot be performed in a laboratory, simulation is the only way to understand this phenomenon. Cosmological simulations are usually modelled as N-body problems. The Barnes-Hut (BH) tree code algorithm is one of the popular […]
Jul, 25
Ice Simulation Using GPGPU
Simulation of the behaviour of a ship operating in pack ice is a computationally intensive process to which General Purpose Computing on Graphical Processing Units (GPGPU) can be applied. In this paper we present an efficient parallel implementation of such a simulator developed using the NVIDIA Compute Unified Device Architecture (CUDA). We have conducted an […]
Jul, 24
Source-to-source transformations for irregular and multithreaded code optimization
Source-to-source optimization is an efficient method for generating, from a basic implementation, a high-performance program that addresses two main challenges: irregular codes and heterogeneous implementations. In the last decade, general purpose CPUs moved towards multi-core architectures, and the end of the increase in processor frequency marked a turning point obtaining the best […]
Jul, 24
Evaluation of state-of-the-art polyhedral tools for automatic code generation on GPUs
At present, multi-core and many-core platforms lead the computer industry, forcing software developers to adopt new programming paradigms in order to fully exploit their computing capabilities. Nowadays, Graphics Processing Units (GPUs) are one of the representatives of many-core architectures, and certainly the most widespread. This paper evaluates and compares tool frameworks that automatically generate code for […]