Posts
Jan, 24
Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems
We discuss three complementary approaches that can provide both portability and an increased level of abstraction for the programming of heterogeneous multicore systems. Together, these approaches also support performance portability, as currently investigated in the EU FP7 project PEPPHER. In particular, we consider (1) a library-based approach, here represented by the integration of the SkePU […]
Jan, 24
Hybrid Single/Double Precision Floating-Point Computation on GPU Accelerators for 2-D FDTD
Acceleration of FDTD (Finite-Difference Time-Domain) is very important in computational electromagnetics. We propose a hybrid single/double precision floating-point computation to accelerate FDTD on GPUs. We apply single precision when the dynamic range of the electromagnetic field is low and double precision when the dynamic range is high. According to the experimental results, we achieved over 35 times […]
Jan, 24
Developing and Evaluating clOpenCL Applications for Heterogeneous Clusters
In the last few years, the processing capabilities of computing systems have increased significantly, changing from single-core to multi-core and even many-core systems. Accompanying this evolution, local networks have also become faster, with multi-gigabit technologies like InfiniBand, Myrinet and 10G Ethernet. Parallel/distributed programming tools and standards, like POSIX Threads, OpenMP and MPI, have helped to explore […]
Jan, 24
Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA
High-resolution satellite images are now widely used for a variety of mapping applications, including photogrammetry, GIS data acquisition and visualization. As the spectral and spatial data size of satellite images increases, greater processing power is needed to process the images. Parallel systems offer a solution to this problem. Parallel processing techniques have been […]
Jan, 24
GPU-based 3D Wavelet Transform
A wide range of applications, such as volumetric medical data compression, video watermarking and video coding, use the three-dimensional wavelet transform (3D-DWT) in their algorithms. In this work, we present GPU algorithms, based on both global and shared memory, to compute the 3D-DWT on both the GTX280 and the GMT540 platforms. The results obtained show that […]
Jan, 23
Reducing GPU Offload Latency via Fine-Grained CPU-GPU Synchronization
GPUs are seeing increasingly widespread use for general purpose computation due to their excellent performance for highly-parallel, throughput-oriented applications. For many workloads, however, the performance benefits of offloading are hindered by the large and unpredictable overheads of launching GPU kernels and of transferring data between CPU and GPU. This paper proposes and evaluates hardware and […]
Jan, 23
The effects of nutrient chemotaxis on bacterial aggregation patterns with non-linear degenerate cross diffusion
This paper introduces a reaction-diffusion-chemotaxis model for bacterial aggregation patterns on the surface of thin agar plates. It is based on the non-linear degenerate cross diffusion model proposed by Kawasaki et al. (J. of Theor. Biol. 188(2) 1997) and it includes a suitable nutrient chemotactic term compatible with this type of diffusion. High resolution numerical […]
Jan, 23
A survey on various computationally intensive parallel applications in High performance Computing System with OpenCL-MPI
As we are in the development phase of our own supercomputer, we have identified several applications that are too computationally intensive for a normal desktop computer to solve. These applications are drawn from multiple disciplines, such as bio-medical, mathematics, fluid dynamics and genetic algorithms. We are actually identifying the parallel computations involved in […]
Jan, 23
Implementing Open-Source CUDA Runtime
Graphics processing units (GPUs) are the state of the art embracing the concept of many-core technology. Their significant advantage in performance and performance-per-watt compared to traditional microprocessors has facilitated development of GPUs in many compute applications. However, GPUs are often treated as "black-box" devices due to proprietary strategies of hardware vendors. One of the greatest […]
Jan, 22
Data parallel patterns on CPU/GPU mix
We propose a model that uses a small set of quite simple parameters to devise a proper partitioning (between CPU and GPU cores) of the tasks deriving from structured data parallel patterns/algorithmic skeletons. The model takes into account both hardware related and application dependent parameters. It eventually computes the percentage of tasks to be executed on CPU […]
Jan, 22
A GPU Accelerated Navier-Stokes Solver with Multi-level Granularity for Solving Sparse Implicit Systems
In recent years, researchers have employed a wide array of multi-physics computational tools, of varying sophistication, to simulate brownout conditions [1-3]. Among these tools, compressible high-fidelity Reynolds-Averaged Navier-Stokes (RANS) solvers [3] depend the least on empirical assumptions. However, the high computational expense involved in RANS simulations of viscous, rotary environments makes it less attractive […]
Jan, 22
Accelerating Fast Fourier Transforms Using Hadoop and CUDA
There has been considerable research into improving Fast Fourier Transform (FFT) performance through parallelization and optimization for specialized hardware. However, even with those advancements, processing of very large files, over 1TB in size, still remains prohibitively slow. Analysts performing signal processing are forced to wait hours or days for results, which causes a disruption […]