Posts
Dec, 23
Real time mitigation of atmospheric turbulence in long distance imaging using the lucky region fusion algorithm with FPGA and GPU hardware acceleration
"Lucky-region" fusion (LRF) is a synthetic imaging technique that has proven successful in enhancing the quality of images distorted by atmospheric turbulence. The LRF algorithm selects sharp regions of an image obtained from a series of short exposure frames, and fuses the sharp regions into a final, improved image. In previous research, the LRF algorithm […]
Dec, 23
SqueezCL: Squeezing OpenCL Kernels for Approximate Computing on Contemporary GPUs
Approximate computing provides an opportunity for exploiting application characteristics to improve performance of computing systems. However, such opportunity must be balanced against generality of methods and quality guarantees that the system designer can provide to the application developer. Improved parallel processing in graphics processing units (GPUs) provides one such means for data-level parallel applications. We […]
Dec, 22
OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures
The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of applications, both present and […]
Dec, 22
GPU-accelerated Bernstein-Bezier discontinuous Galerkin methods for wave problems
We evaluate the computational performance of the Bernstein-Bezier basis for discontinuous Galerkin (DG) discretizations and show how to exploit properties of derivative and lift operators specific to Bernstein polynomials. Issues of efficiency and numerical stability are discussed in the context of a model wave propagation problem. We compare the performance of Bernstein-Bezier kernels to both […]
Dec, 22
Parallel FDTD Arithmetic Simulation Based on Distributed Heterogeneous Cluster System
This paper puts forward a new FDTD parallel algorithm developed for distributed platforms. The algorithm was debugged on the GPU cluster of the high performance computing center at Shanghai Jiao-tong University, the "Rubik’s Cube" commercial supercomputer at Shanghai Supercomputer Center, and the "divinity blue" domestic supercomputer platform at the National Supercomputing Center in […]
Dec, 22
Performance and Productivity of Parallel Python Programming: A study with a CFD Test Case
The programming language Python is widely used to rapidly create compact software. However, compared to low-level programming languages like C or Fortran, its low performance prevents its use for HPC applications. Efficient parallel programming of multi-core systems and graphics cards is generally a complex task. Python with add-ons might provide a simple approach to program […]
Dec, 22
A time-energy performance analysis of MapReduce on heterogeneous systems with GPUs
Motivated by the explosion of Big Data analytics, performance improvements in low-power (wimpy) systems and the increasing energy efficiency of GPUs, this paper presents a time-energy performance analysis of MapReduce on heterogeneous systems with GPUs. We evaluate the time and energy performance of three MapReduce applications with diverse resource demands on a Hadoop-CUDA framework. As […]
Dec, 19
Autotuning Stencils Codes with Algorithmic Skeletons
The physical limitations of microprocessor design have forced the industry towards increasingly heterogeneous architectures to extract performance. This trend has not been matched with software tools to cope with such parallelism, leading to a growing disparity between the levels of available performance and the ability for application developers to exploit it. Algorithmic skeletons simplify parallel […]
Dec, 19
Study, Modelling and Implementation of the Level Set Method Used in Micromachining Processes
The main topic of the present thesis is the improvement of fabrication processes simulation by means of the Level Set (LS) method. The LS is a mathematical approach used for evolving fronts according to a motion defined by certain laws. The main advantage of this method is that the front is embedded inside a higher […]
Dec, 19
Investigation of the SYCL for OpenCL Programming Model
OpenCL and SYCL for OpenCL are open-standard programming models which enable development of parallel programs which target heterogeneous hardware: systems which contain both general-purpose CPUs and accelerator devices such as GPGPUs or Intel Xeon Phi cards. While OpenCL provides a C API, SYCL provides a C++ API and allows programmers to take advantage of many […]
Dec, 19
Challenges Adapting CUDA PIC Codes to multiple GPUs
A Particle-In-Cell code is a common particle simulation method often used to simulate the behaviour of plasma. In this work, a parallel PIC code is developed in CUDA, with a focus on how to adapt the method for multiple GPUs. An electrostatic three dimensional PIC code is developed, with an FFT-based solver using the cuFFT […]
Dec, 19
Efficient Query Processing in Co-Processor-accelerated Databases
Advancements in hardware have shifted the bottleneck of modern database systems from disk IO to main memory access and processing power. Since the performance of modern processors is primarily limited by a fixed energy budget, hardware vendors are forced to specialize processors. Consequently, processors are becoming increasingly heterogeneous, a trend that has already become commodity in the form of accelerated […]