Posts
Jan, 23
Multi-GPU parallel memetic algorithm for capacitated vehicle routing problem
The goal of this paper is to propose and test a new memetic algorithm for the capacitated vehicle routing problem in a parallel computing environment. In this paper we consider a simple variation of the vehicle routing problem in which the only parameter is the capacity of the vehicle and each client needs only one package. We present […]
Jan, 19
A Lattice Boltzmann Method Simulator for Microfluidics on GPU Cluster
A simulator for microfluidic systems, based on the lattice Boltzmann method (LBM), was developed to run on a Graphics Processing Unit (GPU) cluster. It was written in CUDA C, implements single-component, single-phase fluids, and includes periodic, velocity, bounce-back and pressure boundary conditions. The program was run on a cluster with four nodes, where […]
Jan, 19
GPU based Implementation of Film Flicker Reduction Algorithms
In this work we propose an algorithm for film restoration aimed at reducing the flicker effect while preserving the original overall illumination of the film. We also present a comparative study of the performance of this algorithm implemented following a sequential approach on a CPU and following a parallel approach on a GPU using OpenCL.
Jan, 19
FlowTour: An Automatic Guide for Exploring Internal Flow Features
We present FlowTour, a novel framework that provides an automatic guide for exploring internal flow features. Our algorithm first identifies critical regions and extracts their skeletons for feature characterization and streamline placement. We then create candidate viewpoints based on the construction of a simplified mesh enclosing each critical region and select the best viewpoints based on […]
Jan, 19
Finite-difference time-domain solver for room acoustics using graphics processing units
Several acoustic simulation methods have been introduced during the past decades. Wave-based simulation methods have been one of the alternatives, but their applicability for wideband acoustic simulation has been limited by the computing power of available hardware. During recent years, the processing power and programmability of graphics processing units have improved, and therefore several wave-based […]
Jan, 19
GPU Computing for Meshfree Particle Method
Graphics Processing Units (GPUs), originally developed for computer games, now provide computational power for scientific applications. A study comparing the computational speed-up and efficiency of a GPU with those of a CPU for the Finite Pointset Method (FPM), a numerical tool in Computational Fluid Dynamics (CFD), is presented. As FPM is based on […]
Jan, 18
High-performance and Embedded Systems for Cryptography
This thesis addresses the design of cryptographic accelerators, ranging from the embedded system to the high-performance computing device. New techniques are proposed to allow several cryptographic algorithms to be computed by the same target. Therefore, flexibility (to support several algorithms) and scalability (to extend the features of a designed accelerator) are two keywords in all […]
Jan, 18
Supporting x86-64 Address Translation for 100s of GPU Lanes
Efficient memory sharing between CPU and GPU threads can greatly expand the effective set of GPGPU workloads. For increased programmability, this memory should be uniformly virtualized, necessitating compatible address translation support for GPU memory references. However, even a modest GPU might need 100s of translations per cycle (6 CUs * 64 lanes/CU) with memory access […]
Jan, 18
Improving the Performance of CA-GMRES on Multicores with Multiple GPUs
The Generalized Minimum Residual (GMRES) method is one of the most widely used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because, in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming […]
Jan, 18
A GPU-based Multi-level Subspace Decomposition Scheme for Hierarchical Tensor Product Bases
The aim of this thesis is to implement a multi-level splitting of full grids on the GPU, which could be used in the incremental visualization of scientific data sets. The splitting is motivated by the approximation properties of the sparse grid technique. Looking towards large amounts of data, ideas of parallelization and data slicing are […]
Jan, 18
Computing Spatial Distance Histograms for Large Scientific Datasets On-the-Fly
This paper focuses on an important query in scientific simulation data analysis: the Spatial Distance Histogram (SDH). The computation time of an SDH query using a brute-force method is quadratic. Often, such queries are executed continuously over certain time periods, increasing the computation time. We propose a highly efficient approximate algorithm to compute SDH over consecutive […]
Jan, 17
Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator
We examine the Xeon Phi, which is based on Intel’s Many Integrated Cores architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and means to enable sensible data sharing between threads despite […]