Posts
Nov, 6
Advanced MRI reconstruction toolbox with accelerating on GPU
In this paper, we present a fast iterative magnetic resonance imaging (MRI) reconstruction algorithm taking advantage of the prevailing GPGPU programming paradigm. In a clinical environment, MRI reconstruction is usually performed via fast Fourier transform (FFT). However, imaging artifacts (i.e. signal loss) resulting from susceptibility-induced magnetic field inhomogeneities degrade the quality of reconstructed images. These artifacts […]
Nov, 6
Global memory access modelling for efficient implementation of the lattice Boltzmann method on graphics processing units
In this work, we investigate the global memory access mechanism on recent GPUs. For the purpose of this study, we created specific benchmark programs, which allowed us to explore the scheduling of global memory transactions. Thus, we formulate a model capable of estimating the execution time for a large class of applications. Our main goal […]
Nov, 6
Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units
Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been […]
Nov, 6
Running unstructured grid-based CFD solvers on modern graphics hardware
Techniques used to implement an unstructured grid solver on modern graphics hardware are described. The three-dimensional Euler equations for inviscid, compressible flow are considered. Effective memory bandwidth is improved by reducing total global memory access and overlapping redundant computation, as well as using an appropriate numbering scheme and data layout. The applicability of per-block shared […]
Nov, 5
Fast QAP Solver with ACO and Taboo Search on GPU using Move-Cost Adjusted Thread Assignment
There are several studies on solving the quadratic assignment problem (QAP) with GPUs using evolutionary computation. In our previous studies [3], we applied GPU computation to solve quadratic assignment problems (QAPs) using a distributed parallel GA model on GPUs. However, in those studies no local searches were applied. In this QAP solver, we implemented a […]
Nov, 5
Acceleration of Deterministic Boltzmann Solver with Graphics Processing Units
GPU-accelerated computing of the Boltzmann collision integral is studied using a deterministic method with piecewise approximation of the velocity distribution function and analytical integration over collision impact parameters. A 40-fold acceleration is achieved compared to CPU calculations for a 3D problem of collisional relaxation of a bi-Maxwellian velocity distribution.
Nov, 5
Transformation of Scientific Algorithms to Parallel Computing Code: Single GPU and MPI multi GPU Backends with Subdomain Support
We propose an approach for high-performance scientific computing that separates the description of algorithms from the generation of code for parallel hardware architectures like multi-core CPUs, GPUs or FPGAs. This way, a scientist can focus on his domain of expertise by describing his algorithms generically without the need to have knowledge of specific hardware architectures, […]
Nov, 5
Challenges for compiler support for exascale computing
The compiler is central to the translation of the software we want users to write to the machine code we want to run. The scale of the applications and the choices of programming languages by users greatly complicate the role for the compiler and its analysis. The languages we use frequently don’t support rich optimizations […]
Nov, 5
CUDA-Accelerated Geodesic Ray-Tracing for Fiber Tracking
Diffusion Tensor Imaging (DTI) allows noninvasive measurement of the diffusion of water in fibrous tissue. By reconstructing the fibers from DTI data using a fiber-tracking algorithm, we can deduce the structure of the tissue. In this paper, we outline an approach to accelerating such a fiber-tracking algorithm using a Graphics Processing Unit (GPU). This algorithm, […]
Nov, 5
Implementation of a multigrid solver on GPU for Stokes equations with strongly variable viscosity based on Matlab and CUDA
Stokes equations have been used in numerical simulations of geodynamic processes such as mantle convection, lithospheric deformation, and lava flow. To implement a solver for these equations, we introduce the multigrid method. The multigrid method is commonly used to reduce the number of iteration steps for solving the elliptic partial differential equation […]
Nov, 5
A mobile robot navigation with use of CUDA parallel architecture
In this article, we present a navigation system for a mobile robot based on parallel calculations. It is assumed that the robot is equipped with a 3D laser range scanner. The system is essentially based on a dual grid-object, where labels are attached to detected objects (such maps can be used in navigation based on […]
Nov, 5
Improving CUDASW++, a Parallelization of Smith-Waterman for CUDA Enabled Devices
CUDASW++ is a parallelization of the Smith-Waterman algorithm for CUDA graphical processing units that computes the similarity scores of a query sequence paired with each sequence in a database. The algorithm uses one of two kernel functions to compute the score between a given pair of sequences: the inter-task kernel or the intra-task kernel. We […]