Posts
Jun, 22
Resource Sharing in GPU-Accelerated Windowing Systems
Recent windowing systems allow graphics applications to directly access the graphics processing unit (GPU) for fast rendering. However, application tasks that render frames on the GPU contend heavily with the windowing server that also accesses the GPU to blit the rendered frames to the screen. This resource-sharing nature of direct rendering introduces core challenges of […]
Jun, 22
Exploiting SPMD Horizontal Locality
In this paper, we analyze a particular spatial locality case (called horizontal locality) inherent to manycore accelerator architectures employing barrel execution of SPMD kernels, such as GPUs. We then propose an adaptive memory access granularity framework to exploit and enforce the horizontal locality in order to reduce the interference among accelerator cores' memory accesses and […]
Jun, 21
Practical parallel imaging compressed sensing MRI: Summary of two years of experience in accelerating body MRI of pediatric patients
For the last two years, we have been experimenting with applying compressed sensing parallel imaging for body imaging of pediatric patients. It is a joint effort by teams from UC Berkeley, Stanford University, and GE Healthcare. This paper aims to summarize our experience so far. We describe our acquisition approach: 3D spoiled-gradient-echo with Poisson-disc random undersampling […]
Jun, 21
GPU-based acceleration of MPIE/MoM matrix calculation for the analysis of microstrip circuits
In this paper, we present a GPU-based algorithm which accelerates the MoM impedance matrix computation. Based on an efficient quasi-one-dimensional approximation of the reaction integrals, the MPIE formulation for the analysis of microstrip circuits is considered. We use NVIDIA CUDA as the GPU development tool and choose an edge-connected line-fed patch antenna as the reference problem. In […]
Jun, 21
GPU acceleration of Compton reconstruction for the PEDRO
Compton reconstruction requires the computationally intensive, yet highly parallelizable, task of Cone of Response (CoR) back-projection. The acceleration of CoR back-projection is of significant importance as a faster algorithm allows the user to increase either the size or resolution of the imaging volume. Such acceleration also lends itself to the realization of real-time reconstruction. The […]
Jun, 21
Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring
The programmability of recent graphics processing unit (GPU) architectures has been the main factor driving the dramatic increase in interest in this class of architectures as low-cost accelerators for a wide range of high-performance applications. Current GPU programming models, such as OpenCL and CUDA, still expose too many architectural features, such as the memory hierarchy, […]
Jun, 21
GPU-based motion correction of contrast-enhanced liver MRI scans: An OpenCL implementation
Clinical diagnosis and quantification of liver disease have been improved through the development of techniques using contrast-enhanced liver MRI sequences. To qualitatively or quantitatively analyze such image sequences, one first needs to correct for rigid and non-rigid motion of the liver. For motion correction of the liver, we have employed bi-directional local correlation coefficient Demons, […]
Jun, 21
GPU accelerated rotation-based emission tomography reconstruction
Stochastic methods based on Maximum Likelihood Estimation (MLE) provide accurate tomographic reconstruction for emission imaging. Moreover, methods based on MLE allow an accurate physical model of the imaging setup to be included in the reconstruction process, thus enabling quantitative reconstruction of radio-tracer activity distribution. It has been shown that inclusion of a spatially dependent PSF that […]
Jun, 21
Performance evaluation of the multi-device OpenCL FDTD solver
We present results of an evaluation of a multi-device OpenCL FDTD solver. Portability between hardware manufactured by different vendors and also between highly specialized and parallel computing architectures available on the market, i.e. GPUs, multi-core CPUs and devices integrating both technologies in a single-die IC, is the main advantage of this solver. For code execution […]
Jun, 21
Scalable Streaming-Array of Simple Soft-Processors for Stencil Computations with Constant Memory-Bandwidth
Stencil computation is one of the important kernels in scientific computations; however, the sustained performance is limited by memory bandwidth, especially on multi-core microprocessors and GPGPUs, due to its small operational intensity. In this paper, we propose a scalable streaming-array (SSA) of simple soft-processors for high-performance stencil computation on multiple FPGAs. The SSA architecture allows a […]
Jun, 21
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs
BACKGROUND: Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show a great potential of a […]
Jun, 21
Fast, parallel, GPU-based construction of space filling curves and octrees
Space Filling Curves (SFC) are particularly useful in linearization of data living in two- and three-dimensional spaces and have been used in a number of applications in scientific computing and visualization. Interestingly, octrees, another versatile data structure in computer graphics, can be viewed as multiple SFCs at varying resolutions, albeit with a parent-child relationship. In […]
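As a rough illustration of the link between space filling curves and octrees mentioned in the excerpt above, the following minimal Python sketch uses the Morton (Z-order) curve. The excerpt does not say which curve the paper uses, so Morton order is an assumption here, and the names `morton3d` and `octree_cell` are hypothetical helpers, not taken from the paper. The sketch interleaves the bits of integer grid coordinates into a single key; dropping the lowest 3 bits of a key per level gives the key of the enclosing octree cell at a coarser resolution.

```python
def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton (Z-order) key.

    Bit i of x lands at position 3*i, bit i of y at 3*i + 1,
    and bit i of z at 3*i + 2.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def octree_cell(code: int, level: int, max_level: int = 10) -> int:
    """Key of the octree cell at depth `level` that contains the point.

    Each octree level consumes 3 bits of the key, so the ancestor at a
    coarser level is the key with the finer levels shifted away.
    """
    return code >> (3 * (max_level - level))


# Sorting points by Morton key linearizes them along the curve.
points = [(3, 3, 3), (0, 0, 0), (2, 2, 2), (1, 0, 0)]
ordered = sorted(points, key=lambda p: morton3d(*p, bits=2))
```

In this sketch, points that share a key prefix fall in the same octree cell, which is one way to read the remark that an octree corresponds to the same curve viewed at several resolutions.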