Posts
Apr, 14
GPU acceleration of method of moments matrix assembly using Rao-Wilton-Glisson basis functions
In this paper, a GPU-accelerated implementation of the matrix assembly phase of the method of moments is presented. The modelling of PEC structures using the electric field integral equation and the Rao-Wilton-Glisson basis functions is considered. NVIDIA CUDA is used for the GPU development and the double precision support offered by […]
Apr, 14
Accelerating spatial clustering detection of epidemic disease with graphics processing unit
The statistics of disease clustering are of interest to epidemiologists. In order to detect spatial clustering of disease in all the regions of China, we adopted a likelihood-ratio-based method which uses Monte Carlo simulation and spatial exploration to analyze real-time updating data stored in a database. However, a large number of random tests […]
Apr, 14
Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs
Sparse matrix-vector multiplication on GPUs faces a serious problem when the vector length is too large to be stored in the GPU's device memory. To solve this problem, we propose a novel software-hardware hybrid method for a heterogeneous system with GPUs and functional memory modules connected by PCI Express. The functional memory contains huge capacity […]
Apr, 14
Efficient design and implementation of visual computing algorithms on the GPU
In this paper, we explore the key factors in the design and implementation of visual computing (image processing and computer vision) algorithms on the massively parallel GPU (graphics processing unit). The goal of the exploration is to provide a common perspective and guidelines for using GPUs in visual computing applications. We have selected three nontrivial applications […]
Apr, 14
Implicit Feature-Based Alignment System for Radiotherapy
In this paper, we present a robust alignment algorithm for correcting the effects of out-of-plane rotation, to be used for automatic alignment of Computed Tomography (CT) volumes and the generally low-quality fluoroscopic images in radiotherapy applications. Analyzing not only in-plane but also out-of-plane rotation effects on the Digitally Reconstructed Radiograph (DRR) images, we […]
Apr, 14
OpenCL/OpenGL aproach for studying active Brownian motion
This work presents a methodology for studying active Brownian dynamics on ratchet potentials using interoperating OpenCL and OpenGL frameworks. Programming details along with optimization issues are discussed, followed by a comparison of performance on different devices. Visualization time using an OpenGL buffer shared with OpenCL has been tested against another technique which, while using OpenGL, […]
Apr, 13
23rd International Conference on Parallel Computational Fluid Dynamics 2011, ParCFD 2011
ParCFD is the annual international conference devoted to the discussion of recent developments and applications of parallel computing in the field of CFD and related disciplines. Since the establishment of the ParCFD conference series, parallel computers have become the dominant form of large-scale computing. The emergence of multi-core and heterogeneous architectures in parallel computers has created new […]
Apr, 13
Hardware-Efficient Belief Propagation
Loopy belief propagation (BP) is an effective solution for assigning labels to the nodes of a graphical model such as the Markov random field (MRF), but it incurs high memory, bandwidth, and computational costs. Furthermore, the iterative, pixel-wise, and sequential operations of BP make it difficult to parallelize the computation. In this paper, we propose […]
Apr, 13
Speeding up K-Means Algorithm by GPUs
Cluster analysis plays a critical role in a wide variety of applications, but it now faces a computational challenge due to continuously increasing data volume. Parallel computing is one of the most promising solutions to this challenge. In this paper, we target parallelizing k-Means, which is one of the most […]
Apr, 13
Accelerate Cache Simulation with Generic GPU
Trace-driven cache simulation is the most widely used method to evaluate different cache structures. Several techniques have been proposed to reduce the simulation time of sequential trace-driven simulation. An obvious way to achieve fast parallel simulation is to simulate the individual independent sets of a cache concurrently on different compute resources. We propose improvements to […]
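The set-independence idea mentioned in this excerpt can be illustrated with a small sketch. It is not the paper's implementation: the LRU replacement policy, the cache parameters, and every function name below are assumptions chosen for illustration. The point it shows is that each address maps to exactly one set, so a trace can be split by set index and every set simulated on its own (for example, one per worker) without changing the total miss count.

```python
from collections import OrderedDict


def simulate_set(blocks, ways):
    """Count misses for ONE cache set under LRU (assumed policy).

    `blocks` is the sequence of block numbers that map to this set,
    in original trace order.
    """
    lru = OrderedDict()
    misses = 0
    for block in blocks:
        if block in lru:
            lru.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict least recently used block
            lru[block] = True
    return misses


def simulate_sequential(trace, num_sets, ways, line_size):
    """Reference: walk the whole trace once, updating sets in trace order."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_size
        lru = sets[block % num_sets]
        if block in lru:
            lru.move_to_end(block)
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)
            lru[block] = True
    return misses


def simulate_by_set(trace, num_sets, ways, line_size):
    """Partition the trace by set index, then simulate each set separately.

    The per-set calls share no state, so they could be dispatched to
    different compute resources; here they simply run in a loop.
    """
    per_set = [[] for _ in range(num_sets)]
    for addr in trace:
        block = addr // line_size
        per_set[block % num_sets].append(block)
    return sum(simulate_set(blocks, ways) for blocks in per_set)
```

Splitting the trace preserves the relative order of accesses within each set, which is all an LRU set needs, so both functions return the same miss count for any trace.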
Apr, 13
NQueens on CUDA: Optimization Issues
Today's commercial off-the-shelf computer systems are multicore computing systems that combine CPUs, graphics processors (GPUs), and custom devices. In comparison with CPU cores, graphics cards are capable of running hundreds to thousands of compute units in parallel. To benefit from these GPU computing resources, applications have to be parallelized and adapted to the […]
Apr, 13
Memory Saving Discrete Fourier Transform on GPUs
This paper shows an alternative method to compute the two-dimensional Discrete Fourier Transform. While current GPU Fourier transform libraries need a large buffer for storing intermediate results, our method can compute the same output with far less memory. It works by exploiting the separability of the Fourier transform. Using this scheme, it is […]
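The separability property named in this excerpt can be checked with a short sketch. It does not reproduce the paper's memory-saving scheme, which the truncated excerpt does not describe; it only shows that a 2D DFT equals 1D DFTs along the rows followed by 1D DFTs along the columns. The function names and the naive O(n^2) transform are assumptions made for illustration.

```python
import cmath


def dft_1d(x):
    """Naive 1D DFT of a sequence of numbers."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft_2d_direct(a):
    """2D DFT straight from the definition: a double sum per output element."""
    rows, cols = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]


def dft_2d_separable(a):
    """2D DFT as a row pass of 1D DFTs followed by a column pass."""
    row_pass = [dft_1d(row) for row in a]                 # transform each row
    columns = zip(*row_pass)                              # transpose
    col_pass = [dft_1d(list(col)) for col in columns]     # transform each column
    return [list(row) for row in zip(*col_pass)]          # transpose back


def max_abs_diff(p, q):
    """Largest element-wise magnitude difference between two 2D arrays."""
    return max(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))
```

For a small input such as `[[1, 2, 3], [4, 5, 6]]`, `max_abs_diff(dft_2d_direct(a), dft_2d_separable(a))` stays at floating-point rounding level, which is the equivalence the separable approach relies on.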