Posts
Apr, 8
Shape Manipulation on GPU
This paper proposes a novel hardware-accelerated deformation algorithm based on a curve-skeleton model for 2D shape manipulation. The deformation algorithm can achieve real-time interactive shape manipulation without any precomputation step. The deforming regions of shapes are demarcated with a simple skeleton frame and are simulated by a curve-skeleton model consisting of triangle strips. The algorithm obtains two […]
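Skeleton-driven deformation in general can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: the hypothetical helpers `bind` and `deform` attach each point to its nearest segment of a polyline skeleton in local (u, v) coordinates and rebuild the point after the skeleton moves. This is not the paper's triangle-strip curve-skeleton model, and it runs on the CPU.

```python
import math

# Hypothetical sketch, NOT the paper's triangle-strip model: points are bound
# to the nearest segment of a polyline skeleton and re-posed with it.

def _frame(a, b):
    """Return (length, unit tangent, unit left normal) of segment a -> b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    tx, ty = dx / length, dy / length
    return length, (tx, ty), (-ty, tx)

def bind(points, skeleton):
    """Store each point as (segment index, u, v): u is the fraction along the
    segment, v the signed distance along the segment's normal."""
    bindings = []
    for px, py in points:
        best = None
        for i in range(len(skeleton) - 1):
            (ax, ay), (bx, by) = skeleton[i], skeleton[i + 1]
            length, (tx, ty), (nx, ny) = _frame(skeleton[i], skeleton[i + 1])
            rx, ry = px - ax, py - ay
            u = (rx * tx + ry * ty) / length
            v = rx * nx + ry * ny
            uc = min(max(u, 0.0), 1.0)  # clamp to the segment to measure distance
            dist = math.hypot(px - (ax + uc * (bx - ax)), py - (ay + uc * (by - ay)))
            if best is None or dist < best[0]:
                best = (dist, i, u, v)
        bindings.append(best[1:])
    return bindings

def deform(bindings, skeleton):
    """Rebuild point positions from their bindings to a re-posed skeleton."""
    out = []
    for i, u, v in bindings:
        (ax, ay), (bx, by) = skeleton[i], skeleton[i + 1]
        _, _, (nx, ny) = _frame(skeleton[i], skeleton[i + 1])
        out.append((ax + u * (bx - ax) + v * nx, ay + u * (by - ay) + v * ny))
    return out
```

Bending the skeleton at a joint carries the attached points along with their segment; points near a joint can switch segments abruptly, which is why practical skeleton deformers blend neighboring segments (omitted here).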
Apr, 7
High throughput multiple-precision GCD on the CUDA architecture
Investigation of the cryptanalytic strength of RSA cryptography requires computing the GCDs of many pairs of long integers (e.g., of length 1024 bits). This paper presents a high-throughput parallel algorithm that performs many GCD computations concurrently on a GPU based on the CUDA architecture. The experiments with an NVIDIA GeForce GTX285 GPU and a single core […]
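A common choice for multiple-precision GCD on GPUs is the binary (Stein) GCD, since it needs only shifts, subtractions and comparisons; the excerpt does not say which variant the paper uses, so treat this Python sketch as an illustration under that assumption. The hypothetical `shared_factors` helper shows why many independent GCDs arise: two RSA moduli that share a prime are both factored by their GCD, and every pair can be checked independently (on a GPU, plausibly one pair per thread). Python integers are arbitrary precision, so the multi-word arithmetic a GPU kernel must implement is hidden here.

```python
def binary_gcd(a: int, b: int) -> int:
    """Stein's binary GCD for non-negative integers."""
    if a == 0:
        return b
    if b == 0:
        return a
    shift = 0
    while ((a | b) & 1) == 0:   # factor out powers of two common to both
        a >>= 1
        b >>= 1
        shift += 1
    while (a & 1) == 0:         # a is now odd
        a >>= 1
    while b != 0:
        while (b & 1) == 0:     # remove factors of two from b
            b >>= 1
        if a > b:
            a, b = b, a
        b -= a                  # difference of two odd numbers is even
    return a << shift

def shared_factors(moduli):
    """Return (i, j, g) for every pair of moduli sharing a nontrivial factor g."""
    hits = []
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            g = binary_gcd(moduli[i], moduli[j])
            if g > 1:
                hits.append((i, j, g))
    return hits
```

For example, `shared_factors([61 * 53, 67 * 71, 61 * 59])` reports only the first and third moduli, which share the prime 61.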
Apr, 7
Program Optimization of Stencil Based Application on the GPU-Accelerated System
Graphics Processing Units (GPUs), with many lightweight data-parallel cores, can provide substantial parallel computational power to accelerate general-purpose applications. However, this computing capacity cannot be fully utilized by memory-intensive applications, which are limited by off-chip memory bandwidth and latency. Stencil computation has abundant parallelism and low computational intensity, which make it a […]
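The cost of low computational intensity, and the usual remedy, can be shown with a 5-point Jacobi stencil: each output needs four loads for only four floating-point operations. The Python sketch below is a CPU model, not GPU code: it compares a naive sweep with a tiled one in which a tile plus its one-cell halo is staged in a local buffer standing in for a thread block's shared memory. The tiling scheme and the `tile` parameter are illustrative assumptions, not necessarily the optimizations the paper applies.

```python
def jacobi_naive(grid):
    """One 5-point Jacobi sweep; boundary values are copied unchanged."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1])
    return out

def jacobi_tiled(grid, tile=4):
    """Same sweep, processed tile by tile. Each tile plus a one-cell halo is copied
    into `local` (a stand-in for shared memory), so a value is read from the input
    once per tile that needs it rather than once per use."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ti in range(1, n - 1, tile):
        for tj in range(1, m - 1, tile):
            i_end, j_end = min(ti + tile, n - 1), min(tj + tile, m - 1)
            # Stage rows ti-1..i_end and columns tj-1..j_end (tile + halo).
            local = [grid[i][tj - 1:j_end + 1] for i in range(ti - 1, i_end + 1)]
            for i in range(ti, i_end):
                for j in range(tj, j_end):
                    li, lj = i - ti + 1, j - tj + 1
                    out[i][j] = 0.25 * (local[li - 1][lj] + local[li + 1][lj]
                                        + local[li][lj - 1] + local[li][lj + 1])
    return out
```

Both functions perform the same additions in the same order, so their results are bit-identical; only the data-access pattern differs.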
Apr, 7
Scalability of Higher-Order Discontinuous Galerkin FEM Computations for Solving Electromagnetic Wave Propagation Problems on GPU Clusters
A highly parallel implementation of Maxwell’s equations in the time domain using a cluster of Graphics Processing Units (GPUs) is presented. The higher-order Discontinuous Galerkin Finite Element Method (DG-FEM) is used for spatial discretization because its characteristics match the parallelization design of the NVIDIA Compute Unified Device Architecture (CUDA) programming model. Asynchronous data […]
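For reference, in a source-free, linear, isotropic medium with permittivity $\varepsilon$ and permeability $\mu$, the time-domain Maxwell curl equations take the form below; the paper may include source or material terms that the excerpt does not show. DG-FEM discretizes such equations element by element, coupling neighboring elements only through numerical fluxes on shared faces, and that locality is what lets each element's update proceed largely independently on the GPU.

```latex
\frac{\partial \mathbf{E}}{\partial t} = \frac{1}{\varepsilon}\,\nabla \times \mathbf{H},
\qquad
\frac{\partial \mathbf{H}}{\partial t} = -\frac{1}{\mu}\,\nabla \times \mathbf{E}
```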
Apr, 7
Using GPU to Accelerate Cache Simulation
Caches play a major role in the performance of high-speed computer systems. Trace-driven simulation is the most widely used method to evaluate cache architectures. However, as cache designs move toward more complicated architectures and traces grow ever larger, traditional simulation methods are no longer practical […]
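A trace-driven simulator replays a recorded address trace against a model of the cache. The minimal Python sketch below models a set-associative cache with LRU replacement; the function name and parameters are illustrative, not taken from the paper. Each set's state evolves independently of the others, which suggests (as an assumption, not a description of the paper's design) one natural way to spread the work across GPU threads.

```python
from collections import OrderedDict

def simulate_cache(trace, num_sets, ways, block_size):
    """Replay byte addresses through a set-associative LRU cache.

    Returns (hits, misses)."""
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: tags in LRU order
    hits = misses = 0
    for addr in trace:
        block = addr // block_size
        index, tag = block % num_sets, block // num_sets
        s = sets[index]
        if tag in s:
            s.move_to_end(tag)          # mark as most recently used
            hits += 1
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)   # evict the least recently used tag
            s[tag] = True
    return hits, misses
```

With 4 sets and 16-byte blocks, the trace `[0, 4, 64, 0]` gives one hit in a direct-mapped cache (`ways=1`, address 64 evicts block 0) but two hits with `ways=2`.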
Apr, 7
Accelerating Phase Correlation Functions Using GPU and FPGA
In this paper, we present a comparative study of implementations of the phase correlation function using GPUs, ASICs and FPGAs. The Phase-Only Correlation (POC) method demonstrates high robustness and subpixel accuracy in pattern matching and image registration. However, it is computationally expensive because it requires 2D FFTs, among other calculations. We have […]
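Phase-only correlation normalizes the cross-power spectrum of two images to unit magnitude and transforms it back; the location of the resulting peak gives the translation between the images. The Python sketch below recovers integer shifts only (the subpixel peak fitting mentioned above is omitted), and it uses a naive O(N⁴) 2D DFT purely for self-containment, standing in for the 2D FFT that a real GPU or FPGA implementation would use. Function names are illustrative.

```python
import cmath

def dft2(img, inverse=False):
    """Naive 2D DFT; adequate only for tiny demo images."""
    n, m = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            acc = 0j
            for x in range(n):
                for y in range(m):
                    acc += img[x][y] * cmath.exp(sign * 2j * cmath.pi * (u * x / n + v * y / m))
            out[u][v] = acc / (n * m) if inverse else acc
    return out

def phase_only_correlation(f, g):
    """Inverse DFT of the unit-magnitude cross-power spectrum G * conj(F)."""
    F, G = dft2(f), dft2(g)
    R = []
    for Frow, Grow in zip(F, G):
        row = []
        for a, b in zip(Frow, Grow):
            cross = b * a.conjugate()
            row.append(cross / abs(cross) if abs(cross) > 1e-12 else 0j)
        R.append(row)
    return [[z.real for z in row] for row in dft2(R, inverse=True)]

def estimate_shift(f, g):
    """Integer (dx, dy) such that g(x, y) ~= f(x - dx, y - dy), circularly."""
    poc = phase_only_correlation(f, g)
    n, m = len(poc), len(poc[0])
    px, py = max(((x, y) for x in range(n) for y in range(m)),
                 key=lambda p: poc[p[0]][p[1]])
    # Peaks past the midpoint correspond to negative shifts.
    return (px if px <= n // 2 else px - n, py if py <= m // 2 else py - m)
```

Because every spectral component is forced to unit magnitude, the correlation surface ideally collapses to a single sharp peak, which is the source of POC's robustness to brightness changes.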
Apr, 7
A Neighborhood Grid Data Structure for Massive 3D Crowd Simulation on GPU
Simulating and visualizing emergent crowds in real time is computationally intensive. This intensity comes mostly from the O(n²) complexity of the traversal algorithm required for proximity queries between all pairs of entities in order to compute their relevant mutual interactions. Previous works reduced this complexity by considerable factors, using adequate data structures […]
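How a grid replaces the all-pairs traversal can be shown with uniform-grid bucketing, a widely used technique; note that it is a stand-in, and the paper's neighborhood grid is a different structure. In this Python sketch, entities are bucketed by cell, and a query scans only the 27 cells around the querying entity. That is sufficient as long as the query radius does not exceed the cell size, an assumption the code checks. All names are illustrative.

```python
import math
from collections import defaultdict

def cell_of(p, cell):
    """Integer grid coordinates of the cell containing 3D point p."""
    return tuple(math.floor(c / cell) for c in p)

def build_grid(positions, cell):
    """Bucket entity indices by the grid cell that contains them."""
    grid = defaultdict(list)
    for idx, p in enumerate(positions):
        grid[cell_of(p, cell)].append(idx)
    return grid

def neighbors_grid(positions, grid, cell, i, radius):
    """Entities within `radius` of entity i, scanning only the 27 surrounding cells."""
    assert radius <= cell, "neighbors could lie beyond the adjacent cells"
    cx, cy, cz = cell_of(positions[i], cell)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if j != i and math.dist(positions[i], positions[j]) <= radius:
                        found.append(j)
    return sorted(found)

def neighbors_brute(positions, i, radius):
    """All-pairs baseline: O(n) per query, O(n^2) for every entity."""
    return [j for j in range(len(positions))
            if j != i and math.dist(positions[i], positions[j]) <= radius]
```

With roughly uniform density, each grid query touches a bounded number of entities, so querying every entity costs about O(n) instead of O(n²).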
Apr, 7
Context-aware volume navigation
The trackball metaphor is used in many applications where volumetric data needs to be explored. Although it provides an intuitive way to inspect the overall structure of objects of interest, detailed inspection can be tedious or, when cavities occur, even impossible. We therefore propose a context-aware navigation technique for the exploration of volumetric […]
Apr, 7
Practical examples of GPU computing optimization principles
In this paper, we provide examples of optimizing signal processing or visual computing algorithms written for SIMT-based GPU architectures. These implementations demonstrate the optimizations for CUDA and its successors, OpenCL and DirectCompute. We discuss the effects and optimization principles of memory coalescing, bandwidth reduction, processor occupancy, bank conflict reduction, local memory elimination and instruction optimization. […]
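Two of the principles listed above, memory coalescing and bank conflict reduction, can be illustrated with a simplified counting model. The assumptions in this Python sketch are loud ones: 32-thread warps, 4-byte words, 128-byte memory segments and 32 shared-memory banks. Real hardware differs across generations in caching and transaction sizes. The functions are hypothetical helpers, not part of any GPU API.

```python
def transactions_per_warp(stride_words, word_bytes=4, warp_size=32, segment_bytes=128):
    """Distinct memory segments touched when thread t reads word t * stride_words.

    Fewer segments means better coalescing (simplified model)."""
    segments = {(t * stride_words * word_bytes) // segment_bytes for t in range(warp_size)}
    return len(segments)

def bank_conflict_degree(row_pitch_words, num_banks=32, warp_size=32):
    """Worst-case number of threads hitting one bank when thread t reads
    tile[t][0], i.e. a column of a 2D shared-memory tile whose rows are
    row_pitch_words long. 1 means conflict-free."""
    counts = {}
    for t in range(warp_size):
        bank = (t * row_pitch_words) % num_banks
        counts[bank] = counts.get(bank, 0) + 1
    return max(counts.values())
```

Under this model, unit-stride reads need one segment per warp while stride 32 needs 32, and reading a column of a 32×32 tile is a 32-way conflict that disappears when each row is padded to 33 words, the familiar padding trick.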
Apr, 7
ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere
We describe a hybrid Fourier/direct space convolution algorithm for compact radial (azimuthally symmetric) kernels on the sphere. For high-resolution maps covering a large fraction of the sky, our implementation takes advantage of the inexpensive massive parallelism afforded by consumer graphics processing units (GPUs). Applications include modeling of instrumental beam shapes in terms of compact […]
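The harmonic-space side of such a convolution rests on a standard identity. Convolving a map with spherical-harmonic coefficients $a_{\ell m}$ by an azimuthally symmetric kernel $B(\theta)$ multiplies each coefficient by the kernel's Legendre coefficient $b_\ell$, independent of $m$. The identity is stated here with a common normalization (for a unit-integral kernel, $b_0 = 1$); the excerpt does not say which convention ARKCoS uses.

```latex
\tilde{a}_{\ell m} = b_\ell \, a_{\ell m},
\qquad
b_\ell = 2\pi \int_{-1}^{1} B(\theta)\, P_\ell(\cos\theta)\, \mathrm{d}(\cos\theta)
```

A spatially compact kernel has $b_\ell$ extending to high $\ell$, which is one reason a direct-space treatment can be attractive for it; how the paper splits work between the two spaces is not shown in this excerpt.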
Apr, 7
Scaling Hierarchical N-body Simulations on GPU Clusters
This paper focuses on the use of GPGPU-based clusters for hierarchical N-body simulations. Whereas the behavior of these hierarchical methods has been studied in the past on CPU-based architectures, we investigate key performance issues in the context of clusters of GPUs. These include kernel organization and efficiency, the balance between tree traversal and force computation […]
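Hierarchical N-body methods replace distant groups of bodies by their total mass at their center of mass. The classic Barnes-Hut form shows the two phases named above, tree traversal and force computation. This is a minimal 2D Python sketch with assumed conventions: G = 1, softening length `eps`, distinct body positions, positive masses, and the standard opening criterion size / distance < `theta`. It is not the paper's GPU kernel organization.

```python
import math

def build(indices, pos, m, cx, cy, half):
    """Recursively build a quadtree node for the square centred at (cx, cy)."""
    mass = sum(m[i] for i in indices)
    com = (sum(m[i] * pos[i][0] for i in indices) / mass,
           sum(m[i] * pos[i][1] for i in indices) / mass)
    node = {"mass": mass, "com": com, "size": 2 * half, "body": None, "children": []}
    if len(indices) == 1:
        node["body"] = indices[0]
        return node
    quads = [[], [], [], []]
    for i in indices:
        quads[(pos[i][0] >= cx) + 2 * (pos[i][1] >= cy)].append(i)
    h = half / 2
    for q, sub in enumerate(quads):
        if sub:
            node["children"].append(
                build(sub, pos, m, cx + (h if q & 1 else -h), cy + (h if q & 2 else -h), h))
    return node

def build_tree(pos, m):
    """Root node covering the bounding square of all bodies."""
    xs, ys = [p[0] for p in pos], [p[1] for p in pos]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    return build(list(range(len(pos))), pos, m,
                 (max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2, half)

def accel_tree(node, i, pos, theta, eps=1e-3):
    """Acceleration on body i; a cell is used whole when size / distance < theta."""
    if node["body"] == i:
        return 0.0, 0.0
    dx = node["com"][0] - pos[i][0]
    dy = node["com"][1] - pos[i][1]
    d = math.hypot(dx, dy)
    if node["body"] is not None or (d > 0 and node["size"] / d < theta):
        r3 = (d * d + eps * eps) ** 1.5   # force computation
        return node["mass"] * dx / r3, node["mass"] * dy / r3
    ax = ay = 0.0
    for child in node["children"]:        # tree traversal: open the cell
        cax, cay = accel_tree(child, i, pos, theta, eps)
        ax += cax
        ay += cay
    return ax, ay

def accel_direct(i, pos, m, eps=1e-3):
    """Direct sum for one body (O(n^2) for all bodies), used as the reference."""
    ax = ay = 0.0
    for j, (x, y) in enumerate(pos):
        if j != i:
            dx, dy = x - pos[i][0], y - pos[i][1]
            r3 = (dx * dx + dy * dy + eps * eps) ** 1.5
            ax += m[j] * dx / r3
            ay += m[j] * dy / r3
    return ax, ay
```

With `theta=0` every cell is opened down to single bodies and the result matches the direct sum; larger `theta` trades accuracy for fewer interactions, which shifts the balance between traversal and force work.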
Apr, 6
Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs
To exploit the full potential of GPGPUs for general-purpose computing, the DOACROSS parallelism abundant in scientific and engineering applications must be harnessed. However, the cross-iteration data dependences in DOACROSS loops make it difficult to execute their computations concurrently with a massive number of fine-grained threads. This work focuses on iterative PDE solvers rich […]
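A Gauss-Seidel sweep for a 2D Poisson problem is a textbook DOACROSS loop: point (i, j) needs the freshly updated values at (i-1, j) and (i, j-1). One classic way to expose parallelism is the wavefront (hyperplane) reordering sketched below in Python. Points on one anti-diagonal i + j = d depend only on earlier diagonals, so each diagonal could run as one parallel step. This is an illustrative technique; the excerpt does not say it is the paper's method. The sketch assumes a square n×n grid with fixed boundary values.

```python
def _update(u, f, h, i, j):
    # 5-point Gauss-Seidel update for -Laplace(u) = f with grid spacing h.
    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] + h * h * f[i][j])

def sweep_lexicographic(u, f, h):
    """Sequential DOACROSS loop: (i, j) uses new values at (i-1, j) and (i, j-1)."""
    n = len(u)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            _update(u, f, h, i, j)

def sweep_wavefront(u, f, h):
    """Same sweep reordered by anti-diagonals d = i + j. Points on one diagonal
    are mutually independent, so each could be assigned its own thread."""
    n = len(u)
    for d in range(2, 2 * (n - 2) + 1):
        for i in range(max(1, d - (n - 2)), min(n - 2, d - 1) + 1):
            _update(u, f, h, i, d - i)
```

Both orders read exactly the same old and new values, so they produce identical results; the catch is that early and late diagonals are short, so the available parallelism varies across the sweep.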