Posts
Apr, 13
Speeding up K-Means Algorithm by GPUs
Cluster analysis plays a critical role in a wide variety of applications, but it now faces a computational challenge due to continuously increasing data volumes. Parallel computing is one of the most promising solutions to this challenge. In this paper, we target parallelizing k-Means, which is one of the most […]
Apr, 13
Accelerate Cache Simulation with Generic GPU
Trace-driven cache simulation is the most widely used method to evaluate different cache structures. Several techniques have been proposed to reduce the simulation time of sequential trace-driven simulation. An obvious way to achieve fast parallel simulation is to simulate the individual independent sets of a cache concurrently on different compute resources. We propose improvements to […]
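To illustrate the set-level parallelism mentioned in this excerpt, here is a minimal Python sketch, not the paper's implementation. It assumes a set-associative cache with LRU replacement; the function names (`simulate_set`, `simulate_cache`) and parameters are hypothetical. Because an address maps to exactly one set, each set's sub-trace can be simulated in isolation, and the per-set loop could in principle be handed to separate workers or GPU threads.

```python
from collections import OrderedDict, defaultdict


def simulate_set(tags, ways):
    """Simulate a single cache set with LRU replacement; return its miss count."""
    lines = OrderedDict()  # keys are tags, ordered from least to most recently used
    misses = 0
    for tag in tags:
        if tag in lines:
            lines.move_to_end(tag)  # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
    return misses


def simulate_cache(trace, num_sets, ways, block_size):
    """Split an address trace by set index, then simulate each set independently."""
    per_set = defaultdict(list)
    for addr in trace:
        block = addr // block_size
        per_set[block % num_sets].append(block // num_sets)
    # Each set only sees its own sub-trace, so these calls are independent
    # and could run concurrently on different compute resources.
    return sum(simulate_set(tags, ways) for tags in per_set.values())


if __name__ == "__main__":
    trace = [0, 64, 0, 128, 64]
    print(simulate_cache(trace, num_sets=2, ways=2, block_size=64))
```

The sketch runs the sets sequentially for clarity; only the partitioning step reflects the idea described in the excerpt.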
Apr, 13
NQueens on CUDA: Optimization Issues
Today's commercial off-the-shelf computer systems are multicore computing systems that combine CPUs, graphics processors (GPUs), and custom devices. In comparison with CPU cores, graphics cards are capable of running hundreds to thousands of compute units in parallel. To benefit from these GPU computing resources, applications have to be parallelized and adapted to the […]
Apr, 13
Memory Saving Discrete Fourier Transform on GPUs
This paper presents an alternative method to compute the two-dimensional Discrete Fourier Transform. While current GPU Fourier transform libraries need a large buffer for storing intermediate results, our method can compute the same output with far less memory. It works by exploiting the separability of the Fourier transform. Using this scheme, it is […]
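The separability property named in this excerpt can be sketched in a few lines of Python. This is only an illustration of separability itself, under the assumption of a small dense input; it does not reproduce the paper's memory-saving scheme, and the function names are hypothetical. A 2D DFT is computed as 1D DFTs over the rows followed by 1D DFTs over the columns, and the result is checked against the direct double-sum definition.

```python
import cmath


def dft_1d(x):
    """Naive O(n^2) one-dimensional DFT of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def dft_2d_separable(a):
    """2D DFT via separability: transform every row, then every column."""
    rows = [dft_1d(r) for r in a]
    cols = [dft_1d(list(c)) for c in zip(*rows)]  # zip(*rows) yields the columns
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order


def dft_2d_direct(a):
    """2D DFT straight from the double-sum definition, used as a reference."""
    m, n = len(a), len(a[0])
    return [
        [
            sum(
                a[p][q] * cmath.exp(-2j * cmath.pi * (p * u / m + q * v / n))
                for p in range(m)
                for q in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    sep = dft_2d_separable(data)
    ref = dft_2d_direct(data)
    worst = max(abs(s - r) for rs, rr in zip(sep, ref) for s, r in zip(rs, rr))
    print("max difference:", worst)
```

Because the row pass and the column pass are independent 1D problems, an implementation is free to process them in smaller pieces, which is the kind of freedom a memory-saving approach can build on.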
Apr, 13
A Multi-GPU Spectrometer System for Real-Time Wide Bandwidth Radio Signal Analysis
This paper describes the implementation of a large-bandwidth multi-GPU signal processing system for radio astronomy observation. The system performs very large Fast Fourier Transforms (FFTs) and spectrum analysis to achieve real-time analysis of a large-bandwidth spectrum. This is accomplished by implementing a four-step FFT algorithm in Compute Unified Device Architecture (CUDA). The key […]
Apr, 13
GPU-Accelerated KLT Tracking with Monte-Carlo-Based Feature Reselection
Many computer vision methods rely on frame registration information obtained with algorithms such as the Kanade-Lucas-Tomasi (KLT) feature tracker, which is known for its excellent performance in that area. Various research groups have proposed methods to improve its performance, both in terms of execution time and stability. Recent research has shown that current graphics processing units […]
Apr, 13
Efficient Collision Detection and Physics-Based Deformation for Haptic Simulation with Local Spherical Hash
While real-time computer graphics relies on a frame rate of 30 iterations per second to fool the eye and render smooth motion transitions, computer haptics deals with the sense of touch, which requires a higher rate of around 1 kHz to avoid discontinuities. The use of haptics in interactive applications such as surgical simulations or games, […]
Apr, 13
Community Structure Discovery algorithm on GPU with CUDA
Automatic search and community discovery in large, complex networks has important practical applications. It is difficult to balance computing speed against clustering accuracy: improving clustering accuracy requires reducing the time complexity. In this paper, a novel approach for Single Instruction, Multiple Data (SIMD) architecture processors, based on the Newman algorithm, is proposed. The […]
Apr, 13
Efficient parallelized particle filter design on CUDA
Particle filtering is widely used in numerous nonlinear applications which require reconfigurability, fast prototyping, and online parallel signal processing. The emerging computing platform, CUDA, may be regarded as the most appealing platform for such implementation. However, the literature does not yet explore how to use CUDA for particle filters. This paper aims to provide two […]
Apr, 12
Implementation and optimization of image processing algorithms on handheld GPU
The advent of GPUs with programmable shaders on handheld devices has motivated embedded application developers to use the GPU to offload computationally intensive tasks and relieve the burden on the embedded CPU. In this work, we propose an image processing toolkit for handheld GPUs with programmable shaders using the OpenGL ES 2.0 API. By using the image processing […]
Apr, 12
Power analysis and optimizations for GPU architecture using a power simulator
As one of the most popular many-core architectures, GPUs have demonstrated their power in many non-graphics applications. Traditional general-purpose computing systems tend to integrate a GPU as a co-processor to accelerate parallel computing tasks. Meanwhile, GPUs also consume a great deal of power, accounting for a large proportion of total system power consumption. In this […]
Apr, 12
CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator
Modern GPUs open a completely new field to optimize embarrassingly parallel algorithms. Implementing an algorithm on a GPU confronts the programmer with a new set of challenges for program optimization. Some of the most notable ones are isolating the part of the algorithm that can be optimized to run on the GPU; tuning the program […]