Posts
Dec, 3
OpenCL Based High-Quality HEVC Motion Estimation on GPU
This paper presents a high-quality H.265/HEVC motion estimation implementation based on cooperation between the CPU and GPU. The data dependency from MVP (Motion Vector Predictor) restricts the degree of parallelism on GPU. To overcome the constraint from MVP, we propose to use an estimated MVP on GPU and the accurate MVP to refine the motion […]
Dec, 3
Implementation of k-Means Clustering Algorithm in CUDA
Big Data poses a great computational challenge for programmers as well as machines, since a lot of number crunching has to be done. Due to recent developments in inexpensive shared-memory architectures such as Graphics Processing Units (GPU), an alternative has emerged. In this paper, we aim to decrease the runtime of k-Means, which is one […]
Dec, 3
Numerical cosmology on the GPU with Enzo and Ramses
A number of scientific numerical codes can currently exploit GPUs with remarkable performance. In astrophysics, Enzo and Ramses are prime examples of such applications. The two codes have been ported to GPUs adopting different strategies and programming models, Enzo adopting CUDA and Ramses using OpenACC. We describe here the different solutions used for the GPU […]
Dec, 3
24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs
We have simulated, for the first time, the long-term evolution of the Milky Way Galaxy using 51 billion particles on the Swiss Piz Daint supercomputer with our $N$-body gravitational tree-code Bonsai. Herein, we describe the scientific motivation and numerical algorithms. The Milky Way model was simulated for 6 billion years, during which the bar […]
Dec, 3
Parallelization of a novel frequent itemset hiding algorithm on a CPU-GPU platform
Data mining is used to extract useful information from large data sets. However, the organizations that mine the data might not be its owners. So, before the owners make their data accessible for data mining, they want to make sure that no sensitive information can be mined from the released data whose […]
Dec, 2
Real-Time Hair Rendering
An approach is presented to render hair in real time by using a small number of guide strands to generate interpolated hairs on the graphics processing unit (GPU). Hair interpolation methods are based on a single guide strand or on multiple guide strands. Each hair strand is composed of segments, which can be further subdivided to […]
Dec, 2
SiftCU: An Accelerated Cuda Based Implementation of SIFT
Scale Invariant Feature Transform (SIFT) is a popular image feature extraction algorithm. SIFT’s features are invariant to many image-related variables, including scale and change in viewpoint. Despite its broad capabilities, it is computationally expensive. This characteristic makes it hard for researchers to use SIFT in their work, especially in real-time applications. This is […]
Dec, 2
An Approach for Maximizing Performance on Heterogeneous Clusters of CPU and GPU
Over the past years there has been significant enthusiasm for the development of parallel computing on Graphics Processing Units (GPU), which have now become powerful and affordable hardware equipping data centers and research clusters. Our earlier research has explored the ways to exploit the parallel compute performance of the GPU alongside the CPU in the same […]
Dec, 2
Scalability and Optimization Strategies for GPU Enhanced Neural Networks (GeNN)
Simulation of spiking neural networks has traditionally been done on high-performance supercomputers or large-scale clusters. Utilizing the parallel nature of neural network computation algorithms, GeNN (GPU Enhanced Neural Network) provides a simulation environment that runs on general-purpose NVIDIA GPUs with a code-generation-based approach. GeNN allows the users to design and simulate neural […]
Dec, 2
GPU accelerated feature algorithms for mobile devices
Mobile devices offer many new avenues for computer vision and in particular mobile augmented reality applications that have not been feasible with desktop computers. The motivation for this research is to improve mobile augmented reality applications so that natural features, instead of fiducial markers or pure location knowledge, can be used as anchor points for […]
Dec, 1
An Open-Source GPU-Accelerated Feature Extraction Tool
Extracting feature vectors from a speech audio signal is a computationally intensive task. However, MFCC and PLP features have remained the most popular for more than a decade. We made a GPU-accelerated implementation of the feature extraction processing. The implementation produces features identical to those of the reference Hidden Markov Toolkit (HTK) but in a fraction of the […]
Dec, 1
GPU Declarative Framework
This dissertation presents our novel declarative framework, called the Declarative Framework for GPUs (DEFG). GPUs are highly sophisticated computing devices, capable of computing at very high speeds. The framework makes the development of OpenCL-based GPU applications less complex and less time-consuming. The framework’s approach is two-fold. First, we developed the DEFG domain-specific language in […]