Posts
Mar, 3
Low-Energy Application Parallelism 2013, LEAP 2013
LEAP 2013 is the place to learn about and share the latest advances in the use of high-performance parallel computing technology on low-power mobile CPU, GPU, FPGA and embedded processors. Two days of world-class education and networking will give developers, researchers, engineers and technology managers the vital knowledge they need to understand, assess and exploit […]
Mar, 2
OpenOF: Framework for Sparse Non-linear Least Squares Optimization on a GPU
In the area of computer vision and robotics, non-linear optimization methods have become an important tool. For instance, all structure from motion approaches apply optimizations such as bundle adjustment (BA). Most often, the problem structure is sparse with respect to the functional relations between parameters and measurements. The sparsity of the system has to be […]
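The abstract frames bundle adjustment as sparse non-linear least squares. OpenOF's own interface is not shown in this excerpt, so the following is only a hedged, dense illustration of the Levenberg-Marquardt iteration such frameworks are built around. It fits a hypothetical two-parameter model `y = a * exp(b * x)` (model and function names are assumptions). In bundle adjustment the same damped normal equations `(J^T J + lam * diag(J^T J)) delta = -J^T r` appear, but `J^T J` is large and sparse. That sparsity is what a GPU framework has to exploit.

```python
import math

# Hypothetical model for illustration only: y = a * exp(b * x).
def residuals(params, xs, ys):
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

def jacobian(params, xs):
    a, b = params
    # One row per measurement: (dr/da, dr/db).
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]

def levenberg_marquardt(params, xs, ys, lam=1e-3, max_iter=200):
    a, b = params
    cost = sum(r * r for r in residuals((a, b), xs, ys))
    for _ in range(max_iter):
        r = residuals((a, b), xs, ys)
        J = jacobian((a, b), xs)
        # Normal equations: J^T J (dense 2x2 here, large and sparse in BA) and -J^T r.
        h00 = sum(j0 * j0 for j0, _ in J)
        h01 = sum(j0 * j1 for j0, j1 in J)
        h11 = sum(j1 * j1 for _, j1 in J)
        g0 = -sum(j0 * ri for (j0, _), ri in zip(J, r))
        g1 = -sum(j1 * ri for (_, j1), ri in zip(J, r))
        # Marquardt damping scales the diagonal of J^T J.
        d00 = h00 * (1.0 + lam)
        d11 = h11 * (1.0 + lam)
        det = d00 * d11 - h01 * h01
        da = (g0 * d11 - h01 * g1) / det
        db = (d00 * g1 - h01 * g0) / det
        new_cost = sum(ri * ri for ri in residuals((a + da, b + db), xs, ys))
        if new_cost < cost:
            # Accept the step and move toward Gauss-Newton.
            a, b, cost = a + da, b + db, new_cost
            lam /= 10.0
            if max(abs(da), abs(db)) < 1e-12:
                break
        else:
            # Reject the step and move toward gradient descent.
            lam *= 10.0
            if lam > 1e12:
                break
    return a, b
```

The accept/reject rule on `lam` is the usual Marquardt heuristic. A sparse BA solver would replace the explicit 2x2 solve with a sparse factorization or an iterative solver on `J^T J`.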
Mar, 2
Accelerating Kernel Density Estimation on the GPU Using the CUDA Framework
The main problem of kernel density estimation methods is their huge computational requirements, especially for large data sets. One way of accelerating these methods is to use parallel processing. Recent advances in parallel processing have focused on the use of Graphics Processing Units (GPUs) with the Compute Unified Device Architecture (CUDA) programming model. In this […]
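To see where the computational cost comes from, recall the standard kernel density estimate `f(x) = (1 / (n * h)) * sum_i K((x - x_i) / h)`. The sketch below is a minimal CPU reference, assuming a one-dimensional Gaussian kernel and a fixed bandwidth `h`. The paper's kernel and bandwidth choices are not given in this excerpt, and `gaussian_kde` is a hypothetical name. The double loop costs O(n * m) for n samples and m evaluation points. Each evaluation point is independent, so the outer loop is the natural place to assign one CUDA thread per point.

```python
import math

def gaussian_kde(samples, points, h):
    """Evaluate a 1-D Gaussian kernel density estimate with bandwidth h at each point."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    densities = []
    for x in points:  # independent per point: one GPU thread per point in a CUDA port
        total = 0.0
        for xi in samples:  # O(n) work per point, O(n * m) overall
            u = (x - xi) / h
            total += math.exp(-0.5 * u * u)
        densities.append(norm * total)
    return densities
```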
Mar, 2
Efficient Detection of Sunspots with GPU Acceleration Through CUDA
Tracking sunspots is not an easy task, given that multiple sources of data are acquired using a variety of different instruments. With the sources of data and contributors to these repositories growing quickly, it is increasingly important to have an efficient solution for analyzing the photographs to record trends and possibly make predictions. CUDA (Compute […]
Mar, 2
Full Covariance Gaussian Mixture Models Evaluation on GPU
Gaussian mixture models (GMMs) are often used in various data processing and classification tasks to model a continuous probability density in a multi-dimensional space. In cases where the dimension of the feature space is relatively high (e.g., in automatic speech recognition (ASR)), GMMs with a higher number of Gaussians with diagonal covariances (DC) instead […]
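The abstract contrasts full and diagonal covariances. A full-covariance Gaussian needs a Cholesky factor of each covariance matrix, whereas a diagonal one only needs per-dimension variances. The stdlib-only sketch below evaluates a GMM log-likelihood with full covariances, using a Cholesky factorization followed by a log-sum-exp over components. It is a plain CPU reference, not the paper's GPU method, and all function names are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def gaussian_logpdf(x, mean, cov):
    """log N(x | mean, cov) for a full covariance matrix."""
    L = cholesky(cov)
    d = len(x)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution: solve L z = (x - mean); then ||z||^2 is the Mahalanobis term.
    z = []
    for i in range(d):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    maha = sum(zi * zi for zi in z)
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha)

def gmm_logpdf(x, weights, means, covs):
    """log sum_k w_k N(x | mean_k, cov_k), computed stably with log-sum-exp."""
    terms = [math.log(w) + gaussian_logpdf(x, m, c)
             for w, m, c in zip(weights, means, covs)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

In a real evaluator the Cholesky factors and log-determinants would be computed once per component and reused across all feature vectors. They are recomputed here only to keep the sketch short.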
Mar, 2
On Performance of GPU and DSP Architectures for Computationally Intensive Applications
This thesis focuses on implementations of a support vector machine (SVM) algorithm on a digital signal processor (DSP), a graphics processing unit (GPU), and a common Intel Core i7 architecture. The purpose of this work is to identify which of the three is most suitable for SVM implementation. The performance is measured by looking at the […]
Mar, 2
Large-scale ferrofluid simulations on graphics processing units
We present an approach to molecular-dynamics simulations of ferrofluids on graphics processing units (GPUs). Our numerical scheme is based on a GPU-oriented modification of the Barnes-Hut (BH) algorithm designed to increase the parallelism of computations. For an ensemble consisting of a million ferromagnetic particles, the performance of the proposed algorithm on a Tesla M2050 GPU […]
Mar, 2
Acceleration of Coarse Grain Molecular Dynamics on GPU Architectures
Coarse grain (CG) molecular models have been proposed to simulate complex systems with lower computational overheads and longer timescales than atomistic-level models. However, their acceleration on parallel architectures such as graphics processing units (GPUs) presents original challenges that must be carefully evaluated. The objective of this work is to characterize the impact […]
Mar, 2
On continuous maximum flow image segmentation algorithm
In recent years, with the advance of computing equipment and image acquisition techniques, the sizes, dimensions and content of acquired images have increased considerably. Unfortunately, as time passes, there is a steadily increasing gap between the classical and parallel programming paradigms and their actual performance on modern computer hardware. In this thesis we consider in […]
Mar, 2
Parallel Peeling Algorithms
The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k are removed until there are no vertices of degree less than k left. The remaining hypergraph is known as the k-core. In this paper, we analyze parallel peeling processes, where […]
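The peeling process described above can be sketched as follows. The excerpt is truncated before it defines the parallel variant, so this assumes a round-based process: in each round, every vertex of degree less than k is removed at once, together with its incident hyperedges. `parallel_peel` is a hypothetical name, and hyperedges are assumed to be tuples of distinct vertex ids.

```python
def parallel_peel(num_vertices, edges, k):
    """Round-based peeling of a hypergraph. Returns (k-core vertex set, number of rounds)."""
    incident = [[] for _ in range(num_vertices)]
    for eid, edge in enumerate(edges):
        for v in edge:
            incident[v].append(eid)
    degree = [len(incident[v]) for v in range(num_vertices)]
    alive_vertices = set(range(num_vertices))
    alive_edges = set(range(len(edges)))
    rounds = 0
    while True:
        # All low-degree vertices are chosen from the same snapshot: this is the parallel step.
        peel = [v for v in alive_vertices if degree[v] < k]
        if not peel:
            break
        rounds += 1
        dead_edges = set()
        for v in peel:
            alive_vertices.remove(v)
            dead_edges.update(e for e in incident[v] if e in alive_edges)
        # Removing a hyperedge lowers the degree of every vertex it contains.
        for e in dead_edges:
            alive_edges.remove(e)
            for u in edges[e]:
                degree[u] -= 1
    return alive_vertices, rounds
```

The surviving set is the k-core whatever order vertices are removed in. Only the round count reflects the parallel schedule, and that count is the quantity a parallel analysis cares about.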
Mar, 2
Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs
Many problems in geophysical and atmospheric modelling require the fast solution of elliptic partial differential equations (PDEs) in "flat" three-dimensional geometries. In particular, an anisotropic elliptic PDE for the pressure correction has to be solved at every time step in the dynamical core of many numerical weather prediction models, and equations of a very […]
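A matrix-free preconditioned conjugate gradient solver needs only two functions: one that applies the operator and one that applies the preconditioner. No assembled matrix is required. The sketch below assumes a hypothetical two-dimensional model problem that mimics a "flat" domain: a 5-point stencil with vertical coupling 1 and weak horizontal coupling `eps`. As preconditioner it uses an exact tridiagonal solve in each vertical column. The excerpt does not say which preconditioner the paper uses, so this choice, like all names below, is an assumption.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_operator(nx, nz, eps):
    """Matrix-free 5-point operator with zero Dirichlet boundaries; index = i * nz + k."""
    diag = 2.0 + 2.0 * eps
    def apply_A(u):
        out = [0.0] * (nx * nz)
        for i in range(nx):
            for k in range(nz):
                j = i * nz + k
                v = diag * u[j]
                if k > 0:
                    v -= u[j - 1]          # strong vertical coupling
                if k < nz - 1:
                    v -= u[j + 1]
                if i > 0:
                    v -= eps * u[j - nz]   # weak horizontal coupling
                if i < nx - 1:
                    v -= eps * u[j + nz]
                out[j] = v
        return out
    return apply_A

def make_line_preconditioner(nx, nz, eps):
    """Solve the vertical tridiagonal part of A exactly in each column (Thomas algorithm)."""
    diag, off = 2.0 + 2.0 * eps, -1.0
    def precond(r):
        z = [0.0] * (nx * nz)
        for i in range(nx):
            base = i * nz
            c = [0.0] * nz
            d = [0.0] * nz
            c[0] = off / diag
            d[0] = r[base] / diag
            for k in range(1, nz):  # forward elimination
                m = diag - off * c[k - 1]
                c[k] = off / m
                d[k] = (r[base + k] - off * d[k - 1]) / m
            z[base + nz - 1] = d[nz - 1]
            for k in range(nz - 2, -1, -1):  # back substitution
                z[base + k] = d[k] - c[k] * z[base + k + 1]
        return z
    return precond

def pcg(apply_A, precond, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG; stops when ||r|| <= tol * ||b||. Returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

The column solve handles the strong vertical coupling exactly, so for small `eps` the preconditioned system is close to the identity. On the same problem it should need far fewer iterations than unpreconditioned CG, which corresponds to passing `lambda r: list(r)` as `precond`.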
Feb, 28
Automatic Mapping of Stream Programs on Multicore Architectures
Stream languages explicitly describe fork-join and pipeline parallelism, offering a powerful programming model for general multicore systems. This parallelism description can be exploited on hybrid architectures, e.g., those composed of Graphics Processing Units (GPUs) and general-purpose multicore processors. In this paper, we present a novel approach to optimize stream programs for hybrid architectures composed of […]