Posts
Jun, 25
GPU Implementation of the Branch and Bound method for knapsack problems
In this paper, we propose an efficient implementation of the branch and bound method for knapsack problems on a CPU-GPU system via CUDA. Branch and bound computations can be carried out either on the CPU or on a GPU, depending on the size of the branch and bound list. Better management of GPU memory, […]
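The branch and bound list described above can be illustrated with a minimal sequential sketch for the 0/1 knapsack problem. It sorts items by value/weight ratio and prunes with the classic fractional (Dantzig) upper bound. The CPU/GPU switching based on list size is not shown, and the names `knapsack_bb` and `upper_bound` are hypothetical.

```python
from typing import List, Tuple


def upper_bound(v: List[int], w: List[int], capacity: int, k: int, value: int, weight: int) -> float:
    """Fractional (Dantzig) bound: greedily add items k.. and a fraction of the first that does not fit."""
    bound = float(value)
    room = capacity - weight
    for i in range(k, len(v)):
        if w[i] <= room:
            room -= w[i]
            bound += v[i]
        else:
            bound += v[i] * room / w[i]
            break
    return bound


def knapsack_bb(values: List[int], weights: List[int], capacity: int) -> int:
    # Sort items by decreasing value density so the fractional bound is valid.
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    best = 0
    # Branch and bound list of feasible nodes: (next item index, accumulated value, accumulated weight).
    nodes: List[Tuple[int, int, int]] = [(0, 0, 0)]
    while nodes:
        k, value, weight = nodes.pop()
        best = max(best, value)
        if k == len(v) or upper_bound(v, w, capacity, k, value, weight) <= best:
            continue  # leaf reached or node pruned by the bound
        # Branch on item k: exclude it, and include it only if it still fits.
        nodes.append((k + 1, value, weight))
        if weight + w[k] <= capacity:
            nodes.append((k + 1, value + v[k], weight + w[k]))
    return best
```

For example, `knapsack_bb([60, 100, 120], [10, 20, 30], 50)` returns 220.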
Jun, 25
Approximate Principal Direction Trees
We introduce a new spatial data structure for high-dimensional data called the approximate principal direction tree (APD tree) that adapts to the intrinsic dimension of the data. Our algorithm ensures vector-quantization accuracy similar to that of computationally expensive PCA trees, with time complexity similar to that of lower-accuracy RP trees. APD trees use a small number […]
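One inexpensive way to approximate a principal direction is to run a few power-method iterations on the covariance matrix instead of computing a full eigendecomposition. The sketch below is an assumption for illustration and not necessarily the procedure used for APD trees: `approx_principal_direction` and `split_node` are hypothetical names, and the node is split at the median projection.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Sequence[float]


def approx_principal_direction(points: List[Point], iters: int = 5) -> List[float]:
    """Approximate the top covariance eigenvector with a few power iterations (assumed scheme)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # Apply the covariance implicitly, C v = X^T (X v) / n, without forming C.
        proj = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) / n for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w)) or 1.0
        v = [c / norm for c in w]
    return v


def split_node(points: List[Point], iters: int = 5) -> Tuple[List[Point], List[Point]]:
    """Split points at the median of their projections onto the approximate direction."""
    v = approx_principal_direction(points, iters)
    proj = [sum(p[j] * v[j] for j in range(len(v))) for p in points]
    threshold = statistics.median(proj)
    left = [p for p, t in zip(points, proj) if t < threshold]
    right = [p for p, t in zip(points, proj) if t >= threshold]
    return left, right
```

Fewer iterations trade direction accuracy for speed, which is the kind of trade-off the abstract describes between PCA trees and RP trees.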
Jun, 25
An Adaptative Multi-GPU based Branch-and-Bound. A Case Study: the Flow-Shop Scheduling Problem
Solving Combinatorial Optimization Problems (COPs) exactly with a Branch-and-Bound (B&B) algorithm requires a huge amount of computational resources. We therefore recently investigated designing B&B algorithms on top of graphics processing units (GPUs) using a parallel bounding model. The proposed model parallelizes the evaluation of the lower bounds on pools of sub-problems. The results demonstrated […]
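The parallel bounding model described above evaluates lower bounds for a whole pool of sub-problems at once. The sketch below assumes a permutation flow-shop with processing times `p[job][machine]` and uses the classic one-machine lower bound as a stand-in; the bound used in the paper may differ. The pool is evaluated sequentially here, whereas on a GPU each sub-problem would typically map to one thread. All names are hypothetical.

```python
from typing import List, Sequence


def lower_bound(p: Sequence[Sequence[int]], prefix: Sequence[int]) -> int:
    """One-machine lower bound on the makespan of any completion of the partial permutation `prefix`."""
    n_machines = len(p[0])
    # Completion time of the already scheduled prefix on each machine.
    c = [0] * n_machines
    for j in prefix:
        c[0] += p[j][0]
        for m in range(1, n_machines):
            c[m] = max(c[m], c[m - 1]) + p[j][m]
    scheduled = set(prefix)
    remaining = [j for j in range(len(p)) if j not in scheduled]
    bound = 0
    for m in range(n_machines):
        work = sum(p[j][m] for j in remaining)
        # The last remaining job on machine m still needs its time on machines m+1..end.
        tail = min((sum(p[j][m + 1:]) for j in remaining), default=0)
        bound = max(bound, c[m] + work + tail)
    return bound


def evaluate_pool(p: Sequence[Sequence[int]], pool: Sequence[Sequence[int]]) -> List[int]:
    """Bound every sub-problem in the pool; evaluations are independent, hence data-parallel."""
    return [lower_bound(p, prefix) for prefix in pool]
```

With `p = [[3, 2], [1, 4]]`, the complete permutation `[1, 0]` gets a bound equal to its makespan, 7.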
Jun, 23
3rd Annual International Conference on Advances in Distributed and Parallel Computing, ADPC 2012
Topics of interest include, but are not limited to:
* Parallel Computing
* Cluster Computing
* Volunteer Computing
* Grid and Cloud Computing
* Multi-core Architectures and Algorithms
* GPU Programming
* Web Services and Internet Computing
* Cooperative and Collaborative Computing
* Peer-to-peer Computing
* Mobile and Ubiquitous Computing
* New Parallel System Concepts […]
Jun, 23
Fast motion detection from airborne videos using graphics processing unit
In our previous work, we proposed a joint optical flow and principal component analysis (PCA) approach to improve the performance of optical-flow-based detection, in which PCA is applied to the computed two-dimensional optical flow image and motion detection is performed with a metric derived from the two eigenvalues. To reduce the computational time when […]
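Because the flow vectors are two-dimensional, the covariance matrix is 2×2 and its two eigenvalues have a closed form, so no iterative eigensolver is needed. The sketch below assumes PCA is applied to the flow vectors `(u, v)` of a local window; the abstract does not state how the detection metric combines the eigenvalues, so none is defined here, and `flow_eigenvalues` is a hypothetical name.

```python
import math
from typing import List, Tuple


def flow_eigenvalues(flow: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Eigenvalues (largest first) of the 2x2 covariance matrix of optical-flow vectors (u, v)."""
    n = len(flow)
    mu = sum(f[0] for f in flow) / n
    mv = sum(f[1] for f in flow) / n
    a = sum((f[0] - mu) ** 2 for f in flow) / n           # var(u)
    c = sum((f[1] - mv) ** 2 for f in flow) / n           # var(v)
    b = sum((f[0] - mu) * (f[1] - mv) for f in flow) / n  # cov(u, v)
    # Closed form for the symmetric matrix [[a, b], [b, c]].
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    return half_trace + radius, half_trace - radius
```

A window whose flow varies along a single direction, such as `[(1, 1), (-1, -1)]`, yields one large and one zero eigenvalue, `(2.0, 0.0)`.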
Jun, 23
Hierarchical overlapped tiling
This paper introduces hierarchical overlapped tiling, a transformation that applies loop tiling and fusion to conventional loops. Overlapped tiling is a useful transformation to reduce communication overhead, but it may also generate a significant amount of redundant computation. Hierarchical overlapped tiling performs overlapped tiling hierarchically to balance communication overhead and redundant computation, and thus has […]
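The communication/redundancy trade-off of overlapped tiling can be seen on a 1D three-point stencil run for several time steps. Each tile loads `steps` extra halo cells on each side and recomputes them redundantly, so tiles never exchange data between time steps. This is a single-level sketch under those assumptions; the hierarchical variant is not shown, boundary cells are held fixed, and the function names are hypothetical.

```python
from typing import List


def stencil_step(a: List[float]) -> List[float]:
    """One Jacobi-style step: interior cells become the mean of their 3-cell neighbourhood; ends stay fixed."""
    out = a[:]
    for i in range(1, len(a) - 1):
        out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out


def run_naive(a: List[float], steps: int) -> List[float]:
    for _ in range(steps):
        a = stencil_step(a)
    return a


def run_overlapped(a: List[float], steps: int, tile: int) -> List[float]:
    n = len(a)
    result = [0.0] * n
    for start in range(0, n, tile):
        end = min(start + tile, n)
        # Extend the tile by `steps` halo cells per side, clipped to the array.
        lo, hi = max(0, start - steps), min(n, end + steps)
        local = a[lo:hi]
        # Halo cells are computed redundantly; errors from the fixed local ends
        # move inward one cell per step and never reach [start, end).
        for _ in range(steps):
            local = stencil_step(local)
        result[start:end] = local[start - lo:end - lo]
    return result
```

Larger `steps` means fewer synchronizations but wider halos and therefore more redundant work per tile, which is the balance the hierarchical scheme targets.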
Jun, 23
GPU-accelerated Model Checking of Periodic Self-Suspending Real-Time Tasks
Efficient model checking is important for making this type of software verification useful for systems with a complex structure. If a system is too large or complex, model checking simply does not scale, i.e., verifying the system could take too much time. This is one strong argument for […]
Jun, 23
Bacon: A GPU Programming System With Just in Time Specialization
This paper describes Bacon, a data-parallel programming system targeting OpenCL-compatible graphics processors. This system is built upon the existing OpenCL standard in order to make it easier for programmers to write high performance kernels for GPU accelerated applications. The OpenCL C syntax is extended into a new language, Bacon C, intended to make development significantly […]
Jun, 23
Parallel Neural Network Training with OpenCL
This paper describes the parallelization of neural network training algorithms on heterogeneous architectures with graphics processing units (GPUs). The algorithms used for training are particle swarm optimization and backpropagation. Parallel versions of both methods are presented, and their speedups relative to the sequential versions are reported. The efficiency of parallel training is investigated in […]
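The particle swarm update behind the first training method can be sketched on a toy objective. The inertia and acceleration constants (`w`, `c1`, `c2`) are common textbook choices assumed here rather than values from the paper, and a sphere function stands in for the network's training error. In a parallel version, each particle's fitness evaluation is the natural unit of work.

```python
import random
from typing import Callable, List


def pso(f: Callable[[List[float]], float], dim: int, n_particles: int = 30,
        iters: int = 200, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> List[float]:
    """Minimize f with a basic global-best particle swarm (assumed parameter values)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm's best.
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            val = f(xs[i])  # independent per particle: the part worth parallelizing
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest
```

For network training, `f` would evaluate the training error for a flattened weight vector of length `dim`.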
Jun, 21
High performance implementation of hydrodynamic interactions and applications with the sub-cellular element method
An O(N^2) algorithm for computing hydrodynamic interactions (HI) in Brownian dynamics (BD) simulations has been implemented. CPU and GPU versions have been built, with the GPU version tuned for performance and reaching up to 40% of peak performance. The implementation was validated through simulations of diffusing polymers and comparisons of […]
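An O(N^2) HI computation typically fills a 3N×3N diffusion tensor pair by pair. The abstract does not name the tensor, so this sketch assumes the Rotne-Prager-Yamakawa (RPY) form for non-overlapping beads of radius `a`; the correction for overlapping beads is omitted, and `rpy_diffusion` is a hypothetical name.

```python
import math
from typing import List, Sequence


def rpy_diffusion(pos: Sequence[Sequence[float]], a: float = 1.0,
                  kT: float = 1.0, eta: float = 1.0) -> List[List[float]]:
    """Build the 3N x 3N RPY diffusion tensor with an O(N^2) pair loop (assumes r >= 2a for all pairs)."""
    n = len(pos)
    D = [[0.0] * (3 * n) for _ in range(3 * n)]
    self_term = kT / (6 * math.pi * eta * a)  # Stokes-Einstein self-diffusion
    for i in range(n):
        for k in range(3):
            D[3 * i + k][3 * i + k] = self_term
        for j in range(i + 1, n):
            r = [pos[j][k] - pos[i][k] for k in range(3)]
            dist = math.sqrt(sum(x * x for x in r))
            if dist < 2 * a:
                raise ValueError("overlapping beads need the modified RPY form")
            pref = kT / (8 * math.pi * eta * dist)
            c_iso = 1 + 2 * a * a / (3 * dist * dist)
            c_dyad = 1 - 2 * a * a / (dist * dist)
            for k in range(3):
                for l in range(3):
                    val = pref * (c_iso * (k == l) + c_dyad * r[k] * r[l] / (dist * dist))
                    # Each pair block is symmetric, so filling both blocks keeps D symmetric.
                    D[3 * i + k][3 * j + l] = val
                    D[3 * j + l][3 * i + k] = val
    return D
```

Each (i, j) block is independent of the others, which is what makes this loop a good fit for one-thread-per-pair GPU kernels.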
Jun, 21
Evaluating the impact of reordering unstructured meshes on the performance of finite volume GPU solvers
In this work, we study the impact of renumbering the cells of unstructured triangular finite volume meshes on the performance of CUDA implementations of several finite volume schemes for simulating two-layer shallow water systems. We have used several numerical schemes with different computational demands, whose CUDA implementations exploit the texture and L1 cache […]
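The abstract does not say which renumbering strategies were evaluated; reverse Cuthill-McKee (RCM) is one common choice for placing neighbouring cells close together in memory and is shown here only as an assumed example. Cells are graph nodes, shared edges are graph edges, and `bandwidth` measures the largest label gap between neighbouring cells. All names are hypothetical.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence, Tuple


def rcm_order(n: int, edges: Sequence[Tuple[int, int]]) -> List[int]:
    """Reverse Cuthill-McKee ordering; returns order[new_label] = old_label."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    visited = [False] * n
    order: List[int] = []
    # Start each connected component at a minimum-degree cell.
    for start in sorted(range(n), key=lambda c: len(adj[c])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            c = queue.popleft()
            order.append(c)
            # Visit unvisited neighbours in order of increasing degree.
            for nb in sorted(adj[c], key=lambda x: len(adj[x])):
                if not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
    order.reverse()
    return order


def relabel(order: List[int]) -> List[int]:
    """Invert an ordering: new_label[old_label] = position in `order`."""
    new_label = [0] * len(order)
    for new, old in enumerate(order):
        new_label[old] = new
    return new_label


def bandwidth(edges: Sequence[Tuple[int, int]], label: Optional[List[int]] = None) -> int:
    lab: Callable[[int], int] = (lambda c: c) if label is None else (lambda c: label[c])
    return max((abs(lab(u) - lab(v)) for u, v in edges), default=0)
```

For a chain of six cells numbered in scrambled order, RCM reduces the bandwidth from 4 to 1.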
Jun, 21
CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform
Motivation: New high-throughput sequencing technologies have enabled the production of short reads at dramatically low unit cost. The explosive growth of short-read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed. Results: We present CUSHAW, a parallelized […]
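Aligners built on the Burrows-Wheeler transform find exact matches with FM-index backward search. The sketch below builds the BWT naively by sorting rotations and counts pattern occurrences. It is a simplified illustration, not CUSHAW's implementation: production aligners typically use precomputed or sampled occurrence tables and also handle mismatches, none of which is shown, and the function names are hypothetical.

```python
from typing import Dict


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations; `text` must end with a unique sentinel '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """FM-index backward search: number of occurrences of `pattern` in the original text."""
    # C[c]: number of characters in the text that are lexicographically smaller than c.
    C: Dict[str, int] = {}
    total = 0
    for ch in sorted(set(bwt_str)):
        C[ch] = total
        total += bwt_str.count(ch)
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        # Occ(ch, i) is computed by counting here; real indexes precompute it.
        lo = C[ch] + bwt_str[:lo].count(ch)
        hi = C[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("banana$")` is `"annb$aa"`, and backward search over it reports 2 occurrences of `"ana"`.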