Posts
Oct, 24
A Parallel PSO Algorithm for a Watermarking Application on a GPU
This paper presents a study of the usability, advantages, and disadvantages of Compute Unified Device Architecture (CUDA), implementing a population-based algorithm called Particle Swarm Optimization (PSO) [5]. To test the performance of the proposed algorithm, an application that hides a watermark image is implemented. The PSO is used […]
Oct, 22
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series of more than ten 69 […]
Oct, 22
Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU
Multiphase flows occur in many practical industrial applications, such as the oil industry, chemical and thermal engineering, bioengineering, and medicine, especially flows in tubes with a granular layer. Multiphase flows in inclined tubes are poorly studied. A numerical study of multiphase flows in inclined tubes was performed. Cases of a clear tube and a tube with granular […]
Oct, 22
SIMD Parallel Gibbs Sampling of Probabilistic Directed Acyclic Graphs
We present a single-chain parallelization strategy for Gibbs sampling of probabilistic Directed Acyclic Graphs, where contributions from child nodes to the conditional posterior distribution of a given node are calculated concurrently. For statistical models with many independent observations, such parallelism takes a Single-Instruction-Multiple-Data form, and can be efficiently implemented using multicore parallelization and vector instructions […]
Oct, 22
Massively parallel approximate Gaussian process regression
We explore how the big-three computing paradigms, namely symmetric multi-processor (SMC), graphical processing units (GPUs), and cluster computing, can together be brought to bear on large-data Gaussian process (GP) regression problems via a careful implementation of a newly developed local approximation scheme. Our methodological contribution focuses primarily on GPU computation, as this requires the […]
Oct, 22
Fingerprint Local Invariant Feature Extraction on GPU with CUDA
Owing to their uniqueness, immutability, acceptability, and low cost, fingerprints are at the forefront of biometric traits. Recently, the GPU has been considered a promising parallel processing technology due to its high computing performance, commodity status, and availability. Fingerprint authentication keeps growing and involves the deployment of many image processing and computer vision algorithms. […]
Oct, 21
QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems
The multi-GPU open-source package QCDGPU for lattice Monte Carlo simulations of pure SU(N) gluodynamics in an external magnetic field at finite temperature and of the O(N) model has been developed. The code is implemented in OpenCL, tested on AMD and NVIDIA GPUs and on AMD and Intel CPUs, and may run on other OpenCL-compatible devices. The package contains minimal external library […]
Oct, 21
An OpenCL-based Implementation of H.264 Encoder
We present an accelerated implementation of a high-speed video stream encoder for the H.264 digital video codec standard. Based on parallel processing techniques with GPUs, we used OpenCL-based GPU kernel programs. We achieved high-level CPU-GPU interoperability by having the CPU perform all input/output operations and overall stream control, while the GPU does the core encoding […]
Oct, 21
Solving Multiple Queries through a Permutation Index in GPU
Query-by-content by means of similarity search is a fundamental operation for applications that deal with multimedia data. For this kind of query it is meaningless to look for elements exactly equal to the one given as query. Instead, we need to measure dissimilarity between the query object and each database object. The metric space model […]
Oct, 21
Concurrent kernel execution on Graphic Processing Units
General Purpose Graphics Processing Units (GPGPUs) are now used in high performance computing (HPC) for their massively parallel computing capabilities. These devices integrate hundreds of computing units (computing cores). Usually, such a level of parallelism is used to solve simulation problems (heat transfer, …) because of the numerical representation of the simulated environment (matrices). […]
Oct, 21
Moim: A Multi-GPU MapReduce Framework
MapReduce greatly decreases the complexity of developing applications for parallel data processing. To considerably improve the performance of MapReduce applications, we design a new MapReduce framework, called Moim, which 1) effectively utilizes both CPUs and GPUs (general purpose Graphics Processing Units), 2) overlaps CPU and GPU computations, 3) enhances load balancing in the map and […]
Oct, 21
Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors
Many-Task Computing (MTC) aims to bridge the gap between HPC and HTC. MTC emphasizes running many computational tasks over a short period of time, where tasks can be either dependent or independent of one another. MTC has been well supported on Clouds, Grids, and Supercomputers on traditional computing architectures, but the abundance of hybrid large-scale […]

