Posts
Sep, 29
XBOOLE-CUDA: Fast Boolean Operations on the GPU
The Boolean domain confronts us with the exponential complexity of Boolean functions, and technological progress in micro- and nano-electronics allows increasing numbers of Boolean variables. This requires very powerful Boolean computations. The progress in the performance of Graphics Processing Units (GPUs) and the possibility to utilize the GPU to solve tasks of many application […]
Sep, 28
An hybrid AES-256-GCM implementation for NEON CPU & CUDA GPU
This paper describes & evaluates a fast, hybrid implementation of the Advanced Encryption Standard with 256-bit keys (AES-256) block encryption in Galois/Counter Mode (GCM). The implementation is bit-compatible with the standard as implemented in both the OpenSSL and Crypto++ libraries, while being significantly (up to three times) faster for large amounts of data. In this implementation, […]
Sep, 28
A Study of the Potential of Locality-Aware Thread Scheduling for GPUs
Programming models such as CUDA and OpenCL allow the programmer to specify the independence of threads, effectively removing ordering constraints. Still, parallel architectures such as the graphics processing unit (GPU) do not exploit the potential of data-locality enabled by this independence. Therefore, programmers are required to manually perform data-locality optimisations such as memory coalescing or […]
Sep, 28
High-performance Implementations and Large-scale Validation of the Link-wise Artificial Compressibility Method
The link-wise artificial compressibility method (LW-ACM) is a recent formulation of the artificial compressibility method for solving the incompressible Navier-Stokes equations. Two implementations of the LW-ACM in three dimensions on CUDA enabled GPUs are described. The first one is a modified version of a state-of-the-art CUDA implementation of the lattice Boltzmann method (LBM), showing that […]
Sep, 28
NAS Parallel Benchmarks for GPGPUs using a Directive-based Programming Model
The broad adoption of accelerators boosts the interest in accelerator programming. Accelerators such as GPGPUs are optimized for throughput and offer high GFLOPS and memory bandwidth. CUDA has been adopted quite rapidly but it is proprietary and only applicable to GPUs, and the difficulty in writing efficient CUDA code has kindled the necessity to create […]
Sep, 28
Accelerating Phylogenetic Inference on GPUs: an OpenACC and CUDA comparison
Phylogenetic inference is used to derive a "tree of life" for a collection of species whose DNA sequences are known. While several software packages have already been developed to take advantage of GPUs to accelerate phylogenetic inference, they typically require significant changes to the original code, constraining code maintenance. Recently, the OpenACC API was proposed […]
Sep, 25
An open source finite-difference time-domain solver for room acoustics using graphics processing units
Wave based simulation methods have been utilized to numerically estimate wave propagation in domains where low-frequency wave effects dominate the response. Finite-difference time-domain (FDTD) methods are increasingly useful for such problems, but they require massive spatial oversampling to increase the bandwidth of the simulation, which leads to significant computational expense. The advantage of explicit time-stepping […]
Sep, 25
Study on semi-global matching algorithm extended for multi baseline matching and parallel processing method based on GPU
This paper extends the semi-global matching algorithm to multi-baseline matching to improve matching reliability. In particular, it studies kernel function optimization strategies and the GPU thread execution scheme for computing and aggregating the matching cost cube, and realizes fine-grained parallel processing on the GPU. The experiment results using three UCD aerial images based on Tesla C2050 GPU […]
Sep, 25
Performance Evaluation of Edge Detection Techniques on GPU Using OpenCL
GPUs (graphics processing units) enhance performance in the computing field thanks to their hundreds of parallel cores. CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are programming models for GPUs. The advantage of these two programming models is that developers don’t have to understand any […]
Sep, 25
MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs
Cross-validation is a commonly used method for evaluating the effectiveness of Support Vector Machines (SVMs). However, existing SVM cross-validation algorithms are not scalable to large datasets because they have to (i) hold the whole dataset in memory and/or (ii) perform a very large number of kernel value computations. In this paper, we propose a scheme […]
Sep, 25
Scalability Analysis of Parallel Algorithms on GPU Clusters
Scalability is an important concept in the domain of parallel computing. Since Graphics Processing Unit (GPU) clusters are and will be widely utilized in high performance computing platforms, we investigate the factors influencing the scalability for combinations of parallel algorithms (PA) and GPU clusters (GC). We present a scalability model for the PA-GC combination and then validate […]
Sep, 24
Calculation of Force Field Grids for Molecular Docking Using Graphics Processing Unit
The vast majority of problems faced by bioinformatics are very complex and time consuming. They require the use of modern high-performance computational systems and the development of algorithms for such systems. Heterogeneous computing systems that include graphics processing units (GPUs) occupy a separate niche. Such systems can significantly accelerate the solution of some tasks. The […]