Posts
Jan, 30
Many-threaded Differential Evolution on the GPU
Differential evolution (DE) is an efficient population-based meta-heuristic optimization algorithm that has been applied to many difficult real-world problems. Due to the relative simplicity of its operations and its real-encoded data structures, it is very well suited to parallel implementation on multicore systems and on GPUs, which nowadays reach peak performance of hundreds […]
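The per-candidate independence that makes DE parallelize well can be sketched with the classic DE/rand/1/bin variant below. This is a generic textbook form in plain NumPy, not the paper's GPU implementation; every individual's mutation, crossover, and selection depend only on the previous generation, so each loop iteration could become one GPU thread.

```python
import numpy as np

def de_step(pop, fitness, f=0.5, cr=0.9, rng=None):
    """One generation of DE/rand/1/bin. Each candidate i is updated
    independently of the others, which is what maps onto GPU threads."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        # pick three distinct individuals, all different from i
        a, b, c = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutant = pop[a] + f * (pop[b] - pop[c])          # differential mutation
        cross = rng.random(d) < cr                        # binomial crossover mask
        cross[rng.integers(d)] = True                     # force at least one gene
        trial = np.where(cross, mutant, pop[i])
        if fitness(trial) <= fitness(pop[i]):             # greedy selection
            new_pop[i] = trial
    return new_pop

# minimize the sphere function as a smoke test
sphere = lambda x: float(np.sum(x * x))
rng = np.random.default_rng(1)
pop = rng.uniform(-5, 5, size=(20, 3))
for _ in range(100):
    pop = de_step(pop, sphere, rng=rng)
best = min(sphere(x) for x in pop)
```

On the GPU, the outer loop over candidates disappears: each thread evaluates one trial vector, and the whole population advances in lockstep per generation.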
Jan, 30
Scheduling (ir)regular applications on heterogeneous platforms
Computational platforms have grown steadily more heterogeneous and parallel in recent years, as a consequence of incorporating accelerators whose architectures are parallel and differ from the CPU's. As a result, several frameworks have been developed to aid in programming these platforms, mainly targeting better productivity. In this context, the GAMA framework is […]
Jan, 29
GPUDet: A Deterministic GPU Architecture
Nondeterminism is a key challenge in developing multithreaded applications. Even with the same input, each execution of a multithreaded program may produce a different output. This behavior complicates debugging and limits one’s ability to test for correctness. This non-reproducibility situation is aggravated on massively parallel architectures like graphics processing units (GPUs) with thousands of concurrent […]
Jan, 28
Efficient Implementation of MrBayes on multi-GPU
MrBayes, using Metropolis-coupled Markov chain Monte Carlo [MCMCMC, or (MC)^3 for short], is a popular program for Bayesian inference. Although it is a leading method of using DNA data to infer phylogeny, the (MC)^3 Bayesian algorithm and its improved and parallel versions are still not fast enough for biologists to analyze massive real-world DNA data. […]
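The (MC)^3 scheme the excerpt names can be illustrated with a minimal single-parameter sketch: several chains target tempered versions of the posterior, and periodic state swaps let hot chains ferry the cold chain out of local modes. The temperature ladder and swap rule below are generic textbook choices, not MrBayes' actual implementation.

```python
import math
import random

def mc3_sample(log_post, x0, n_chains=4, dt=0.5, steps=2000, seed=0):
    """Minimal Metropolis-coupled MCMC ((MC)^3): chain k targets
    log_post(x) / T_k with temperatures T_k = 1 + k*dt; after each sweep
    two random chains may swap states. Only the cold chain (T=1) is kept."""
    rnd = random.Random(seed)
    temps = [1.0 + k * dt for k in range(n_chains)]
    xs = [x0] * n_chains
    samples = []
    for _ in range(steps):
        for k in range(n_chains):                    # within-chain Metropolis step
            prop = xs[k] + rnd.gauss(0, 1)
            if math.log(rnd.random() + 1e-300) < (log_post(prop) - log_post(xs[k])) / temps[k]:
                xs[k] = prop
        i, j = rnd.sample(range(n_chains), 2)        # propose a chain swap
        log_r = (log_post(xs[j]) - log_post(xs[i])) * (1 / temps[i] - 1 / temps[j])
        if math.log(rnd.random() + 1e-300) < log_r:
            xs[i], xs[j] = xs[j], xs[i]
        samples.append(xs[0])
    return samples

# standard normal target; the cold chain should wander from x0=5 toward 0
log_norm = lambda x: -0.5 * x * x
s = mc3_sample(log_norm, x0=5.0)
mean = sum(s[500:]) / len(s[500:])
```

In phylogenetics the per-step likelihood is vastly more expensive than here, which is why parallelizing the chains (and the likelihood itself) across GPUs pays off.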
Jan, 28
A dataflow-like programming model for future hybrid clusters
It is expected that the first exascale supercomputer will be deployed within the next 10 years; however, neither its CPU architecture nor its programming model is known yet. Multicore CPUs are not expected to scale to the required number of cores per node, but hybrid multicore CPUs consisting of different kinds of processing elements are […]
Jan, 28
Exploring Different Automata Representations for Efficient Regular Expression Matching on GPUs
Regular expression matching is a central task in several networking (and search) applications and has been accelerated on a variety of parallel architectures. All solutions are based on finite automata (in either deterministic or non-deterministic form), and mostly focus on effective memory representations for such automata. Recently, a handful of works have proposed efficient regular […]
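A table-driven DFA, the deterministic form mentioned above, reduces matching to one transition lookup per input symbol; the memory layout of that transition table is exactly what GPU representations tune for. The tiny matcher below is a generic sketch, not taken from the paper.

```python
DEAD = -1  # explicit dead state: no suffix can ever match from here

def run_dfa(table, accepting, text, start=0):
    """Run a DFA given as {state: {symbol: next_state}}: one table lookup
    per input character, with missing entries falling into the dead state."""
    state = start
    for ch in text:
        if state == DEAD:
            return False
        state = table[state].get(ch, DEAD)
    return state in accepting

# DFA for the regex (ab)+ : 0 = start, 1 = just read 'a', 2 = accepting
table = {
    0: {'a': 1},
    1: {'b': 2},
    2: {'a': 1},
}
accepting = {2}
```

On a GPU, many input streams (or many automata) run this loop in parallel, so the transition table's size and access pattern, dense array versus compressed forms, dominates performance.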
Jan, 28
Warped Register File: A Power Efficient Register File for GPGPUs
General purpose graphics processing units (GPGPUs) have the ability to execute hundreds of concurrent threads. To support this massive parallelism, GPGPUs provide a very large register file, even larger than a cache, to hold the state of each thread. As technology scales, the leakage power consumption of the SRAM cells worsens, making the register […]
Jan, 28
Reaction-diffusion model Monte Carlo simulations on the GPU
We created an efficient algorithm suitable for graphics processing units (GPUs) to perform Monte Carlo simulations of a subset of reaction-diffusion models. The algorithm uses techniques that are specific to GPU programming, and combines these with the multispin technique known from CPU programming to create one of the fastest algorithms for reaction-diffusion models. As an […]
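The multispin technique mentioned above packs one spin per bit of a machine word, so a single bitwise operation updates many sites (or many replicas) at once. The toy 1D spreading model below is an illustrative sketch of that bit-packing idea, not the paper's reaction-diffusion algorithm.

```python
def spread_step(state, n):
    """One synchronous step of a toy spreading process on a ring of n sites,
    all packed one-per-bit into a single integer. A site becomes occupied if
    it or either neighbour is occupied; a shift-and-OR updates every site at
    once -- the essence of multispin coding."""
    mask = (1 << n) - 1
    left = ((state << 1) | (state >> (n - 1))) & mask    # neighbour with wrap-around
    right = ((state >> 1) | (state << (n - 1))) & mask
    return state | left | right

state = 1 << 8                     # a single occupied seed on a 64-site ring
for _ in range(3):
    state = spread_step(state, 64)
occupied = bin(state).count("1")   # the seed spreads one site per step each way
```

On a GPU the same trick multiplies throughput per thread: each 32- or 64-bit word a thread touches carries that many lattice sites or independent simulation replicas.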
Jan, 26
GPUfs: Integrating a File System with GPUs
As GPU hardware becomes increasingly general-purpose, it is quickly outgrowing the traditional, constrained GPU-as-coprocessor programming model. To make GPUs easier to program and improve their integration with operating systems, we propose making the host’s file system directly accessible to GPU code. GPUfs provides a POSIX-like API for GPU programs, exploits GPU parallelism for efficiency, and […]
Jan, 26
Advanced Trends of Heterogeneous Computing with CPU-GPU Integration: Comparative Study
Over the last decades, parallel and distributed computing has become more popular than traditional centralized computing. In distributed computing, performance improvements are achieved by distributing workloads across the participating nodes. One of the most important factors in improving the performance of this type of system is reducing the average and standard deviation of job response time. Runtime insertion […]
Jan, 26
Selection algorithm of graphic accelerators in heterogeneous cluster for optimization computing
The paper addresses the problem of selecting the optimal GPU for OpenCL kernels launched on heterogeneous clusters containing different types of GPUs. The authors propose an optimal GPU selection algorithm that achieves the best efficiency during program execution on GPUs.
Jan, 26
Autotuning, Code Generation and Optimizing Compiler Technology for GPUs
Graphics Processing Units (GPUs) have evolved into devices with teraflop-level performance potential. Application developers face a tedious task in developing GPU software: correctly identifying parallel computation and optimizing the placement of data for the parallel processors in such architectures. Further, code optimized for one architecture may not perform well on different generations of even the […]