Posts
Jan, 31
Survey On The Off-Chip Scheduling of Memory Accesses in the Memory Interface Of GPUs
The SIMD (Single Instruction, Multiple Data) execution model of Graphics Processing Units (GPUs) allows many concurrent threads to request data from the memory subsystem simultaneously. This imposes a large bandwidth demand on the memory interfaces at each level. Each level of the memory hierarchy needs to provide enough bandwidth to ensure good response […]
Jan, 31
GPU Enhanced Stream-Based Matrix Multiplication
The paper introduces an algorithm that improves the achieved giga floating-point operations per second (GFLOPS) of matrix multiplication on a graphics processing unit (GPU) by overlapping data transfers between the host (CPU) and the device (GPU) with kernel execution. The input matrices are divided into n sections and the output matrix into […]
Jan, 31
Particle method on GPU
In this article, we present a graphics processing unit (GPU) implementation of a particle method for transport equations. More precisely, the numerical method under consideration is a remeshed particle method. Not only does remeshing particles make simulations more accurate in flows with strong strain, but it also leads to algorithms that are more regular in terms of data structures. […]
Jan, 30
Parallel GPGPU Evaluation of Small Angle X-ray Scattering Profiles in a Markov Chain Monte Carlo Framework
Inference of protein structure from experimental data is of crucial interest in science, medicine and biotechnology. Low-resolution methods, such as small angle X-ray scattering (SAXS), play a major role in investigating important biological questions regarding the structure of proteins in solution. To infer protein structure from SAXS data, it is necessary to calculate the expected […]
Jan, 30
Faster Algorithms for RNA-folding using the Four-Russians method
The secondary structure that maximizes the number of non-crossing matchings between complementary bases of an RNA sequence of length n can be computed in O(n^3) time using dynamic programming. Four-Russians is a technique that reduces the running time of certain dynamic programming algorithms by a factor after a preprocessing step where solutions to all […]
Jan, 30
Many-threaded Differential Evolution on the GPU
Differential evolution (DE) is an efficient population-based metaheuristic optimization algorithm that has been applied to many difficult real-world problems. Due to the relative simplicity of its operations and its real-encoded data structures, it is well suited to parallel implementation on multicore systems and on GPUs, which nowadays reach peak performance of hundreds […]
Jan, 30
Scheduling (ir)regular applications on heterogeneous platforms
Computational platforms have become steadily more heterogeneous and parallel in recent years as a consequence of incorporating accelerators whose architectures are parallel and differ from that of the CPU. As a result, several frameworks have been developed to aid in programming these platforms, mainly targeting better productivity ratios. In this context, the GAMA framework is […]
Jan, 29
GPUDet: A Deterministic GPU Architecture
Nondeterminism is a key challenge in developing multithreaded applications. Even with the same input, each execution of a multithreaded program may produce a different output. This behavior complicates debugging and limits one’s ability to test for correctness. This lack of reproducibility is aggravated on massively parallel architectures like graphics processing units (GPUs) with thousands of concurrent […]
Jan, 28
Efficient Implementation of MrBayes on multi-GPU
MrBayes, which uses Metropolis-coupled Markov chain Monte Carlo [MCMCMC, or (MC)^3 for short], is a popular program for Bayesian inference. Although it is a leading method for inferring phylogeny from DNA data, the (MC)^3 Bayesian algorithm and its improved and parallel versions are still not fast enough for biologists to analyze massive real-world DNA data. […]
Jan, 28
A dataflow-like programming model for future hybrid clusters
It is expected that the first exascale supercomputer will be deployed within the next 10 years; however, neither its CPU architecture nor its programming model is known yet. Multicore CPUs are not expected to scale to the required number of cores per node, but hybrid multicore CPUs consisting of different kinds of processing elements are […]
Jan, 28
Exploring Different Automata Representations for Efficient Regular Expression Matching on GPUs
Regular expression matching is a central task in several networking (and search) applications and has been accelerated on a variety of parallel architectures. All solutions are based on finite automata (in either deterministic or non-deterministic form) and mostly focus on effective memory representations for such automata. Recently, a handful of works have proposed efficient regular […]
Jan, 28
Warped Register File: A Power Efficient Register File for GPGPUs
General-purpose graphics processing units (GPGPUs) can execute hundreds of concurrent threads. To support massive parallelism, GPGPUs provide a very large register file, even larger than a cache, to hold the state of each thread. As technology scales, the leakage power consumption of the SRAM cells is getting worse, making the register […]