Posts
Apr, 16
On optimization techniques for the matrix multiplication on hybrid CPU+GPU platforms
The use of auto-tuning techniques in a matrix multiplication routine for hybrid CPU+GPU platforms is analyzed. Basic models of the execution time of the hybrid routine and information obtained during its installation are used to optimize the execution time with a balanced assignment of the computation to the computing components in the heterogeneous system. Satisfactory […]
Apr, 16
Dynamic Instrumentation and Optimization for GPU Applications
Parallel architectures like GPUs are a tantalizing compute fabric for performance-hungry developers. While GPUs enable order-of-magnitude performance increases in many data-parallel application domains, writing efficient codes that can actually manifest those increases is a non-trivial endeavor, typically requiring developers to exercise specialized architectural features exposed directly in the programming model. Achieving good performance on GPUs […]
Apr, 16
New Efficient Method To Solve Longest Overlap Region Problem For Noncoding DNA Sequence
Early hardware limitations of the GPU (lack of synchronization primitives and limited memory caching mechanisms) can make GPU-based computation inefficient, while emerging DNA sequence technologies open up more opportunities for molecular biology. This paper presents the issues of parallel implementation of the longest overlap region problem on a multiprocessor GPU using the Compute Unified Device Architecture […]
Apr, 16
A Way For Accelerating The DNA Sequence Reconstruction Problem By CUDA
Traditionally, the shotgun method is used to cut a DNA sequence into pieces, and the original DNA sequence must then be reconstructed from those pieces; this is a widely used approach to DNA assembly. Emerging DNA sequence technologies open up more opportunities for molecular biology. This paper introduces a new method to improve the […]
Apr, 14
Fast Burrows Wheeler Compression Using CPU and GPU
In this paper, we present an all-core implementation of the Burrows Wheeler Compression algorithm that exploits all computing resources on a system. Our focus is to provide significant benefit to everyday users on common end-to-end applications by exploiting the parallelism of multiple CPU cores and many-core GPU on their machines. The all-core framework is suitable for […]
Apr, 14
Scheduling Dataflow Execution Across Multiple Accelerators
Dataflow execution engines such as MapReduce, DryadLINQ and PTask have enjoyed success because they simplify development for a class of important parallel applications. Expressing the computation as a dataflow graph allows the runtime, and not the programmer, to own problems such as synchronization, data movement and scheduling – leveraging dynamic information to inform strategy and […]
Apr, 14
A First Order Primal-Dual Algorithm for Nonconvex TV^q Regularization
We propose an efficient first order primal-dual method for solving variational problems with nonconvex regularization such as TV^q. It is based on the recent idea in [1] to reformulate an existing primal-dual algorithm for convex optimization using Moreau’s identity. A systematic comparison to recent state of the art algorithms for nonconvex optimization (iteratively reweighted l1 […]
Apr, 14
An Approach to Efficient FEM Simulations on Graphics Processing Units Using CUDA
The paper presents a highly efficient way of simulating the dynamic behavior of deformable objects by means of the finite element method (FEM) with computations performed on Graphics Processing Units (GPU). The presented implementation reduces bottlenecks related to memory accesses by grouping the necessary data per node pairs, in contrast to the classical way done […]
Apr, 14
A New Architecture for Games and Simulations Using GPUs
Multi-thread architectures are the current trend for both PCs (multi-core CPUs and GPUs) and game consoles such as the Microsoft Xbox 360 and Sony PlayStation 3. GPUs (Graphics Processing Units) have evolved into extremely powerful and flexible processors, allowing their use for processing different data. This advantage can be used in game development to optimize […]
Apr, 13
Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities
Graphics processing units (GPUs) are important components of modern computing devices for not only graphics rendering, but also efficient parallel computations. However, their security problems are ignored despite their importance and popularity. In this paper, we first perform an in-depth security analysis on GPUs to detect security vulnerabilities. We observe that contemporary, widely-used GPUs, both […]
Apr, 13
GPUdmm: A High-Performance and Memory-Oblivious GPU Architecture Using Dynamic Memory Management
GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve performance and programmability, programmers should be able to place only the data frequently accessed by GPU on GPU memory while overlapping CPU-GPU data transfers and GPU executions as much as […]
Apr, 13
Test-driving Intel Xeon Phi
Based on Intel’s Many Integrated Core (MIC) architecture, Intel Xeon Phi is one of the few truly many-core CPUs – featuring around 60 fairly powerful cores, two levels of caches, and graphic memory, all interconnected by a very fast ring. Given its promised ease-of-use and high performance, we took Xeon Phi out for a test […]