Posts
Oct, 15
Towards Utilizing Remote GPUs for CUDA Program Execution
The modern CPU has been designed to accelerate serial processing as much as possible. Recently, GPUs have been exploited to solve large parallelizable problems. As fast as a GPU is for general purpose massively parallel computing, some problems require an even larger scale of parallelism and pipelining. However, it has been difficult to scale algorithms […]
Oct, 15
Functional High Performance Financial IT
The world of finance faces the computational performance challenge of massively expanding data volumes, extreme response time requirements, and compute-intensive complex (risk) analyses. Simultaneously, new international regulatory rules require considerably more transparency and external auditability of financial institutions, including their software systems. To top it off, increased product variety and customisation necessitates shorter software development […]
Oct, 15
Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems
Graphics processors (GPUs) have emerged as an important platform for general-purpose computing. GPUs offer a large number of parallel cores and have access to high memory bandwidth; however, data structure layouts in GPU memory often lead to suboptimal performance for programs designed with a CPU memory interface (or no particular memory interface at all!) in mind. […]
Oct, 15
Effects of compression on data intensive algorithms
In recent years, the gap between bandwidth and computational throughput has become a major challenge in high performance computing (HPC). Data intensive algorithms are particularly affected by the limitations of I/O bandwidth and latency. In this thesis project, data compression is explored so that fewer bytes need to be read from disk. The computational capabilities […]
Oct, 15
Bandwidth Reduction Through Multithreaded Compression of Seismic Images
One of the main challenges of modern computer systems is to overcome the ever more prominent limitations of disk I/O and memory bandwidth, which today are thousands-fold slower than computational speeds. In this paper, we investigate reducing memory bandwidth and overall I/O and memory access times by using multithreaded compression and decompression of large datasets. […]
Oct, 15
Speeding up the MATLAB complex networks package using graphic processors
The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions or more vertices. The MATLAB language, with its wealth of statistical functions, is a good choice to […]
Oct, 15
GPU fluids in production: a compiler approach to parallelism
Fluid effects in films require the utmost flexibility, from manipulating a small lick of flame to art-directing a huge tidal wave. While fluid solvers are increasingly making use of GPU hardware, one of the biggest challenges is taking advantage of this technology without compromising on either adaptability or performance. We developed the Jet toolset comprised […]
Oct, 15
Accelerating code on multi-cores with FastFlow
FastFlow is a programming framework specifically targeting cache-coherent shared-memory multi-cores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory fence free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper, a new FastFlow programming methodology aimed at supporting parallelization of existing sequential code […]
Oct, 15
Efficient Mapping of Streaming Applications for Image Processing on Graphics Cards
In the last decade, there has been a dramatic growth in research and development of massively parallel commodity graphics hardware both in academia and industry. Graphics card architectures provide an optimal platform for parallel execution of many number crunching loop programs from fields like image processing or linear algebra. However, it is hard to efficiently […]
Oct, 14
An Analysis of Programmer Productivity versus Performance for High Level Data Parallel Programming
Data parallel programming provides an accessible model for exploiting the power of parallel computing elements without resorting to the explicit use of low level programming techniques based on locks, threads and monitors. The emergence of Graphics Processing Units (GPUs) with hundreds or thousands of processing cores has made data parallel computing available to a wider […]
Oct, 14
Accelerating Large Scale Image Analyses on Parallel CPU-GPU Equipped Systems
General-purpose graphical processing units (GPGPUs) have transformed high-performance computing over the past decade. By making great computational power available at reduced cost and power consumption, heterogeneous CPU-GPU-equipped systems have helped make possible the emerging class of exascale data-intensive applications. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of […]
Oct, 14
CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization
As the computational power of GPUs continues to scale with Moore’s Law, an increasing number of applications are becoming limited by memory bandwidth. We propose an approach for programming GPUs with tightly-coupled specialized DMA warps for performing memory transfers between on-chip and off-chip memories. Separate DMA warps improve memory bandwidth utilization by better exploiting available […]