Posts
Oct, 15
Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers
This paper proposes a compiler-based programming framework that automatically translates user-written structured grid code into scalable parallel implementation code for GPU-equipped clusters. To enable such automatic translations, we design a small set of declarative constructs that allow the user to express stencil computations in a portable and implicitly parallel manner. Our framework translates the user-written […]
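For readers new to stencil computations, here is a minimal sketch of the kind of structured-grid kernel such a framework targets. It is plain Python and is not Physis syntax or its generated code. The 5-point Jacobi averaging rule and the function name `jacobi_step` are illustrative assumptions.

```python
from typing import List

Grid = List[List[float]]

def jacobi_step(grid: Grid) -> Grid:
    """Apply one 5-point Jacobi stencil sweep to the interior of a 2D grid.

    Each interior point becomes the average of its four neighbours;
    boundary points are copied unchanged (fixed boundary conditions).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # separate output grid: reads never see this sweep's writes
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out
```

Every output point depends only on the previous grid, so all interior points can be updated independently. That property is what allows a compiler to distribute such sweeps across GPUs and nodes without the user writing explicit parallel code.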
Oct, 15
Operating Systems Challenges for GPU Resource Management
The graphics processing unit (GPU) is becoming a very powerful platform for accelerating graphics and data-parallel, compute-intensive applications. It significantly outperforms traditional multi-core processors in performance and energy efficiency. Its application domains range widely, from embedded systems to high-performance computing systems. However, operating system support remains inadequate, lacking models, designs, and implementation efforts […]
Oct, 15
Towards Utilizing Remote GPUs for CUDA Program Execution
The modern CPU has been designed to accelerate serial processing as much as possible. Recently, GPUs have been exploited to solve large parallelizable problems. Fast as a GPU is for general-purpose massively parallel computing, some problems require an even larger scale of parallelism and pipelining. However, it has been difficult to scale algorithms […]
Oct, 15
Functional High Performance Financial IT
The world of finance faces the computational performance challenge of massively expanding data volumes, extreme response-time requirements, and compute-intensive, complex (risk) analyses. Simultaneously, new international regulatory rules require considerably more transparency and external auditability of financial institutions, including their software systems. To top it off, increased product variety and customisation necessitate shorter software development […]
Oct, 15
Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems
Graphics processors (GPUs) have emerged as an important platform for general-purpose computing. GPUs offer a large number of parallel cores and have access to high memory bandwidth; however, data structure layouts in GPU memory often lead to suboptimal performance for programs designed with a CPU memory interface (or no particular memory interface at all!) in mind. […]
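A common instance of this mismatch is storing records as an array of structures (AoS): when consecutive GPU threads each read the same field of consecutive records, their loads hit strided addresses instead of one contiguous run. The sketch below shows the index remapping that turns a flat AoS buffer into a structure-of-arrays (SoA) layout, so that each field becomes contiguous. It is an illustrative assumption and not Dymaxion's API; the function name `aos_to_soa` is hypothetical.

```python
from typing import List, Sequence

def aos_to_soa(buf: Sequence[float], num_fields: int) -> List[float]:
    """Remap a flat array-of-structures buffer to structure-of-arrays order.

    In AoS order, field f of record r lives at index r * num_fields + f.
    In SoA order, it lives at index f * num_records + r, so all values of
    one field are contiguous, which suits coalesced GPU loads.
    """
    if num_fields <= 0 or len(buf) % num_fields != 0:
        raise ValueError("buffer length must be a positive multiple of num_fields")
    num_records = len(buf) // num_fields
    out: List[float] = [0.0] * len(buf)
    for r in range(num_records):
        for f in range(num_fields):
            out[f * num_records + r] = buf[r * num_fields + f]
    return out
```

For example, three 2D points stored as `[x0, y0, x1, y1, x2, y2]` become `[x0, x1, x2, y0, y1, y2]`. The remapping is a transpose of a `num_records × num_fields` array. Where a real system performs it (for instance, overlapped with the host-to-device copy) is a design choice this sketch does not model.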
Oct, 15
Effects of compression on data intensive algorithms
In recent years, the gap between bandwidth and computational throughput has become a major challenge in high-performance computing (HPC). Data-intensive algorithms are particularly affected by the limitations of I/O bandwidth and latency. In this thesis project, data compression is explored so that fewer bytes need to be read from disk. The computational capabilities […]
Oct, 15
Bandwidth Reduction Through Multithreaded Compression of Seismic Images
One of the main challenges of modern computer systems is to overcome the ever more prominent limitations of disk I/O and memory bandwidth, which today lag thousands-fold behind computational speeds. In this paper, we investigate reducing memory bandwidth and overall I/O and memory access times by using multithreaded compression and decompression of large datasets. […]
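The underlying idea is to spend spare CPU cycles so that fewer bytes cross the disk or memory bus. It can be sketched with Python's standard `zlib` module and a thread pool: split the data into independent chunks, compress them concurrently, then decompress them concurrently on the read side. The chunk size, compression level, and helper names (`compress_chunks`, `decompress_chunks`) are assumptions. The paper's actual codec and threading scheme are not reproduced here.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

CHUNK_SIZE = 1 << 20  # hypothetical block size: 1 MiB per independently compressed chunk

def compress_chunks(data: bytes, workers: int = 4, level: int = 6) -> List[bytes]:
    """Split data into fixed-size chunks and compress each one in a worker thread."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so output block k matches input chunk k
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))

def decompress_chunks(blocks: List[bytes], workers: int = 4) -> bytes:
    """Decompress independently compressed blocks concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))
```

Because each chunk is compressed on its own, any chunk can be decompressed without the others, which is what makes the read side parallel. The trade-off is a somewhat worse compression ratio than compressing the whole stream at once.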
Oct, 15
Speeding up the MATLAB complex networks package using graphic processors
The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than before. At present, statistics is believed to be a suitable method for analysing networks with millions of vertices or more. The MATLAB language, with its large collection of statistical functions, is a good choice to […]
Oct, 15
GPU fluids in production: a compiler approach to parallelism
Fluid effects in films require the utmost flexibility, from manipulating a small lick of flame to art-directing a huge tidal wave. While fluid solvers increasingly make use of GPU hardware, one of the biggest challenges is taking advantage of this technology without compromising on either adaptability or performance. We developed the Jet toolset, comprising […]
Oct, 15
Accelerating code on multi-cores with FastFlow
FastFlow is a programming framework specifically targeting cache-coherent shared-memory multi-cores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory-fence-free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper, a new FastFlow programming methodology aimed at supporting parallelization of existing sequential code […]
Oct, 15
Efficient Mapping of Streaming Applications for Image Processing on Graphics Cards
In the last decade, there has been a dramatic growth in research and development of massively parallel commodity graphics hardware, both in academia and industry. Graphics card architectures provide an optimal platform for the parallel execution of many number-crunching loop programs from fields like image processing or linear algebra. However, it is hard to efficiently […]
Oct, 14
An Analysis of Programmer Productivity versus Performance for High Level Data Parallel Programming
Data parallel programming provides an accessible model for exploiting the power of parallel computing elements without resorting to the explicit use of low level programming techniques based on locks, threads and monitors. The emergence of Graphics Processing Units (GPUs) with hundreds or thousands of processing cores has made data parallel computing available to a wider […]

