Posts
Sep, 11
Software-based branch predication for AMD GPUs
Branch predication is a program transformation technique that combines the instructions of the branches of an if statement into a single straight-line sequence and associates each instruction of the sequence with a predicate. Branch predication improves the execution of branch statements on processors that support predicated execution of instructions, e.g., Intel IA-64, because such a transformation improves […]
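A minimal sketch (not taken from the paper, with illustrative names) of what this if-conversion looks like in plain C++: both arms of the if statement are executed unconditionally, and a predicate-driven select keeps the result of the arm that would have been taken.

```cpp
#include <cstdio>

// Original, branchy form: only one arm executes, chosen by a branch.
int branchy(int cond, int a, int b) {
    int r;
    if (cond) {
        r = a + b;   // then-arm
    } else {
        r = a - b;   // else-arm
    }
    return r;
}

// Predicated form: both arms are computed as straight-line code and the
// predicate selects the result. On hardware with predicated or select
// instructions this typically compiles without a branch.
int predicated(int cond, int a, int b) {
    int p = (cond != 0);   // predicate
    int t = a + b;         // then-arm, always executed
    int e = a - b;         // else-arm, always executed
    return p ? t : e;      // select (e.g. cmov/csel), not a branch
}

int main() {
    printf("%d %d\n", branchy(1, 3, 4), predicated(1, 3, 4));  // 7 7
    printf("%d %d\n", branchy(0, 3, 4), predicated(0, 3, 4));  // -1 -1
    return 0;
}
```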
Sep, 11
Solving diffractive optics problems using graphics processing units
Techniques for applying graphics processing units (GPUs) to general-purpose, non-graphics computations, proposed in recent years by ATI (AMD FireStream, 2006) and NVIDIA (CUDA: Compute Unified Device Architecture, 2007), have given an impetus to developing algorithms and software packages for solving problems of diffractive optics with the aid of the GPU. The computations […]
Sep, 9
Enabling multiple accelerator acceleration for Java/OpenMP
While using a single GPU is fairly easy, using multiple CPUs and GPUs, potentially distributed over multiple machines, is hard because data needs to be kept consistent via message exchange and the load needs to be balanced. We propose (1) an array package that provides partitioned and replicated arrays and (2) a compute-device library to […]
Sep, 9
Heterogeneous multicore parallel programming for graphics processing units
Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware displays a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a […]
Sep, 9
Beyond programmable shading (parts I and II)
There are strong indications that the future of interactive graphics programming is a more flexible model than today’s OpenGL/Direct3D pipelines. Graphics developers need a basic understanding of how to combine emerging parallel programming techniques and more flexible graphics processors with the traditional interactive rendering pipeline. As the first in a series, this course introduces the […]
Sep, 9
Data classification for artificial intelligence construct training to aid in network incident identification using network telescope data
This paper considers the complexities involved in obtaining training data for use by artificial intelligence constructs to identify potential network incidents using passive network telescope data. While a large amount of data obtained from network telescopes exists, this data is not currently marked for known incidents. Problems related to this marking process include the accuracy […]
Sep, 9
A stream-computing extension to OpenMP
This paper introduces an extension to OpenMP 3.0 enabling stream programming with minimal, incremental additions that integrate seamlessly into the current specification. The stream programming model decomposes programs into tasks and makes the flow of data among them explicit, thus exposing data, task and pipeline parallelism. It helps programmers express concurrency and data locality properties, […]
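The paper's proposed syntax is not shown in this excerpt; as a rough illustration only, the sketch below uses the standard OpenMP depend clauses (added later, in OpenMP 4.0) to show how making each task's inputs and outputs explicit exposes pipeline parallelism between stages.

```cpp
#include <cstdio>
#include <omp.h>

int main() {
    const int N = 8;
    int buf[N];

    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < N; ++i) {
        // Stage 1: produce buf[i].
        #pragma omp task depend(out: buf[i]) firstprivate(i) shared(buf)
        {
            buf[i] = i * i;
        }

        // Stage 2: consume buf[i]; it runs once its input is ready,
        // while stage 1 of later iterations can proceed concurrently.
        #pragma omp task depend(in: buf[i]) firstprivate(i) shared(buf)
        {
            printf("item %d -> %d\n", i, buf[i]);
        }
    }
    return 0;
}
```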
Sep, 9
CUDACS: securing the cloud with CUDA-enabled secure virtualization
While on the one hand unresolved security issues pose a barrier to the widespread adoption of cloud computing technologies, on the other hand the computing capabilities of even commodity hardware are increasing rapidly, in particular thanks to the adoption of *-core technologies. For instance, the Nvidia Compute Unified Device Architecture (CUDA) technology is increasingly available on […]
Sep, 9
KAdvice: inferring synchronization patterns from an existing codebase
Operating system kernels are complex software systems. The kernels of today's mainstream OSs, such as Linux or Windows, are composed of a number of modules, which contain code and data. Even when providing synchronous interfaces (APIs) to the programmer, large portions of the OS kernel operate in an asynchronous manner. Synchronizing access to kernel data […]
Sep, 9
Attaining system performance points: revisiting the end-to-end argument in system design for heterogeneous many-core systems
Trends indicate a rapid increase in the number of cores per chip, with various types of performance and functional asymmetries present in hardware to gain scalability while balancing power and performance requirements. This poses new challenges in platform resource management, which are further exacerbated by the need for runtime power budgeting and by the increased […]
Sep, 9
The architecture of the DecentVM: towards a decentralized virtual machine for many-core computing
Fully decentralized systems avoid bottlenecks and single points of failure. Thus, they can provide excellent scalability and very robust operation. The DecentVM is a fully decentralized, distributed virtual machine. Its simplified instruction set allows for a small VM code footprint. Its partitioned global address space (PGAS) memory model helps to easily create a single system […]
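As a toy illustration (not taken from the paper, names are assumptions) of the PGAS idea mentioned above: a single global address space is statically partitioned across nodes, so any thread can name any element while the owning partition remains computable from the index.

```cpp
#include <cstdio>

struct GlobalAddr {
    int node;     // owning partition
    long offset;  // offset within that partition's local memory
};

// Block-cyclically distribute a global index over 'nodes' partitions
// with blocks of 'block' elements each.
GlobalAddr locate(long global_index, int nodes, long block) {
    GlobalAddr a;
    long block_num = global_index / block;
    a.node   = static_cast<int>(block_num % nodes);
    a.offset = (block_num / nodes) * block + global_index % block;
    return a;
}

int main() {
    // 4 nodes, blocks of 8 elements: global element 13 lives on node 1, offset 5.
    GlobalAddr a = locate(13, 4, 8);
    printf("node=%d offset=%ld\n", a.node, a.offset);
    return 0;
}
```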
Sep, 9
Piccolo: building fast, distributed programs with partitioned tables
Piccolo is a new data-centric programming model for writing parallel in-memory applications in data centers. Unlike existing data-flow models, Piccolo allows computation running on different machines to share distributed, mutable state via a key-value table interface. Piccolo enables efficient application implementations. In particular, applications can specify locality policies to exploit the locality of shared state […]
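A hypothetical, single-process mock of the kind of key-value table interface the abstract describes, where shared mutable state is updated through a user-supplied accumulator. The names (KVTable, update, get) and the merge-by-accumulator design are illustrative assumptions, not Piccolo's actual API.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class KVTable {
public:
    // The accumulator resolves repeated updates to the same key
    // (e.g. summation), mirroring user-defined accumulation of shared state.
    explicit KVTable(std::function<V(const V&, const V&)> accum)
        : accum_(std::move(accum)) {}

    void update(const K& k, const V& v) {
        auto it = data_.find(k);
        if (it == data_.end()) data_.emplace(k, v);
        else it->second = accum_(it->second, v);
    }

    V get(const K& k) const { return data_.at(k); }

private:
    std::unordered_map<K, V> data_;
    std::function<V(const V&, const V&)> accum_;
};

int main() {
    // Word-count-style usage: updates to the same key are merged by '+'.
    KVTable<std::string, int> counts(
        [](const int& a, const int& b) { return a + b; });
    counts.update("gpu", 1);
    counts.update("gpu", 1);
    counts.update("cuda", 1);
    printf("gpu=%d cuda=%d\n", counts.get("gpu"), counts.get("cuda"));  // gpu=2 cuda=1
    return 0;
}
```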