Posts

Sep, 9

Data classification for artificial intelligence construct training to aid in network incident identification using network telescope data

This paper considers the complexities involved in obtaining training data for use by artificial intelligence constructs to identify potential network incidents using passive network telescope data. While a large amount of data obtained from network telescopes exists, this data is not currently marked for known incidents. Problems related to this marking process include the accuracy […]
Sep, 9

A stream-computing extension to OpenMP

This paper introduces an extension to OpenMP 3.0 enabling stream programming with minimal, incremental additions that integrate seamlessly into the current specification. The stream programming model decomposes programs into tasks and makes the flow of data among them explicit, thus exposing data, task and pipeline parallelism. It helps programmers express concurrency and data locality properties, […]
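The idea of tasks connected by an explicit data flow can be sketched outside OpenMP. The following minimal Python example is only an analogy, not the syntax proposed in the paper: it assumes two hypothetical stages linked by a bounded queue, so the consumer can start working while the producer is still running (pipeline parallelism). The names `run_pipeline`, `produce` and `consume` are illustrative.

```python
import queue
import threading


def run_pipeline(items):
    """Run a two-stage pipeline: square each item, then add one."""
    stream = queue.Queue(maxsize=2)  # bounded stream between the two tasks
    done = object()                  # sentinel marking the end of the stream
    results = []

    def produce():
        # Stage 1: push squared values into the stream.
        for item in items:
            stream.put(item * item)
        stream.put(done)

    def consume():
        # Stage 2: pull values from the stream until the sentinel arrives.
        while True:
            value = stream.get()
            if value is done:
                break
            results.append(value + 1)

    stages = [threading.Thread(target=produce), threading.Thread(target=consume)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results


pipeline_output = run_pipeline([1, 2, 3])
```

Because a single queue preserves insertion order and there is one consumer, the results come out in input order.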
Sep, 9

CUDACS: securing the cloud with CUDA-enabled secure virtualization

While on the one hand unresolved security issues pose a barrier to the widespread adoption of cloud computing technologies, on the other hand the computing capabilities of even commodity hardware are growing, in particular thanks to the adoption of *-core technologies. For instance, the Nvidia Compute Unified Device Architecture (CUDA) technology is increasingly available on […]
Sep, 9

KAdvice: inferring synchronization patterns from an existing codebase

Operating system kernels are complex software systems. The kernels of today's mainstream OSs, such as Linux or Windows, are composed of a number of modules, which contain code and data. Even when providing synchronous interfaces (APIs) to the programmer, large portions of the OS kernel operate in an asynchronous manner. Synchronizing access to kernel data […]
Sep, 9

Attaining system performance points: revisiting the end-to-end argument in system design for heterogeneous many-core systems

Trends indicate a rapid increase in the number of cores on chip, exhibiting various types of performance and functional asymmetries present in hardware to gain scalability with balanced power vs. performance requirements. This poses new challenges in platform resource management, which are further exacerbated by the need for runtime power budgeting and by the increased […]
Sep, 9

The architecture of the DecentVM: towards a decentralized virtual machine for many-core computing

Fully decentralized systems avoid bottlenecks and single points of failure. Thus, they can provide excellent scalability and very robust operation. The DecentVM is a fully decentralized, distributed virtual machine. Its simplified instruction set allows for a small VM code footprint. Its partitioned global address space (PGAS) memory model helps to easily create a single system […]
Sep, 9

Piccolo: building fast, distributed programs with partitioned tables

Piccolo is a new data-centric programming model for writing parallel in-memory applications in data centers. Unlike existing data-flow models, Piccolo allows computation running on different machines to share distributed, mutable state via a key-value table interface. Piccolo enables efficient application implementations. In particular, applications can specify locality policies to exploit the locality of shared state […]
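A key-value table split into partitions can be pictured with a toy, single-process sketch. This is not Piccolo's API: the class `PartitionedTable`, its methods, the hash-based partitioning policy and the `combine` callback are all assumptions made for illustration. In a real system, each partition would live on a different machine.

```python
class PartitionedTable:
    """Toy in-process stand-in for a key-value table split into partitions."""

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]

    def partition_of(self, key):
        # Assumed policy: place each key by hashing it.
        return hash(key) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self.partition_of(key)][key] = value

    def get(self, key, default=None):
        return self.partitions[self.partition_of(key)].get(key, default)

    def update(self, key, delta, combine):
        # Merge a new contribution into the stored value, or store it if absent.
        part = self.partitions[self.partition_of(key)]
        part[key] = combine(part[key], delta) if key in part else delta


table = PartitionedTable(num_partitions=4)
for page in [1, 2, 1, 3, 1]:
    table.update(page, 1, lambda old, new: old + new)  # count occurrences
counts = {page: table.get(page) for page in (1, 2, 3)}
```

Keeping all values for a key in one partition is what lets a locality policy place computation next to the state it touches.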
Sep, 8

Parallel Programming on a Soft-Core Based Multi-core System

A soft-core system lets designers conveniently modify the components of the architecture they have designed. In some systems, a uni-core processor cannot provide enough computing power for the heavy computation that specific applications require. In order to improve the performance of a multi-core system, in addition to the hardware architecture design, parallel […]
Sep, 8

Fast Ultrasound Image Simulation Using the Westervelt Equation

The simulation of ultrasound wave propagation is of high interest in fields such as ultrasound system development and therapeutic ultrasound. From a computational point of view, the requirements for realistic simulations are immense, with processing times reaching an entire day. In this work we present a framework for fast ultrasound image simulation covering the imaging […]
Sep, 8

Experiences with Mapping Non-linear Memory Access Patterns into GPUs

Modern Graphics Processing Units (GPUs) are very powerful computational systems on a chip. For this reason, there is a growing interest in using these units as general purpose hardware accelerators (GPGPU). To facilitate the programming of general purpose applications, NVIDIA introduced the CUDA programming environment. CUDA provides a simplified abstraction of the underlying complex GPU […]
Sep, 8

A Fast GPU Implementation for Solving Sparse Ill-Posed Linear Equation Systems

Image reconstruction, a very compute-intensive process in general, can often be reduced to large linear equation systems represented as sparse under-determined matrices. Solvers for these equation systems (not restricted to image reconstruction) spend most of their time in sparse matrix-vector multiplications (SpMV). In this paper we present a GPU-accelerated scheme for a Conjugate Gradient […]
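To make the SpMV operation concrete, here is a minimal CPU sketch that multiplies a sparse matrix by a dense vector. It assumes the compressed sparse row (CSR) layout; the abstract does not say which storage format or kernel the paper uses, and this is not its GPU scheme. The function name `csr_spmv` and the sample matrix are illustrative.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 2x3 under-determined matrix:
# [[1, 0, 2],
#  [0, 3, 0]]
values = [1.0, 2.0, 3.0]
col_idx = [0, 2, 1]
row_ptr = [0, 2, 3]
y = csr_spmv(values, col_idx, row_ptr, [1.0, 1.0, 1.0])
```

Each row is computed independently of the others, which is why this operation is commonly parallelized across rows.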
Sep, 8

Programming Many-Core Chips

This book presents new concepts, techniques and promising programming models for designing software for chips with "many" (hundreds to thousands) processor cores. Given the scale of parallelism inherent to these chips, software designers face new challenges in terms of operating systems, middleware and applications. This will serve as an invaluable, single-source reference to the state-of-the-art […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors