
Posts

Sep, 9

Piccolo: building fast, distributed programs with partitioned tables

Piccolo is a new data-centric programming model for writing parallel in-memory applications in data centers. Unlike existing data-flow models, Piccolo allows computation running on different machines to share distributed, mutable state via a key-value table interface. Piccolo enables efficient application implementations. In particular, applications can specify locality policies to exploit the locality of shared state […]
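
To make the idea of shared, mutable, partitioned state concrete, here is a minimal single-process sketch. It assumes hash partitioning and a user-supplied accumulation function that merges concurrent updates to the same key. The class and method names (`PartitionedTable`, `partition_of`, `update`, `accumulate`) are hypothetical illustrations, not Piccolo's actual API.

```python
import operator
from typing import Any, Callable, Dict, Hashable, Iterator, List, Tuple


class PartitionedTable:
    """Hypothetical partitioned key-value table (illustration only, not Piccolo's API)."""

    def __init__(self, num_partitions: int, accumulate: Callable[[Any, Any], Any]) -> None:
        self.num_partitions = num_partitions
        # Assumed merge rule for concurrent updates to the same key.
        self.accumulate = accumulate
        # One dict per partition; in a real system each would live on a different machine.
        self.partitions: List[Dict[Hashable, Any]] = [{} for _ in range(num_partitions)]

    def partition_of(self, key: Hashable) -> int:
        # Simple hash partitioning; a locality policy would place the worker that
        # processes a partition on the same machine that stores it.
        return hash(key) % self.num_partitions

    def get(self, key: Hashable, default: Any = 0) -> Any:
        return self.partitions[self.partition_of(key)].get(key, default)

    def update(self, key: Hashable, value: Any) -> None:
        part = self.partitions[self.partition_of(key)]
        # Merge with the existing value instead of overwriting it.
        part[key] = self.accumulate(part[key], value) if key in part else value

    def items(self) -> Iterator[Tuple[Hashable, Any]]:
        for part in self.partitions:
            yield from part.items()


# Two simulated workers update shared state in the same table.
table = PartitionedTable(num_partitions=4, accumulate=operator.add)
worker_inputs = [["a", "b", "a"], ["b", "c"]]
for words in worker_inputs:
    for word in words:
        table.update(word, 1)

print(sorted(table.items()))  # [('a', 2), ('b', 2), ('c', 1)]
```

Using an accumulator rather than plain overwrites means the final table does not depend on the order in which the workers' updates arrive, as long as the accumulation function is commutative and associative (as addition is here).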
Sep, 8

Parallel Programming on a Soft-Core Based Multi-core System

A soft-core system allows designers to conveniently modify the components of the architecture they have designed. In some systems, a single-core processor cannot provide enough computing power for the heavy computational demands of specific applications. To improve the performance of a multi-core system, in addition to the hardware architecture design, parallel […]
Sep, 8

Fast Ultrasound Image Simulation Using the Westervelt Equation

The simulation of ultrasound wave propagation is of high interest in fields such as ultrasound system development and therapeutic ultrasound. From a computational point of view, the requirements for realistic simulations are immense, with processing times reaching up to an entire day. In this work we present a framework for fast ultrasound image simulation covering the imaging […]
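
For reference, the Westervelt equation named in the title is commonly written in the form below. This is the standard textbook notation, not taken from the paper's excerpt: $p$ is the acoustic pressure, $c_0$ the small-signal speed of sound, $\rho_0$ the ambient density, $\delta$ the diffusivity of sound, and $\beta$ the coefficient of nonlinearity. The third-order time derivative accounts for thermoviscous losses, and the $p^2$ term accounts for nonlinear propagation.

```latex
\nabla^2 p - \frac{1}{c_0^2}\frac{\partial^2 p}{\partial t^2}
  + \frac{\delta}{c_0^4}\frac{\partial^3 p}{\partial t^3}
  = -\frac{\beta}{\rho_0 c_0^4}\frac{\partial^2 p^2}{\partial t^2}
```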
Sep, 8

Experiences with Mapping Non-linear Memory Access Patterns into GPUs

Modern Graphics Processing Units (GPUs) are very powerful computational systems on a chip. For this reason, there is growing interest in using these units as general-purpose hardware accelerators (GPGPU). To facilitate the programming of general-purpose applications, NVIDIA introduced the CUDA programming environment. CUDA provides a simplified abstraction of the underlying complex GPU […]
Sep, 8

A Fast GPU Implementation for Solving Sparse Ill-Posed Linear Equation Systems

Image reconstruction, generally a very compute-intensive process, can often be reduced to large linear equation systems represented as sparse, under-determined matrices. Solvers for these equation systems (not restricted to image reconstruction) spend most of their time in sparse matrix-vector multiplications (SpMV). In this paper we present a GPU-accelerated scheme for a Conjugate Gradient […]
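
The claim that such solvers are dominated by SpMV can be seen in a plain Conjugate Gradient loop, which performs exactly one matrix-vector product per iteration. The sketch below is a CPU reference in pure Python, assuming compressed sparse row (CSR) storage; it is not the paper's GPU scheme, whose details are cut off in the excerpt. Plain CG requires a symmetric positive-definite matrix; for under-determined systems a common option is to apply CG to the normal equations, but whether the paper does so is not stated here. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

CSR = Tuple[List[float], List[int], List[int]]  # (values, column indices, row pointers)


def csr_from_dense(a: Sequence[Sequence[float]]) -> CSR:
    """Convert a dense matrix to compressed sparse row (CSR) form."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(float(v))
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(csr: CSR, x: Sequence[float]) -> List[float]:
    """y = A x. Rows are independent, so a GPU kernel could assign one thread (or warp) per row."""
    values, col_idx, row_ptr = csr
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(csr: CSR, b: Sequence[float], tol: float = 1e-10,
                       max_iter: int = 1000) -> List[float]:
    """Plain CG for a symmetric positive-definite A; one SpMV per iteration."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0 initially
    p = list(r)  # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ap = spmv(csr, p)  # the dominant cost per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = csr_from_dense([[4.0, 1.0, 0.0],
                    [1.0, 3.0, 0.0],
                    [0.0, 0.0, 2.0]])
x = conjugate_gradient(A, [1.0, 2.0, 4.0])
print(x)
```

For this small system the exact solution is `[1/11, 7/11, 2]`, and the printed vector should match it to within floating-point rounding.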
Sep, 8

Programming Many-Core Chips

This book presents new concepts, techniques and promising programming models for designing software for chips with "many" (hundreds to thousands) processor cores. Given the scale of parallelism inherent to these chips, software designers face new challenges in terms of operating systems, middleware and applications. This book will serve as an invaluable, single-source reference to the state-of-the-art […]
Sep, 8

GPU Computation in Bioinspired Algorithms: A Review

Bioinspired methods usually require a large amount of computational resources. For this reason, parallelization is an attractive way to decrease execution time while still providing accurate results. Recently, there has been growing interest in developing parallel algorithms using graphics processing units (GPUs), also referred to as GPU computation. Advances […]
Sep, 8

Towards GPGPU Assisted Computing in Virtualized Environments

General Purpose Computation on Graphics Processing Units (GPGPU) makes it possible to use the massive computing power of modern graphics cards for generic high-performance computing. However, new virtualization technologies typically do not support high-performance graphics cards, and as a consequence GPGPU resources cannot be used in typical virtualization setups. In this paper we […]
Sep, 8

Implementing Independent Component Analysis in General-Purpose GPU Architectures

New computational architectures, such as multi-core processors and graphics processing units (GPUs), pose challenges to application developers. Although in the case of general-purpose GPU programming, environments and toolkits such as CUDA and OpenCL have simplified application development, different ways of thinking about memory access, storage, and program execution are required. This paper presents a strategy […]
Sep, 8

Automatic OpenCL Device Characterization: Guiding Optimized Kernel Design

The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using a single unified programming interface and language. While the standard guarantees portability of functionality for complying applications and platforms, performance portability on such a diverse set of hardware is limited. Devices may vary significantly in memory architecture as well as […]
Sep, 8

Accelerating Clustering Coefficient Calculations on a GPU Using OPENCL

The growth of multicore CPUs and the emergence of powerful manycore GPUs have led to a proliferation of parallel applications. Many applications are not straightforward to parallelize. This paper examines the performance of a parallelized implementation for calculating measurements of complex networks. We present an algorithm for calculating the clustering coefficient, a topological feature of complex networks, […]
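
As a reference point for what is being parallelized, here is a sequential sketch of the average local clustering coefficient (the Watts-Strogatz definition). It assumes an undirected, unweighted graph and reports 0 for nodes of degree below 2; the abstract does not say which variant of the coefficient or which convention the paper uses, and the OpenCL mapping mentioned in the comment is only a suggestion.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets; self-loops are ignored."""
    adj: Graph = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj: Graph, node: Hashable) -> float:
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined for degree < 2, reported as 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj: Graph) -> float:
    # Each node's coefficient depends only on its neighborhood, so nodes are
    # independent units of work (e.g., one OpenCL work-item per node).
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# A triangle (0, 1, 2) with a pendant node 3 attached to node 2.
adj = build_adjacency([(0, 1), (1, 2), (2, 0), (2, 3)])
print(local_clustering(adj, 2))   # 2 * 1 / (3 * 2) = 1/3
print(average_clustering(adj))    # (1 + 1 + 1/3 + 0) / 4 = 7/12
```

The per-node loop over neighbor pairs costs O(k²) for a node of degree k, so high-degree nodes dominate the work; this load imbalance is one reason such computations are not straightforward to parallelize evenly.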
Sep, 7

Pegasus: coordinated scheduling for virtualized accelerator-based systems

Heterogeneous multi-cores (platforms comprising both general-purpose and accelerator cores) are becoming increasingly common. While applications wish to freely utilize all cores present on such platforms, operating systems continue to view accelerators as specialized devices. The Pegasus system described in this paper uses an alternative approach that offers a uniform resource usage model for all cores […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
