Guillaume Chapuis, Hristo Djidjev
We develop an efficient parallel algorithm for answering shortest-path queries in planar graphs and implement it on multi-node CPU/GPU clusters. The algorithm uses a divide-and-conquer approach for decomposing the input graph into small and roughly equal subgraphs and constructs a distributed data structure containing shortest distances within each of those subgraphs and between their […]
Jens Breitbart
It is expected that the first exascale supercomputer will be deployed within the next 10 years; however, neither its CPU architecture nor its programming model is known yet. Multicore CPUs are not expected to scale to the required number of cores per node, but hybrid multicore CPUs consisting of different kinds of processing elements are […]
E. Ketelaer, D. Lukarski
HiFlow3 is a multi-purpose finite element software providing powerful tools for efficient and accurate solution of a wide range of problems modeled by partial differential equations (PDEs). Based on object-oriented concepts and the full capabilities of C++, the HiFlow3 project follows a modular and generic approach for building efficient parallel numerical solvers. It provides highly […]
Annette Bieniusa, Johannes Eickhold, Thomas Fuhrmann
Fully decentralized systems avoid bottlenecks and single points of failure. Thus, they can provide excellent scalability and very robust operation. The DecentVM is a fully decentralized, distributed virtual machine. Its simplified instruction set allows for a small VM code footprint. Its partitioned global address space (PGAS) memory model helps to easily create a single system […]
Javier Bueno, Xavier Martorell, Juan Jose Costa, Toni Cortes, Eduard Ayguade, Guansong Zhang, Christopher Barton, Raul Silvera
Software Distributed Shared Memory (SDSM) systems offer a convenient way to run applications developed for shared memory systems on distributed systems with no changes to them. However, since SDSM systems add an extra layer of abstraction to the memory hierarchy, applications may suffer performance problems when running on top of them. Our main research interest […]
Michela Becchi, Surendra Byna, Srihari Cadambi, Srimat Chakradhar
In this paper, we describe a runtime to automatically enhance the performance of applications running on heterogeneous platforms consisting of a multi-core (CPU) and a throughput-oriented many-core (GPU). The CPU and GPU are connected by a non-coherent interconnect such as PCI-E, and as such do not have shared memory. Heterogeneous platforms available today such as […]
Shih Hsiang Lo, Yeh Ching Chung, Fang Ping Pai
Data distribution management (DDM) aims to reduce the transmission of irrelevant data between High Level Architecture (HLA) compliant simulators by taking their interesting regions into account (i.e., region matching). In a large-scale simulation, computation-intensive region matching would have a direct impact on the simulation performance. To deal with the high computation cost of region […]
Bratin Saha, Xiaocheng Zhou, Hu Chen, Ying Gao, Shoumeng Yan, Mohan Rajagopalan, Jesse Fang, Peinan Zhang, Ronny Ronen, Avi Mendelson
The client computing platform is moving towards a heterogeneous architecture consisting of a combination of cores focused on scalar performance and a set of throughput-oriented cores. The throughput-oriented cores (e.g., a GPU) may be connected over both coherent and non-coherent interconnects, and have different ISAs. This paper describes a programming model for such heterogeneous […]
Phuong H. Ha, Philippas Tsigas, Otto J. Anshus
This paper aims at bridging the gap between the lack of synchronization mechanisms in recent graphics processor (GPU) architectures and the need for synchronization mechanisms in parallel applications. Based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects like wait-free and t-resilient read-modify-write objects for a general model of recent […]

* * *



HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors
