Nguyen Quang-Hung, Le Thanh Tan, Chiem Thach Phat, Nam Thoai
In this paper, we consider power-aware task scheduling (PATS) in HPC clouds. Users request virtual machines (VMs) to execute their tasks. Each task is executed on one single VM, and requires a fixed number of cores (i.e., processors), computing power (million instructions per second – MIPS) of each core, a fixed start time and non-preemption […]
S. Rit, M. Vila Oliva, S. Brousmiche, R. Labarbe, D. Sarrut, G. C. Sharp
We propose the Reconstruction Toolkit (RTK), an open-source toolkit for fast cone-beam CT reconstruction, based on the Insight Toolkit (ITK) and using GPU code extracted from Plastimatch. RTK is developed by an open consortium (see affiliations) under the non-contaminating Apache 2.0 license. The quality of the platform is checked daily with regression tests in […]
Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, Toru Takahashi
High performance FMM is crucial for the numerical simulation of many physical problems. In a previous study, we have shown that task-based FMM provides the flexibility required to process a wide spectrum of particle distributions efficiently on multicore architectures. In this paper, we now show how such an approach can be extended to fully exploit […]
Neri Mickael, Denis Mestivier
MOTIVATION: The Stochastic Simulation Algorithm (SSA) has become widespread in the field of systems biology. This approach needs many realizations to establish statistical results on the system under study. It is computationally very demanding, and with the advent of large models this burden is increasing. Hence parallel implementations of SSA are needed to address these […]
Rajesh Gandham, Ken Esler, Yongpeng Zhang
We present an efficient, robust and fully GPU-accelerated aggregation-based algebraic multigrid preconditioning technique for the solution of large sparse linear systems. These linear systems arise from the discretization of elliptic PDEs. The method involves two stages, setup and solve. In the setup stage, hierarchical coarse grids are constructed through aggregation of the fine grid nodes. […]
Seyong Lee, Dong Li, Jeffrey S. Vetter
Directive-based GPU programming models are gaining momentum, since they transparently relieve programmers from dealing with the complexity of low-level GPU programming, which often reflects the underlying architecture. However, too much abstraction in directive models puts a significant burden on programmers when debugging applications and tuning performance. In this paper, we propose a directive-based, interactive program debugging […]
M.G.B. Johnson, D. P. Playne, K.A. Hawick
Floating point precision and performance and the ratio of floating point units to integer processing elements on a graphics processing unit accelerator all continue to present complex tradeoffs for optimising core utilisation on modern devices. We investigate various hybrid CPU and GPU combinations using a range of different GPU models occupying different points in this […]
Dominik Zurek, Marcin Pietron, Maciej Wielgosz, Kazimierz Wiatr
Sorting is a common problem in computer science. There are many well-known sorting algorithms designed for sequential execution on a single processor. Recent hardware platforms make it possible to create widely parallel algorithms: standard processors consist of multiple cores, and hardware accelerators such as GPUs are available. Graphics cards, with their parallel architecture, give new possibility […]
Miguel Branco Palhas
The recent evolution of high performance computing has moved towards heterogeneous platforms: multiple devices with different architectures, characteristics and programming models share application workloads. To aid the programmer in efficiently exploring these heterogeneous platforms, several frameworks have been under development. These dynamically manage the available computing resources through workload scheduling and data distribution, dealing with the inherent […]
Eike Hermann Muller, Robert Scheichl, Benson Muite, Eero Vainikko
Memory bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with up to one trillion (10^12) unknowns the code has to make efficient use of several million […]
Sadaf Alam, Ugo Varetto
Recently, MPI implementations have been extended to support accelerator devices: Intel Many Integrated Core (MIC) and nVidia GPUs. This has been accomplished by changes to different levels of the software stacks and MPI implementations. In order to evaluate the performance and scalability of accelerator-aware MPI libraries, we developed portable micro-benchmarks to identify factors that influence […]
Ping Guo, Liqiang Wang
This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool to specifically provide inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and construct the tool, we investigate the inter-architecture relative performance for multiple SpMV kernels. For a sparse matrix, based on its SpMV kernel performance measured on […]

* * *

Free GPU computing nodes at

Registered users can now run their OpenCL applications on these nodes. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no restriction on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The terminal output logs from compilation and execution will be provided to the user.

Information you submit will be treated according to our Privacy Policy.

HGPU group © 2010-2014

All rights belong to the respective authors
