Nguyen Quang-Hung, Le Thanh Tan, Chiem Thach Phat, Nam Thoai
In this paper, we consider power-aware task scheduling (PATS) in HPC clouds. Users request virtual machines (VMs) to execute their tasks. Each task is executed on one single VM, and requires a fixed number of cores (i.e., processors), computing power (million instructions per second – MIPS) of each core, a fixed start time and non-preemption […]
View View   Download Download (PDF)   
Jon Currey, Adam Eversole, Christopher J. Rossbach
Dataflow execution engines such as MapReduce, DryadLINQ and PTask have enjoyed success because they simplify development for a class of important parallel applications. Expressing the computation as a dataflow graph allows the runtime, and not the programmer, to own problems such as synchronization, data movement and scheduling – leveraging dynamic information to inform strategy and […]
View View   Download Download (PDF)   
Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau, Jean-Francois Mehaut
Multi-core architectures comprising several GPUs have become mainstream in the field of High-Performance Computing. However, obtaining the maximum performance of such heterogeneous machines is challenging as it requires to carefully offload computations and manage data movements between the different processing units. The most promising and successful approaches so far rely on task-based runtimes that abstract […]
View View   Download Download (PDF)   
Li Tian, Cai Meng, Fugen Zhou
This paper addresses the problem that multiple DSP system doesn’t support OpenCL programming. With the compiler, runtime and the kernel scheduler proposed, an OpenCL application becomes portable not only between multiple CPU and GPU, but also between embedded multiple DSP systems. Firstly, the LLVM compiler was imported for source-to-source translation in which the translated source […]
View View   Download Download (PDF)   
R. Mohan, N. P. Gopalan
To get maximum performance on the many-core graphics processors, it is important to have an even balance of the workload so that all processing units contribute equally to the task at hand. This can be hard to achieve when the cost of a task is not known beforehand and when new sub-tasks are created dynamically […]
View View   Download Download (PDF)   
Haeseung Lee, Mohammad Abdullah Al Faruque
GPU architecture has traditionally been used in graphics application because of its enormous computing capability. Moreover, GPU architecture has also been used for general purpose computing in these days. Most of the current scheduling frameworks that are developed to handle GPGPU workload operate sequentially. This is problematic since this sequential approach may not be scalable […]
View View   Download Download (PDF)   
Ryan R. Newton, Eric Holk, Trevor L. McDonell
High-level domain-specific-languages for array processing on the GPU are increasingly common, but to date they run only on a single GPU. We argue that languages will need to target multiple devices, even simultaneous combinations of GPU/GPU and CPU/GPU. Increased flexibility may be key to making these languages more easily deployable and thus widespread. To this […]
Giuseppe Massari, Chiara Caffarri, Patrick Bellasi, William Fornaciari
From Mobile to High-Performance Computing (HPC) systems, performance and energy efficiency are becoming always more challenging requirements. In this regard, heterogeneous systems, made by a general-purpose processor and one or more hardware accelerators, are emerging as affordable solutions. However, the effective exploitation of such platforms requires specific programming languages, like for instance OpenCL, and suitable […]
View View   Download Download (PDF)   
Tarun Beri, Sorav Bansal, Subodh Kumar
We present a system that enables simple and intuitive programming of CPU+GPU clusters. This system relieves the programmer of the burden of load balancing, detailed data communication, task mapping, scheduling, etc. Our programming model is based on bulk synchronous distributed shared memory model, which is suitable for heterogenous multi-GPU clusters, especially so for compute intensive […]
View View   Download Download (PDF)   
Jing Zhang, Hao Wang, Heshan Lin, Wu-chun Feng
By scheduling multiple applications with complementary resource requirements on a smaller number of compute nodes, we aim to improve performance, resource utilization, energy consumption, and energy efficiency simultaneously. In addition to our naive consolidation approach, which already achieves the aforementioned goals, we propose a new energy efficiency-aware (EEA) scheduling policy and compare its performance with […]
View View   Download Download (PDF)   
David Beniamine
High Performance Computing machines use more and more Graphical Processing Units as they are very efficient for homogeneous computation such as matrix operations. However before using these accelerators, one has to transfer data from the processor to them. Such a transfer can be slow. In this report, our aim is to study the impact of […]
View View   Download Download (PDF)   
Dounia Khaldi
This thesis intends to show how to efficiently exploit the parallelism present in applications in order to enjoy the performance benefits that multiprocessors can provide, using a new automatic task parallelization methodology for compilers. The key characteristics we focus on are resource constraints and static scheduling. This methodology includes the techniques required to decompose applications […]
View View   Download Download (PDF)   
Page 1 of 1012345...10...Last »

* * *

* * *

* * *

Free GPU computing nodes at

Registered users can now run their OpenCL application at We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to will be treated according to our Privacy Policy

HGPU group © 2010-2014

All rights belong to the respective authors

Contact us: