Fumihiko Ino, Yosuke Oka, Kenichi Hagihara
The emergence of compute unified device architecture (CUDA), which has relieved application developers from having to understand complex graphics pipelines, has made the graphics processing unit (GPU) useful not only for graphics applications but also for general applications. In this paper, we present a cycle sharing system named GPU grid, which exploits idle GPU cycles […]
View View   Download Download (PDF)   
Pedro Alonso, Manuel F. Dolz, Francisco D. Igual, Rafael Mayo, Enrique S. Quintana-Orti
The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate parallel executions with minimal developer intervention has been introduced in recent years to tackle the programmability issue while maintaining, or even improving, performance. In this paper, […]
View View   Download Download (PDF)   
Li Zhen, Qiuxiao Gang, Guo Gang, Chen Bin
The graphic processing unit (GPU) gets strong computing ability with relatively low energy and money consumption, it has been widely used in the field of large-scale simulation and computation. Among which the CPU-GPU heterogeneous collaborative computing model has become an effective ways to solve the simulation performance of large-scale artificial society. But there are lots […]
View View   Download Download (PDF)   
Glenn A. Elliott, James H. Anderson
Motivated by computational capacity and power efficiency, techniques for integrating graphics processing units (GPUs) into real-time systems have become an active area of research. While much of this work has focused on single-GPU systems, multiple GPUs may be used for further benefits. Similar to CPUs in multiprocessor systems, GPUs in multi-GPU systems may be managed […]
View View   Download Download (PDF)   
Li Tian, Fugen Zhou, Cai Meng
We address the problem that multicore DSP system doesn’t support OpenCL programming. We designed compiler and proposed a runtime framework for TI multicore DSP, by which OpenCL parallel program could take advantage of multicore computing resource. Firstly, we make use of the LLVM and Clang compiler front-end to achieve source-to-source translation and in the next […]
View View   Download Download (PDF)   
Robert F. Lyerly
The world of high-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors due to the power, memory and instruction-level parallelism walls. All trends point towards increased processor heterogeneity as a means for increasing application performance, from smartphones to servers. These various architectures are designed for different types […]
View View   Download Download (PDF)   
Martin Tillenius
In computational science, making efficient use of modern multicore based computer hardware is necessary in order to deal with complex real-life application problems. However, with increased hardware complexity, the cost in man hours of writing and re-writing software to adapt to evolving computer systems is becoming prohibitive. Task based parallel programming models aim to allow […]
Jinglian Wang, Bin Gong, Hong Liu, Shaohui Li, Juan Yi
This work presents the novel parallel evolutionary algorithm (EA) for task scheduling in distributed heterogeneous computing and grid environments, NP-hard problems with capital relevance in distributed computing. Parallelization of the biologically inspired heuristics is hierarchically designed and integrates with the two traditional parallel models (master-slave models and island models). The method has been specifically implemented […]
View View   Download Download (PDF)   
Nguyen Quang-Hung, Le Thanh Tan, Chiem Thach Phat, Nam Thoai
In this paper, we consider power-aware task scheduling (PATS) in HPC clouds. Users request virtual machines (VMs) to execute their tasks. Each task is executed on one single VM, and requires a fixed number of cores (i.e., processors), computing power (million instructions per second – MIPS) of each core, a fixed start time and non-preemption […]
View View   Download Download (PDF)   
Jon Currey, Adam Eversole, Christopher J. Rossbach
Dataflow execution engines such as MapReduce, DryadLINQ and PTask have enjoyed success because they simplify development for a class of important parallel applications. Expressing the computation as a dataflow graph allows the runtime, and not the programmer, to own problems such as synchronization, data movement and scheduling – leveraging dynamic information to inform strategy and […]
View View   Download Download (PDF)   
Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau, Jean-Francois Mehaut
Multi-core architectures comprising several GPUs have become mainstream in the field of High-Performance Computing. However, obtaining the maximum performance of such heterogeneous machines is challenging as it requires to carefully offload computations and manage data movements between the different processing units. The most promising and successful approaches so far rely on task-based runtimes that abstract […]
View View   Download Download (PDF)   
Li Tian, Cai Meng, Fugen Zhou
This paper addresses the problem that multiple DSP system doesn’t support OpenCL programming. With the compiler, runtime and the kernel scheduler proposed, an OpenCL application becomes portable not only between multiple CPU and GPU, but also between embedded multiple DSP systems. Firstly, the LLVM compiler was imported for source-to-source translation in which the translated source […]
View View   Download Download (PDF)   
Page 1 of 1112345...10...Last »

* * *

* * *

Like us on Facebook

HGPU group

147 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1229 peoples are following HGPU @twitter

Featured events

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us: