Linchuan Chen
Because clock frequencies could no longer be raised, multi-core processors emerged as a way of improving overall CPU performance. Over the past decade, many-core processors have begun to play an increasingly important role in scientific computing. Their high cost-effectiveness makes them well suited to data-intensive computations. Specifically, many-cores are […]
Florence Monna
More and more computers use hybrid architectures that combine multi-core processors (CPUs) with hardware accelerators such as GPUs (Graphics Processing Units). These hybrid parallel platforms require new scheduling strategies. This work is devoted to characterizing this new type of scheduling problem. The most studied objective in this work is the minimization of the makespan, which […]
Tarun Beri, Sorav Bansal, Subodh Kumar
We study work-stealing-based scheduling on a cluster of nodes with CPUs and GPUs. In particular, we evaluate locality-aware scheduling in the context of distributed-shared-memory-style programming, where the user is oblivious to data placement. Our runtime maintains a distributed map of data resident on various nodes and uses it to estimate […]
Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst, Pierre-Andre Wacrenier
Computing platforms are now extremely complex, providing an increasing number of CPUs and accelerators. This trend makes balancing computations between these heterogeneous resources performance-critical. In this paper, we tackle the task granularity problem and propose aggregating several CPUs in order to execute larger parallel tasks and thus find a better equilibrium between the […]
Glenn A. Elliott
Self-driving cars, once constrained to closed test tracks, are beginning to drive alongside human drivers on public roads. Loss of life or property may result if the computing systems of automated vehicles fail to respond to events at the right moment. We call such systems that must satisfy precise timing constraints "real-time systems." Since the […]
Yujie Zhang, Jiabin Yuan, Xiangwen Lu, Xingfang Zhao
General-purpose graphics processing units (GPGPUs) have seen a tremendous rise in scientific computing applications. To fully utilize the powerful parallel computing ability of GPUs while retaining the isolation that virtualization provides, a GPU virtualization method that supports dynamic scheduling and multi-user concurrency is proposed. For multitasking of general-purpose GPU programs in a virtualized environment, the […]
Peng Zhang, Yuxiang Gao, Meikang Qiu
Rapidly changing computer architectures, while improving the performance of computers, challenge programming environments to harness the potential of novel architectures efficiently. In this area, although the high-density multi-GPU architecture gives dense GPUs in a single server an unparalleled performance advantage, it has made scheduling diverse and dependent tasks more difficult. We […]
Yuki Tsujita, Toshio Endo
Recently, large-scale scientific computation on heterogeneous supercomputers equipped with accelerators has been attracting attention. However, traditional static job execution and memory management methods are insufficient to harness heterogeneous computing resources, including memory, efficiently, since they introduce larger data movement costs and lower resource usage. This paper takes the Cholesky decomposition computation, which […]
Hao Wu, Daniel Lohmann, Wolfgang Schroder-Preikschat
To improve performance efficiently, a number of systems are equipped with both multi-core and many-core processors (such as GPUs). Because of their discrete memory, these heterogeneous architectures comprise a distributed system within a single computer. A data-flow programming model is attractive in this setting for its ease of expressing concurrency. Programmers only need to […]
Jihye Kwon, Kang-Wook Kim, Sangyoun Paik, Jihwa Lee, Chang-Gun Lee
Past research on multicore scheduling assumes that a computational unit has already been parallelized into a predetermined number of threads. However, with recent technologies such as OpenCL, a computational unit can be parallelized in many different ways, with the number of threads selectable at runtime. This paper proposes an optimal algorithm for parallelizing and scheduling a set […]
Ville Korhonen
Heterogeneous computing has become a viable option for gaining computing performance, alongside conventional homogeneous multi- and single-processor approaches. The advantage of heterogeneity is the possibility of choosing the best device on the platform for each distinct workload in the application, to gain performance and/or lower power consumption. The drawback of heterogeneity is the […]
Chih-Sheng Lin
Recently, hybrid systems consisting of general-purpose processors (CPUs) and accelerators such as graphics processing units (GPUs) have become a mainstream system architecture design for achieving high performance and power efficiency. However, this growing trend is forcing programmers to address the issues and challenges of adapting legacy serial programs into heterogeneous parallel programs. To alleviate the burden […]

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
