Jihye Kwon, Kang-Wook Kim, Sangyoun Paik, Jihwa Lee, Chang-Gun Lee
Past research on multicore scheduling assumes that a computational unit has already been parallelized into a predetermined number of threads. However, with recent technologies such as OpenCL, a computational unit can be parallelized in many different ways, with runtime-selectable numbers of threads. This paper proposes an optimal algorithm for parallelizing and scheduling a set […]
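As a rough illustration of the runtime-selectable parallelism this abstract refers to, the sketch below launches a single OpenCL kernel with global and local work sizes chosen by the host at enqueue time. The kernel, buffer sizes, and names are hypothetical, and error handling is omitted; it is a sketch of the OpenCL mechanism, not the paper's algorithm.

    /* Minimal sketch: the number of OpenCL work-items (threads) is picked
       at enqueue time, not baked into the kernel. Error checking omitted. */
    #include <CL/cl.h>
    #include <stdlib.h>

    static const char *src =
        "__kernel void scale(__global float *x) {"
        "  x[get_global_id(0)] *= 2.0f;"
        "}";

    int main(void)
    {
        cl_platform_id plat;
        cl_device_id dev;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, "", NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "scale", NULL);

        size_t n = 1 << 20;                               /* problem size */
        float *host = calloc(n, sizeof(float));
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    n * sizeof(float), host, NULL);
        clSetKernelArg(k, 0, sizeof(buf), &buf);

        /* The degree of parallelism is decided here, at runtime: the same
           kernel could be enqueued again with different work sizes. */
        size_t global = n;    /* one work-item per element */
        size_t local  = 64;   /* work-group size, also runtime-selectable */
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, &local, 0, NULL, NULL);
        clFinish(q);

        free(host);
        return 0;
    }

Because both sizes are ordinary runtime values, the same kernel can be re-enqueued with different thread counts from run to run, which is the flexibility the abstract builds on.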
Ville Korhonen
Heterogeneous computing has become a viable option for increasing computing performance, alongside conventional homogeneous single- and multi-processor approaches. The advantage of heterogeneity is the possibility of choosing the best device on the platform for each distinct workload in the application, to gain performance and/or to lower power consumption. The drawback of heterogeneity is the […]
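As context for the device-selection idea above, here is a minimal, hypothetical sketch of how an OpenCL host can enumerate the CPUs and GPUs available before deciding where each workload should run; the array bounds and output format are illustrative.

    /* Minimal sketch: list every OpenCL device so the host can choose the
       best one per workload. Error checking omitted. */
    #include <CL/cl.h>
    #include <stdio.h>

    int main(void)
    {
        cl_platform_id plats[8];
        cl_uint nplat = 0;
        clGetPlatformIDs(8, plats, &nplat);

        for (cl_uint p = 0; p < nplat && p < 8; ++p) {
            cl_device_id devs[16];
            cl_uint ndev = 0;
            clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ALL, 16, devs, &ndev);

            for (cl_uint d = 0; d < ndev && d < 16; ++d) {
                char name[256];
                cl_device_type type;
                cl_uint units;
                clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
                clGetDeviceInfo(devs[d], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
                clGetDeviceInfo(devs[d], CL_DEVICE_MAX_COMPUTE_UNITS,
                                sizeof(units), &units, NULL);
                printf("%-10s %s (%u compute units)\n",
                       (type & CL_DEVICE_TYPE_GPU) ? "GPU:" : "CPU/other:",
                       name, units);
            }
        }
        return 0;
    }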
Chih-Sheng Lin
Recently, hybrid systems consisting of general-purpose processors (CPUs) and accelerators such as graphics processing units (GPUs) have become the mainstream system architecture for achieving high performance and power efficiency. However, this growing trend is forcing programmers to address the issues and challenges of adapting legacy serial programs into heterogeneous parallel programs. To alleviate the burden […]
Jie Zhu, Hai Jiang, Juanjuan Li, Erikson Hardesty, Kuan-Ching Li, Zhongwen Li
As the size of high-performance applications increases, four major challenges have arisen in the underlying distributed systems: heterogeneity, programmability, fault resilience, and energy efficiency. To tackle all of them without sacrificing performance, traditional approaches to resource utilization, task scheduling, and programming paradigms should be reconsidered. While Hadoop has handled data-intensive applications well […]
Cedric Nugteren, Gert-Jan van den Braak, Henk Corporaal
Programming models such as CUDA and OpenCL allow the programmer to specify the independence of threads, effectively removing ordering constraints. Still, parallel architectures such as the graphics processing unit (GPU) do not exploit the potential for data locality enabled by this independence. Therefore, programmers are required to manually perform data-locality optimisations such as memory coalescing or […]
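To make the memory-coalescing point concrete, the two hypothetical OpenCL C kernels below read the same buffer in a coalesced and in a strided fashion; on most GPUs the first pattern is served by far fewer memory transactions per work-group.

    /* Illustrative kernels (not from the paper). In the first, consecutive
       work-items touch consecutive addresses, so a work-group's loads
       coalesce into a few wide transactions. In the second, consecutive
       work-items are `stride` elements apart, scattering the accesses. */
    __kernel void copy_coalesced(__global const float *in,
                                 __global float *out)
    {
        size_t i = get_global_id(0);
        out[i] = in[i];
    }

    __kernel void copy_strided(__global const float *in,
                               __global float *out,
                               const int stride)
    {
        size_t i = get_global_id(0);
        out[i] = in[i * stride];
    }

Optimisations of the kind the abstract mentions essentially rewrite the second access pattern into the first, for example by staging data through local memory.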
Yuan Wen, Zheng Wang, Michael F.P. O'Boyle
Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high performance computing. Such platforms are usually programmed using OpenCL, which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application-dedicated devices to platforms […]
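For readers unfamiliar with the OpenCL portability mentioned above, this hypothetical sketch builds one program for both a CPU and a GPU device and keeps a command queue per device, so a scheduler can map kernels to either at runtime. It assumes a platform that exposes both device types; error handling is omitted and the names are illustrative, not the paper's scheduler.

    /* Minimal sketch: one OpenCL program, built once, usable on a CPU and
       a GPU device of the same platform. Error checking omitted. */
    #include <CL/cl.h>

    static const char *src =
        "__kernel void inc(__global int *x) { x[get_global_id(0)] += 1; }";

    int main(void)
    {
        cl_platform_id plat;
        cl_device_id dev[2];          /* [0] = CPU, [1] = GPU (assumed present) */
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_CPU, 1, &dev[0], NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev[1], NULL);

        cl_context ctx = clCreateContext(NULL, 2, dev, NULL, NULL, NULL);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 2, dev, "", NULL, NULL);   /* same source, both devices */

        cl_command_queue cpu_q = clCreateCommandQueue(ctx, dev[0], 0, NULL);
        cl_command_queue gpu_q = clCreateCommandQueue(ctx, dev[1], 0, NULL);
        /* A scheduler can now enqueue the same kernel on cpu_q or gpu_q
           depending on load, data placement, or co-running applications. */
        (void)cpu_q; (void)gpu_q;
        return 0;
    }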
Fumihiko Ino, Yosuke Oka, Kenichi Hagihara
The emergence of compute unified device architecture (CUDA), which has relieved application developers from having to understand complex graphics pipelines, has made the graphics processing unit (GPU) useful not only for graphics applications but also for general applications. In this paper, we present a cycle sharing system named GPU grid, which exploits idle GPU cycles […]
Pedro Alonso, Manuel F. Dolz, Francisco D. Igual, Rafael Mayo, Enrique S. Quintana-Orti
The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate parallel executions with minimal developer intervention has been introduced in recent years to tackle the programmability issue while maintaining, or even improving, performance. In this paper, […]
Li Zhen, Qiuxiao Gang, Guo Gang, Chen Bin
The graphics processing unit (GPU) offers strong computing ability at relatively low energy and monetary cost, and it has been widely used in the field of large-scale simulation and computation. In particular, the CPU-GPU heterogeneous collaborative computing model has become an effective way to improve the simulation performance of large-scale artificial societies. But there are lots […]
Glenn A. Elliott, James H. Anderson
Motivated by computational capacity and power efficiency, techniques for integrating graphics processing units (GPUs) into real-time systems have become an active area of research. While much of this work has focused on single-GPU systems, multiple GPUs may be used for further benefits. Similar to CPUs in multiprocessor systems, GPUs in multi-GPU systems may be managed […]
Li Tian, Fugen Zhou, Cai Meng
We address the problem that multicore DSP systems do not support OpenCL programming. We designed a compiler and proposed a runtime framework for TI multicore DSPs, by which OpenCL parallel programs can take advantage of the multicore computing resources. First, we make use of the LLVM and Clang compiler front end to achieve source-to-source translation, and in the next […]
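Source-to-source translation of the kind mentioned above commonly turns an OpenCL NDRange kernel into plain C loops over work-items, so that a core without hardware work-item scheduling can execute a whole work-group sequentially. The fragment below is a hand-written illustration of that general idea, not output from the authors' compiler.

    /* Conceptual OpenCL C input:
         __kernel void add(__global const float *a, __global const float *b,
                           __global float *c)
         { size_t i = get_global_id(0); c[i] = a[i] + b[i]; }
       Illustrative C output: get_global_id(0) becomes a loop index, and one
       call runs an entire work-group sequentially on one core. */
    #include <stddef.h>

    void add_workgroup(const float *a, const float *b, float *c,
                       size_t group_id, size_t local_size)
    {
        for (size_t lid = 0; lid < local_size; ++lid) {
            size_t gid = group_id * local_size + lid;   /* get_global_id(0) */
            c[gid] = a[gid] + b[gid];
        }
    }

    /* A runtime framework can then distribute work-groups over the cores,
       e.g. core k executes groups k, k + num_cores, k + 2 * num_cores, ... */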
Robert F. Lyerly
The world of high-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors due to the power, memory and instruction-level parallelism walls. All trends point towards increased processor heterogeneity as a means for increasing application performance, from smartphones to servers. These various architectures are designed for different types […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units (see the platforms below). There is no restriction on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL projects should be uploaded via the User dashboard (see the instructions and example there); the compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
