Sparsh Mittal
Modern processors use increasingly large on-chip caches. Also, with each CMOS technology generation, there has been a significant increase in their leakage energy consumption. For this reason, cache power management has become a crucial research issue in modern processor design. To address this challenge and also meet the goals of sustainable computing, researchers […]
Jayanth Gummaraju, Mattan Erez, Joel Coburn, Mendel Rosenblum, William J. Dally
There has recently been much interest in stream processing, both in industry (e.g., Cell, NVIDIA G80, ATI R580) and academia (e.g., Stanford Merrimac, MIT RAW), with stream programs becoming increasingly popular for both media and more general-purpose computing. Although a special style of programming called stream programming is needed to target these stream architectures, huge […]
Shekhar Borkar
This paper presents the many-core architecture, with hundreds to thousands of small cores, to deliver unprecedented compute performance in an affordable power envelope. We discuss fine grain power management, memory bandwidth, on die networks, and system resiliency for the many-core system.
Henry Wong, Misel-Myrto Papadopoulou, Maryam Sadooghi-Alvandi, Andreas Moshovos
Graphics processors (GPUs) offer the promise of more than an order of magnitude speedup over conventional processors for certain non-graphics computations. Because the GPU is often presented as a C-like abstraction (e.g., Nvidia’s CUDA), little is known about the characteristics of the GPU’s architecture beyond what the manufacturer has documented. This work develops a microbenchmark […]
Kayvon Fatahalian, Mike Houston
As the line between GPUs and CPUs begins to blur, it’s important to understand what makes GPUs tick.
Victor Moya, Carlos Gonzalez, Jordi Roca, Agustin Fernandez, Roger Espasa
We present and evaluate the TILA-rin GPU microarchitecture for embedded systems using the ATTILA GPU simulation framework. We use a trace from an execution of the Unreal Tournament 2004 PC game to evaluate and compare the performance of the proposed embedded GPU against a baseline GPU architecture for the PC. We evaluate the different […]
Rainer Buchty, Jan-Philipp Weiss (ed.)
The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach.
Michael D. Linderman, James Balfour, Teresa H. Meng, William J. Dally
Computer systems are undergoing significant change: to improve performance and efficiency, architects are exposing more microarchitectural details directly to programmers. Software that exploits specialized accelerators, such as GPUs, and specialized processor features, such as software-controlled memory, exposes limitations in existing compiler and OS infrastructure. In this paper we propose a pragmatic approach, motivated by our […]
Shuai Che, Jie Li, Jeremy W. Sheaffer, Kevin Skadron, John Lach
Accelerators are special purpose processors designed to speed up compute-intensive sections of applications. Two extreme endpoints in the spectrum of possible accelerators are FPGAs and GPUs, which can often achieve better performance than CPUs on certain workloads. FPGAs are highly customizable, while GPUs provide massive parallel execution resources and high memory bandwidth. Applications typically exhibit […]
V. Govindaraju, P. Djeu, K. Sankaralingam, M. Vernon, W. R. Mark
Significant improvement to visual quality for real-time 3D graphics requires modeling of complex illumination effects like soft-shadows, reflections, and diffuse lighting interactions. The conventional Z-buffer algorithm driven GPU model does not provide sufficient support for this improvement. This paper targets the entire graphics system stack and demonstrates algorithms, a software architecture, and a hardware architecture […]
Michael D. Linderman, Jamison D. Collins, Hong Wang, Teresa H. Meng
In this paper we propose the Merge framework, a general purpose programming model for heterogeneous multi-core systems. The Merge framework replaces current ad hoc approaches to parallel programming on heterogeneous platforms with a rigorous, library-based methodology that can automatically distribute computation across heterogeneous cores to achieve increased energy and performance efficiency. The Merge framework provides […]
Larry Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth, Michael Abrash, Pradeep Dubey, Stephen Junkins, Adam Lake, Jeremy Sugerman, Robert Cavin, Roger Espasa, Ed Grochowski, Toni Juan, Pat Hanrahan
This paper presents a many-core visual computing architecture code named Larrabee, a new software rendering pipeline, a manycore programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed function logic blocks. This provides dramatically higher performance […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors