Sparsh Mittal, Matt Poremba, Jeffrey Vetter, Yuan Xie
To enable the design of large sized caches, novel memory technologies (such as non-volatile memory) and novel fabrication approaches (e.g. 3D stacking) have been explored. The existing modeling tools, however, cover only few memory technologies, CMOS technology nodes and fabrication approaches. We present DESTINY, a tool for modeling 3D (and 2D) cache designs using SRAM, […]
Sparsh Mittal
The demands of larger memory capacity in high-performance computing systems have motivated the researchers to explore alternatives of DRAM (dynamic random access memory). Since PCM (phase change memory) provides high-density, good scalability and non-volatile data storage, it has received significant amount of attention in recent years. A crucial bottleneck in wide-spread adoption of PCM, however, […]
View View   Download Download (PDF)   
Naznin Fauzia, Louis-Noel Pouchet, P. Sadayappan
Effective parallel programming for GPUs requires careful attention to several factors, including ensuring coalesced access of data from global memory. There is a need for tools that can provide feedback to users about statements in a GPU kernel where non-coalesced data access occurs, and assistance in fixing the problem. In this paper, we address both […]
View View   Download Download (PDF)   
Jade Alglave, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen, John Wickerson
Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software. To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent […]
View View   Download Download (PDF)   
Jeffrey S. Vetter, Sparsh Mittal
For extreme-scale high performance computing systems, system-wide power consumption has been identified as one of the key constraints moving forward, where the DRAM main memory systems account for about 30-50% of a node's overall power consumption. Moreover, as the benefits of device scaling for DRAM memory slow, it will become increasingly difficult to keep memory […]
View View   Download Download (PDF)   
Sparsh Mittal
Modern processors are using increasingly larger sized on-chip caches. Also, with each CMOS technology generation, there has been a significant increase in their leakage energy consumption. For this reason, cache power management has become a crucial research issue in modern processor design. To address this challenge and also meet the goals of sustainable computing, researchers […]
View View   Download Download (PDF)   
Numair Khan
With the advent of multi and many-core processors, communication has replaced computation as the performance bottleneck. Most current approaches to the problem try to tolerate memory access latency through a high amount of Thread-Level Parallelism. However, not all applications benefit from such techniques and there is a need to address the weakness of the underlying […]
View View   Download Download (PDF)   
Sudipta Chattopadhyay, Petru Eles, Zebo Peng
Embedded and real-time software is often constrained by several temporal requirements. Therefore, it is important to design embedded software that meets the required performance goal. The inception of embedded graphics processing units (GPUs) brings fresh hope in developing high-performance embedded software which were previously not suitable for embedded platforms. Whereas GPUs use massive parallelism to […]
View View   Download Download (PDF)   
Xinxin Mei, Kaiyong Zhao, Chengjian Liu, Xiaowen Chu
Memory access efficiency is a key factor for fully exploiting the computational power of Graphics Processing Units (GPUs). However, many details of the GPU memory hierarchy are not released by the vendors. We propose a novel fine-grained benchmarking approach and apply it on two popular GPUs, namely Fermi and Kepler, to expose the previously unknown […]
Mustafa Ali, Tarik Ozkul
Traditional ways of increasing computer performance has been increasing speed and bit size. Although this kept us going for more than half of a century, methodology has hit a major road block due to power consumption and heat dissipation. The remedy found for this problem has been creating multiple cores/heterogeneous computing systems on a single […]
View View   Download Download (PDF)   
Bharath Subramanian Pichai
The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory. Indeed, processor vendors have already begun embracing heterogeneous systems with […]
View View   Download Download (PDF)   
R. Mokhtari, M. Stumm
GPUs offer an order of magnitude higher compute power and memory bandwidth than CPUs. GPUs therefore might appear to be well suited to accelerate computations that operate on voluminous data sets in independent ways; e.g., for transformations, filtering, aggregation, partitioning or other ”Big Data” style processing. Yet experience indicates that it is difficult, and often […]
View View   Download Download (PDF)   
Page 1 of 712345...Last »

* * *

* * *

Like us on Facebook

HGPU group

218 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1400 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: