Sparsh Mittal
Process variation (deviation in parameters from their nominal specifications) threatens to slow down and even pause technological scaling, and mitigating it is the way to continue the benefits of chip miniaturization. In this paper, we present a survey of architectural techniques for managing process variation (PV) in modern processors. We also classify these techniques […]
Josue Vladimir Quiroga Esparza
Heterogeneous systems, more specifically CPU-GPGPU platforms, have gained a lot of attention due to the excellent speedups GPUs can achieve with relatively little energy consumption. However, there are drawbacks: the complex programming models needed to fully exploit the devices and the data movement overheads are some […]
Ang Li, Gert-Jan van den Braak, Akash Kumar, Henk Corporaal
In the last decade, GPUs have been widely adopted for general-purpose applications. To capture on-chip locality for these applications, modern GPUs have integrated multilevel cache hierarchies in an attempt to reduce the amount and latency of the massive and sometimes irregular memory accesses. However, inferior performance is frequently attained due to serious congestion […]
Sparsh Mittal
Energy efficiency has now become the primary obstacle in scaling the performance of all classes of computing systems. Low-voltage computing and specifically, near-threshold voltage computing (NTC), which involves operating the transistor very close to and yet above its threshold voltage, holds the promise of providing many-fold improvement in energy efficiency. However, use of NTC also […]
Sparsh Mittal, Jeffrey Vetter
Recent trends of increasing core-count and memory/bandwidth-wall have led to major overhauls in chip architecture. In the face of increasing cache capacity demands, researchers have now explored DRAM, which was conventionally considered synonymous with main memory, for designing large last-level caches. Efficient integration of DRAM caches in mainstream computing systems, however, also presents several challenges […]
Sparsh Mittal and Jeffrey S. Vetter
Non-volatile memory (NVM) devices, such as Flash, phase change RAM, spin transfer torque RAM, and resistive RAM, offer several advantages and challenges when compared to conventional memory technologies, such as DRAM and magnetic hard disk drives (HDDs). In this paper, we present a survey of software techniques that have been proposed to exploit the advantages […]
Andrew A. Haigh, Eric C. McCreath
Due to their potentially high peak performance and energy efficiency, GPUs are increasingly popular for scientific computations. However, the complexity of the architecture makes it difficult to write code that achieves high performance. Two of the most important factors in achieving high performance are the usage of the GPU memory hierarchy and the way in […]
Sparsh Mittal, Matt Poremba, Jeffrey Vetter, Yuan Xie
To enable the design of large caches, novel memory technologies (such as non-volatile memory) and novel fabrication approaches (e.g. 3D stacking) have been explored. The existing modeling tools, however, cover only a few memory technologies, CMOS technology nodes, and fabrication approaches. We present DESTINY, a tool for modeling 3D (and 2D) cache designs using SRAM, […]
Sparsh Mittal
The demand for larger memory capacity in high-performance computing systems has motivated researchers to explore alternatives to DRAM (dynamic random access memory). Since PCM (phase change memory) provides high density, good scalability, and non-volatile data storage, it has received a significant amount of attention in recent years. A crucial bottleneck in widespread adoption of PCM, however, […]
Naznin Fauzia, Louis-Noel Pouchet, P. Sadayappan
Effective parallel programming for GPUs requires careful attention to several factors, including ensuring coalesced access of data from global memory. There is a need for tools that can provide feedback to users about statements in a GPU kernel where non-coalesced data access occurs, and assistance in fixing the problem. In this paper, we address both […]
Jade Alglave, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen, John Wickerson
Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software. To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent […]
Jeffrey S. Vetter, Sparsh Mittal
For extreme-scale high performance computing systems, system-wide power consumption has been identified as one of the key constraints moving forward, where the DRAM main memory systems account for about 30-50% of a node's overall power consumption. Moreover, as the benefits of device scaling for DRAM memory slow, it will become increasingly difficult to keep memory […]

HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors
