Opportunities for Nonvolatile Memory Systems in Extreme-Scale High Performance Computing
Oak Ridge National Laboratory, USA
Computing in Science and Engineering, March/April 2015
@article{ref60,
title={Opportunities for Nonvolatile Memory Systems in Extreme-Scale High Performance Computing},
year={2015},
author={Jeffrey S. Vetter and Sparsh Mittal},
journal={Computing in Science and Engineering},
doi={10.1109/MCSE.2015.4},
url={https://www.academia.edu/9880029/Opportunities_for_Nonvolatile_Memory_Systems_in_Extreme-Scale_High_Performance_Computing},
keywords={nonvolatile memory, high performance computing, supercomputing, DRAM, ReRAM, PCM, STT-RAM, Flash}
}
For extreme-scale high performance computing (HPC) systems, system-wide power consumption has been identified as one of the key constraints moving forward, with DRAM main memory systems accounting for about 30-50% of a node's overall power consumption. Moreover, as the benefits of device scaling for DRAM slow, it will become increasingly difficult to keep memory capacities balanced with the increasing computational rates offered by next-generation processors. However, a number of emerging memory technologies, namely nonvolatile memory (NVM) devices, are being investigated as alternatives to DRAM. These NVM devices may offer several solutions for HPC architectures. First, as the name implies, NVM devices retain state without continuous power, which can in turn reduce power costs. Second, certain NVM devices can be as dense as DRAM, facilitating more memory capacity in the same physical volume. Finally, some NVM, such as contemporary NAND flash memory, can be less expensive than DRAM in terms of cost per bit. Taken together, these benefits can provide opportunities for revolutionizing the design of extreme-scale HPC systems. Researchers are investigating how to integrate these emerging technologies into future extreme-scale HPC systems and how to expose their capabilities in the software stack and applications. Current results show that a number of these strategies may offer high-bandwidth I/O, larger main memory capacities, persistent data structures, and new approaches to application resilience and output post-processing, such as transaction-based incremental checkpointing and in situ visualization, respectively.
January 21, 2015 by sparsh0mittal