Architectural Techniques to Enable Reliable and High Performance Memory Hierarchy in Chip Multi-processors
Constant technology scaling has enabled modern computing systems to achieve high degrees of thread-level parallelism, making the design of a highly scalable and dense memory hierarchy a major challenge. During the past few decades, SRAM has been the dominant technology for building on-chip cache hierarchies, while DRAM has been used for main memory to satisfy application demands. However, both technologies face serious scalability and power consumption problems. While there has been an enormous amount of research addressing the drawbacks of these technologies, researchers have also been considering non-volatile memory technologies to replace SRAM and DRAM in future processors. Among the different non-volatile technologies, Spin-Transfer Torque RAM (STT-RAM) and Phase Change Memory (PCM) are the most promising candidates to replace SRAM and DRAM, respectively. Researchers believe that the memory hierarchy in future computing systems will consist of a hybrid combination of current technologies (i.e., SRAM and DRAM) and non-volatile technologies (e.g., STT-RAM and PCM). While each of these technologies has its own unique features, each has some specific limitations as well. Therefore, to achieve a memory hierarchy that satisfies all the system-level requirements, we need to study each of these memory technologies.

In this dissertation, the author proposes several mechanisms to address some of the major issues with each of these technologies. To relieve the wear-out problem in a PCM-based main memory, a compression-based platform is proposed, in which the compression scheme collaborates with wear-leveling and error correction schemes to further extend the memory lifetime.
On the other hand, to mitigate the write disturbance problem in PCM, a new write strategy and a non-overlapping data layout are proposed to manage thermal disturbance among adjacent cells.

For the on-chip cache, however, we would like to achieve a scalable, low-latency configuration. To this end, the author proposes a morphable SLC-MLC STT-RAM cache that dynamically trades off larger capacity against lower latency, based on the application's demand. While adopting scalable memory technologies such as STT-RAM improves the performance of cache-sensitive applications, the cache thrashing problem will still exist in applications with very large data working sets. To address this issue, the author proposes a selective caching mechanism for highly parallel architectures and also introduces a criticality-aware compressed last-level cache that can hold a larger portion of the data working set while keeping access latency low.