Kevin K. Chang of Carnegie Mellon University has published an excellent doctoral thesis, "Understanding and Improving the Latency of DRAM-Based Memory Systems". It analyzes the DRAM latency problem and proposes architectural changes that substantially reduce DRAM latency.

Three problems

Chang divides the DRAM latency problem into four parts, and I will summarize three of them here:

· Inefficient bulk data movement.

· DRAM refresh interference. While part of the DRAM is being refreshed, that part cannot be accessed.

· Variation in cell latency, caused by manufacturing variability.

The fourth part, which concerns the impact of latency, is left for interested readers to explore and discuss.


1. Inefficient bulk data movement

When memory and storage were expensive, data movement was limited to register-sized words or, at most, 512-byte blocks from disk. Today, with gigabytes of memory and vast storage capacity, bulk data movement is increasingly common.

But the architecture of data movement, from memory to the CPU over a narrow memory bus, has not changed. Chang proposes a new high-bandwidth data path between memory subarrays that uses a few isolation transistors to create a wide (8,192-bit) parallel bus between subarrays in the same memory bank.
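To get a rough sense of why the width of the data path matters, the sketch below counts how many bus transfers it takes to move one block of data over paths of different widths. The 64-bit width and the 8 KB block size are assumptions chosen only for illustration; the 8,192-bit width is the figure quoted in the article. The sketch ignores timing, command overhead, and everything else a real memory system adds, so it only shows the ratio of transfer counts, not actual latency.

```python
def transfers_needed(num_bytes: int, bus_width_bits: int) -> int:
    """Number of bus transfers needed to move num_bytes over a bus_width_bits-wide path."""
    total_bits = num_bytes * 8
    # Ceiling division: a partial final transfer still costs a full transfer.
    return -(-total_bits // bus_width_bits)


BLOCK_BYTES = 8 * 1024   # assumed block size, for illustration only
NARROW_BUS_BITS = 64     # assumed width of a conventional memory channel
WIDE_BUS_BITS = 8192     # width of the inter-subarray bus quoted in the article

narrow = transfers_needed(BLOCK_BYTES, NARROW_BUS_BITS)
wide = transfers_needed(BLOCK_BYTES, WIDE_BUS_BITS)
print(f"narrow bus: {narrow} transfers, wide bus: {wide} transfers")
```

Under these assumptions the wide path needs far fewer transfers for the same block, which is the intuition behind moving bulk data inside the DRAM rather than across the memory bus.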

2. DRAM refresh interference

DRAM memory cells need to be refreshed periodically to retain their data, which is why it is called dynamic RAM. DRAM is refreshed rank by rank rather than all at once, because refreshing everything at once would require too much power. However, while a rank is being refreshed it cannot be accessed, which adds latency.

Refresh overhead is getting worse: as chip density increases, more rows must be refreshed, and on 32Gb chips refresh can reduce performance by nearly 20%.

Chang proposes two mechanisms that hide refresh latency by parallelizing refreshes with memory accesses across banks and subarrays. The first is out-of-order per-bank refresh, which lets the memory controller choose an idle bank to refresh instead of following the usual strict round-robin order. The second is write-refresh parallelization, which overlaps refresh latency with write latency.
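The difference between strict round-robin refresh and choosing an idle bank can be sketched as follows. This is a minimal illustration, not the thesis's actual controller logic: the function names, the `pending` request counts, and the `refreshed` flags are all hypothetical, and the tie-breaking rule (lowest bank index) is an assumption.

```python
from typing import List


def next_round_robin(last_refreshed: int, num_banks: int) -> int:
    """Strict round-robin: always refresh the bank after the last one refreshed."""
    return (last_refreshed + 1) % num_banks


def pick_idle_bank(pending: List[int], refreshed: List[bool]) -> int:
    """Pick the least busy bank that has not yet been refreshed in this window.

    pending[b]   -- hypothetical count of queued requests for bank b
    refreshed[b] -- True if bank b was already refreshed in this window
    """
    candidates = [b for b in range(len(pending)) if not refreshed[b]]
    if not candidates:
        raise ValueError("all banks already refreshed in this window")
    # min() returns the first minimum, so ties go to the lowest bank index.
    return min(candidates, key=lambda b: pending[b])


pending = [3, 0, 5, 1]        # bank 1 is idle, bank 0 has three queued requests
refreshed = [False] * 4

in_order = next_round_robin(last_refreshed=3, num_banks=4)
out_of_order = pick_idle_bank(pending, refreshed)
print(f"round-robin refreshes bank {in_order}, idle-first refreshes bank {out_of_order}")
```

In this example the round-robin policy refreshes bank 0 even though it has queued requests, while the idle-first policy refreshes bank 1, so the queued requests are not delayed by the refresh.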

In his tests on an 8-core CPU, these strategies improved weighted memory performance by more than 27%.

3. Cell latency variation

Because of manufacturing process variation, memory cells can differ substantially in performance, and the differences grow as density increases. But DRAM is specified to run reliably at the speed of its slowest cells, which means there is a significant performance gain available if the faster cells can be exploited.

In the thesis, Chang proposes two mechanisms that take advantage of this variation, achieving speedups ranging from 13% to almost 20%.
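The idea of running each part of the memory at its own safe speed, rather than at the speed of the slowest cell, can be illustrated with a small calculation. Everything in this sketch is hypothetical: the region names, the per-region latencies in cycles, and the access sequence are invented for illustration, and the result says nothing about the speedups measured in the thesis.

```python
from typing import Dict, List

# Hypothetical minimum reliable access latency for each DRAM region, in cycles.
region_latency_cycles: Dict[str, int] = {
    "region0": 10,
    "region1": 11,
    "region2": 13,   # the slowest region dictates the worst-case specification
    "region3": 10,
}


def total_cycles_worst_case(accesses: List[str], latencies: Dict[str, int]) -> int:
    """Every access pays the latency of the slowest region."""
    slowest = max(latencies.values())
    return slowest * len(accesses)


def total_cycles_per_region(accesses: List[str], latencies: Dict[str, int]) -> int:
    """Each access pays only the latency of the region it touches."""
    return sum(latencies[region] for region in accesses)


# Hypothetical access sequence.
accesses = ["region0", "region1", "region0", "region3", "region2", "region0"]

worst = total_cycles_worst_case(accesses, region_latency_cycles)
aware = total_cycles_per_region(accesses, region_latency_cycles)
print(f"worst-case timing: {worst} cycles, per-region timing: {aware} cycles")
```

The gap between the two totals depends entirely on how much the regions differ and on which regions are accessed most, which is why the benefit grows as variation between cells increases.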

Exploration and optimization

In system architecture, the work of finding and fixing bottlenecks never ends. For the past 20 years DRAM was considered difficult to improve, but we now see that its latency can be reduced as well.

As more transistors, specialized instruction sets, and similar techniques continue to raise performance elsewhere in the system, reducing DRAM latency will also become a major target for performance improvement.
