| ~ | | Fixes a cache latency calculation bug. |
| | ~ | Fixes a latency calculation bug for accesses during a cache line fill. |
| | |
|
| ~ | | The classic memory system can improperly calculate response latency under certain circumstances. When a cache miss to a lower level occurs, the response time to packets in the MSHR is computed as follows: |
| | ~ | Under a cache miss, before the line is filled, accesses to the cache are associated with an MSHR and marked as targets. Once the line fill completes, MSHR target packets pay an additional latency of "responseLatency + busSerializationLatency". However, the "whenReady" field of the cache line is only set to an additional delay of "busSerializationLatency", which lacks the responseLatency component of the fill. Accesses that occur on the cycle of (or shortly after) the line fill can therefore respond without paying the responseLatency. This also means two accesses to the same address may be serviced in the opposite order from the one in which the cache received them. For stores to the same address, although the cache performs the stores in the order they were received, acknowledgements may be sent in a different order. |
| | |
|
| ~ | | completion_time = curTick() + responseLatency * clockPeriod() +
|
| | ~ | Adding the responseLatency component to the whenReady field preserves the penalty that should be paid and prevents these ordering issues. |
| - | | (transfer_offset ? pkt->busLastWordDelay :
|
| - | | pkt->busFirstWordDelay); |
| | |
|
| ~ | | However, the "whenReady" field of the cache block is only set to:
|
| | ~ | |
| - | | blk->whenReady = curTick() + pkt->busLastWordDelay; |
| | |
|
| ~ | | This expression lacks the responseLatency component. This means younger accesses that occur on (or after) the cycle of the fill from the lower level cache can be observed sooner than they should be. Additionally (and this is how I found the bug), multiple senior stores dispatched in order by the CPU to the same address may be acknowledged out of order. This can break some CPUs' LSQ implementations that assume stores issued in program order (to the same address) will be acknowledged in the order they were dispatched to the memory system. |
| | ~ | Note #1: It seems the non-LRU tags completely ignore the whenReady field. I assume these tag classes are currently non-functional.
|
| - | |
|
| - | | Note #1: It seems the non-LRU caches/tags completely ignore the whenReady field. Those will be fixed in a later patch.
|
| | | Note #2: I don't have commit access, so someone else will have to push this. |
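
To make the ordering problem concrete, here is a minimal standalone sketch of the timing described above. It is not gem5 code: `FillTiming`, `mshrTargetReady`, `whenReadyBuggy`, `whenReadyFixed`, and `hitResponse` are hypothetical names, and the hit path is simplified to "respond no earlier than whenReady". Only the two latency sums are taken from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model; names are illustrative, not gem5's API.
using Tick = std::uint64_t;

struct FillTiming {
    Tick fillTick;                 // tick at which the line fill completes
    Tick responseLatency;          // cache response latency, in ticks
    Tick busSerializationLatency;  // bus serialization delay, in ticks
};

// MSHR targets pay responseLatency + busSerializationLatency after the fill.
Tick mshrTargetReady(const FillTiming &f)
{
    return f.fillTick + f.responseLatency + f.busSerializationLatency;
}

// Before the fix: whenReady only accounts for busSerializationLatency.
Tick whenReadyBuggy(const FillTiming &f)
{
    return f.fillTick + f.busSerializationLatency;
}

// After the fix: whenReady also includes responseLatency.
Tick whenReadyFixed(const FillTiming &f)
{
    return f.fillTick + f.responseLatency + f.busSerializationLatency;
}

// Assumption: a hit on a filled block responds no earlier than whenReady.
Tick hitResponse(Tick arrival, Tick whenReady)
{
    return std::max(arrival, whenReady);
}
```

With a fill at tick 1000, a responseLatency of 20, and a busSerializationLatency of 4 (made-up values), the MSHR target responds at tick 1024. A later access arriving at tick 1001 responds at tick 1004 under the old whenReady, ahead of the older MSHR target, and at tick 1024 once responseLatency is included.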