Review Board 2.0.15


mem: MSHR livelock bug fix

Review Request #2784 - Created May 11, 2015, last updated July 31, 2015

Information
Submitter: Tony Gutierrez
Repository: gem5
Branch: default
Reviewers
Groups: Default

Changeset 10841:b015addd7b9d
---------------------------
mem: Ruby response timing

This patch ensures that Ruby responses to the CPU core are not unnecessarily
delayed. The original code delays Ruby responses by a tick, causing the core
to receive them a cycle later, rather than in the same cycle. Hence, the
throughput of back-to-back stores that hit in the L1 is reduced by
half because the O3 must wait for the acknowledgement of a prior store before
issuing the next store. This patch eliminates the performance bug.

This patch was created by Bihn Pham during his internship at AMD.
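The timing effect described in the patch can be illustrated with a small discrete-event sketch. This is plain Python, not gem5 code: the event queue, the 1-tick L1 hit latency, and the rule that the core issues the next store only once the previous ACK is processed are all simplifying assumptions made for illustration.

```python
# Toy discrete-event model (not gem5) of back-to-back dependent stores.
# Each store hits in the L1 with an assumed 1-tick latency; the ACK may
# carry an extra tick of delay, as the pre-patch Ruby code added. The
# core issues the next store only when the previous ACK is processed.
import heapq

def last_store_tick(num_stores, extra_ack_delay):
    events = []          # min-heap of (tick, seq, action)
    seq = 0
    issued = 0
    tick = 0

    def schedule(when, action):
        nonlocal seq
        heapq.heappush(events, (when, seq, action))
        seq += 1

    def issue_store(now):
        nonlocal issued
        issued += 1
        if issued < num_stores:
            # ACK after the 1-tick hit latency plus the extra delay;
            # the next store issues when that ACK is processed.
            schedule(now + 1 + extra_ack_delay, issue_store)

    schedule(0, issue_store)
    while events:
        tick, _, action = heapq.heappop(events)
        action(tick)
    return tick          # tick at which the final store issued
```

With `extra_ack_delay=1` (the old behaviour) each store costs 2 ticks; with `extra_ack_delay=0` (the patched behaviour) it costs 1, which is the halved-throughput effect the description refers to.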


   
Review request changed
Updated (July 31, 2015, 8:17 a.m.)

Description:

  ~ Changeset 10841:b015addd7b9d
  ~ Changeset 10841:b015addd7b9d
  + ---------------------------
  + mem: Ruby response timing

  ~ mem: MSHR livelock bug fix
  ~ This patch ensures that Ruby responses to the CPU core are not unnecessarily
  + delayed. The original code delays Ruby responses by a tick, causing the core
  + to receive them a cycle later, rather than in the same cycle. Hence, the
  + throughput of back-to-back stores that hit in the L1 is reduced by
  + half because the O3 must wait for the acknowledgement of a prior store before
  + issuing the next store. This patch eliminates the performance bug.

    This patch was created by Bihn Pham during his internship at AMD.

  - This bug fix prevents a case in which a prefetcher uses up all remaining MSHR
  - entries before demand requests get a chance to, causing a livelock.
  - This happens because events scheduled at curTick() + 1 are evaluated in the
  - next cycle, not in the current cycle.

  - A specific case that caused this livelock situation is the following:
  - There are back-to-back stores and the second store cannot be sent to the cache
  - until the first store receives an ACK. When the ACK is scheduled at curTick() +
  - 1, meaning that the ACK is to be sent in the next cycle, there is an open MSHR
  - entry in the current cycle. A prefetcher grabs the entry by issuing a prefetch
  - request in the current cycle before the second store gets a chance to issue in
  - the next cycle. The second store stalls because the MSHR is already full by
  - that time.
Posted (July 31, 2015, 9:37 a.m.)

This is dangerous.

The whole idea is that we do not send things in 0 time (infinite throughput). Admittedly the +1 is a poor-man's version of a delta-delay, but I fear this interacts with a lot of things. What is the impact on (classic) cache performance, the other CPUs, etc.?
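For reference, a delta-delay orders an event after the current one without advancing simulated time, which is what the +1 tick approximates. A minimal sketch in plain Python, in the spirit of SystemC/VHDL delta cycles; `DeltaQueue` and its methods are hypothetical illustrations, not gem5's actual event queue API.

```python
import heapq

class DeltaQueue:
    """Event queue ordered by (tick, delta): an event scheduled "now"
    runs at the same simulated tick but in the next delta phase, so it
    is ordered after the current event without consuming a tick."""

    def __init__(self):
        self._heap = []
        self._seq = 0            # tie-breaker for stable ordering
        self.now = (0, 0)        # current (tick, delta)

    def schedule(self, action, tick=None):
        # tick=None means "now + one delta": same tick, next delta.
        when = (tick, 0) if tick is not None else (self.now[0], self.now[1] + 1)
        heapq.heappush(self._heap, (when, self._seq, action))
        self._seq += 1

    def run(self):
        while self._heap:
            when, _, action = heapq.heappop(self._heap)
            self.now = when
            action(self)

def respond(eq):
    log.append(("resp", eq.now))

def request(eq):
    log.append(("req", eq.now))
    eq.schedule(respond)         # "now": same tick, next delta

# A request at tick 5 schedules its response "now": the response is
# delivered in the same simulated tick (5), just one delta later.
log = []
q = DeltaQueue()
q.schedule(request, tick=5)
q.run()
# log: req at (5, 0), resp at (5, 1) -- same tick, still ordered
```

Unlike the +1-tick workaround, the response here costs zero simulated time, so throughput is not halved, yet causality between the two events is preserved.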