ruby: Fix block_on behavior
Review Request #3149 - Created Oct. 9, 2015 and submitted
| Information | |
|---|---|
| Joel Hestness | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 11169:0a83a8d08c9e --------------------------- ruby: Fix block_on behavior Ruby's controller block_on behavior aimed to block MessageBuffer requests into SLICC controllers when a Locked_RMW was in flight. Unfortunately, this functionality only partially works: When non-Locked_RMW memory accesses are issued to the sequencer to an address with an in-flight Locked_RMW, the sequencer may pass those accesses through to the controller. At the controller, a number of incorrect activities can occur depending on the protocol. In MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2 transfer, which cannot be serviced, because the block_on functionality blocks the trigger queue, resulting in a deadlock. Further, if an intermediate store arrives (e.g. from a separate SMT thread), the sequencer allows the request through to the controller, and the atomicity of the Locked_RMW may be broken. To avoid these problems, disallow the Sequencer from passing any memory accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in- flight.
Considered many other potential solutions on gem5-gpu email list, which seem
unlikely to function as desired:
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7gFound reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
-
src/mem/ruby/system/Sequencer.cc (Diff revision 1) -
I prefer this check being done just before insertRequest() is called.
-
src/mem/ruby/system/Sequencer.cc (Diff revision 1) -
This is the place I am asking the check be moved to.
-
src/mem/ruby/system/Sequencer.cc (Diff revision 1) -
There's no reason that I can see why this should be a std::map. The mandatory_q_ptr that is passed in is not used by anything at all. (The only checks now are on the address and there is no action taken on the mandatory_q. Maybe I am missing something?) Can we change the tracking structure to reflect that this is really a vector of addresses for the memory controller that we should be checking instead of a map? Actually, there's only a single address with this new modification since the request will be nack'd back to the requester.
Change Summary:
Move default_entry creation after check per Brandon's suggestion.
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 2 (+17) |
Ship It!
-
src/mem/ruby/system/Sequencer.cc (Diff revision 2) -
Can you add a comment to the code here (or somewhere in this function) that describes how the RequestStatus_Aliased actually fixes the problem.
I looked through the code and eventually figured it out, but it took some time and was a bit convoluted.
This return goes back up into makeRequest with Aliased which eventually becomes visible in RubyPort::MemSlavePort::recvTimingResp and it re-enqueues the port for a retry and returns false to the owner of the port (CPU) which denotes that it failed. The (owner) CPU is expected to wait for the RubyPort to trigger the retry (which happens when the port because available again when RubyPort executes RubyPort::ruby_hit_callback).
The confusing thing was the the O3CPU model has seperate ports for instruction and data which is what allows this to work with the IFETCH. The MemSlavePort corresponding to the i-cache is what gets stalled, but the MemSlavePort associated with the d-cache is still open and allows the RMW_Write to proceed.
I don't know what a good summary would look like, but it would be nice if there was something here to hint at what's happening.
-
src/mem/ruby/system/Sequencer.cc (Diff revision 2) -
Can you add a comment here that the 1st level controller is blocked for all message buffers except the mandatory queue?
The name of the function, blockOnQueue, can be misleading depending on how the reader interprets it; one might just glance at this code and think that the mandatory queue gets blocked when in reality it's all of the other message buffers.
Thank you Joel for fixing this bug and posting this patch. Those of us at AMD have encountered this bug as well and some of us are already using a version of this patch. It would be great if you could check this in soon so that we do not have to maintain your fix locally. Please check in this patch as soon as you can. Thanks!
