O3, Ruby: Forward invalidations from Ruby to O3 CPU
Review Request #894 - Created Oct. 17, 2011 and submitted
| Information | |
|---|---|
| Nilay Vaish | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 8700:1c93580f459b
---------------------------
O3, Ruby: Forward invalidations from Ruby to O3 CPU

This patch implements forwarding of invalidations and replacements from the L1 cache of the Ruby memory system to the O3 CPU. The implementation adds a list of ports to RubyPort. Whenever a replacement or an invalidation is performed, the L1 cache forwards it to all of these ports; in the case of the O3 CPU, the port is the LSQ.
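The forwarding scheme described above can be sketched roughly as follows. This is only an illustration and not the patch itself: `EvictionListener`, `ToyRubyPort`, `ToyLsq` and `forwardDemo` are hypothetical names invented for this sketch, and the real RubyPort and LSQ interfaces in gem5 differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical interface standing in for a CPU-side port (e.g. the LSQ).
struct EvictionListener {
    virtual ~EvictionListener() = default;
    virtual void recvEviction(Addr lineAddr) = 0;
};

// Toy stand-in for RubyPort: keeps a list of attached listeners and
// forwards every eviction or invalidation to all of them.
class ToyRubyPort {
  public:
    void attach(EvictionListener *listener) {
        assert(listener != nullptr);
        listeners.push_back(listener);
    }

    // Called by the (toy) L1 controller on a replacement or invalidation.
    void evictionCallback(Addr lineAddr) {
        for (EvictionListener *listener : listeners)
            listener->recvEviction(lineAddr);
    }

  private:
    std::vector<EvictionListener *> listeners;
};

// Toy LSQ that only records what it was told; a real CPU model would
// decide here whether any speculative load has to be squashed.
class ToyLsq : public EvictionListener {
  public:
    void recvEviction(Addr lineAddr) override { seen.push_back(lineAddr); }
    std::vector<Addr> seen;
};

// Two listeners, two evictions: returns the total number of notifications.
std::size_t forwardDemo() {
    ToyRubyPort port;
    ToyLsq lsqA, lsqB;
    port.attach(&lsqA);
    port.attach(&lsqB);
    port.evictionCallback(0x1000);
    port.evictionCallback(0x2040);
    return lsqA.seen.size() + lsqB.seen.size();
}
```

In this sketch the port makes no policy decision at all; every attached listener sees every eviction and decides for itself what to do with it.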
-
build_opts/ALPHA_SE_MESI_CMP_directory (Diff revision 1) -
Do you really want to remove support for all the other CPU models?
-
configs/ruby/MESI_CMP_directory.py (Diff revision 1) -
Hi Nilay, thanks for starting the work on this. I've been waffling on giving it a go for a while now, so hopefully your push will get this rolling. Do you think that sending invalidations should be optional? For testing, this would be useful, but in general I would think you should always forward the invalidations to the CPU model, and the CPU model would then choose whether or not to use that information (for instance, an in-order model may even buffer multiple speculative loads behind a blocking memory op).
-
src/mem/ruby/system/RubyPort.cc (Diff revision 1) -
Sanity checks: (1) Is express snoop right here? That will make the snoop instantaneous, right? Is that necessary if this is just going directly to an L1? (2) Also, where does the meminhibit flag get deasserted?
-
src/mem/ruby/system/RubyPort.cc (Diff revision 1) -
I figured out that the packet may not be processed at this point in time, but may instead be scheduled for processing at a later time. Is it assured that the receiver will always delete the packet and the request?
Thanks for the heads up on this patch. I'm glad you found the time to dive into it. I'm confused that the comment mentions a "list of ports", but I don't see a list of ports in the code and I'm not sure how it would even be used.

The two questions you pose are good ones. Hopefully someone who understands the O3 LSQ can answer the first, and I would suggest creating a new directed test that can manipulate the enqueue latency on the mandatory queue to create the necessary test situations.

Also, I have a couple of high-level comments right now:
- Ruby doesn't implement any particular memory model. It just implements the cache coherence protocol, and more specifically invalidation-based protocols. The protocol, in combination with the core model, results in the memory model.
- I don't think it is sufficient to just forward those probes that hit valid copies to the O3 model. What about replacements of blocks that have serviced a speculative load? Instead, my thought would be to forward all probes to the O3 LSQ and think of CPU-controlled policies to filter out unnecessary probes.
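One possible shape for such a CPU-controlled filter is sketched below. It is an assumption-laden illustration, not code from the patch or from the O3 model: `ProbeFilter` and its methods are hypothetical names, and the sketch simply remembers the cache-line addresses touched by outstanding speculative loads so that probes for other lines can be ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

using Addr = std::uint64_t;

// Hypothetical CPU-side filter: every probe is delivered to the CPU, and
// the CPU decides which ones matter to its speculative loads.
class ProbeFilter {
  public:
    // lineBytes is assumed to be a non-zero power of two.
    explicit ProbeFilter(unsigned lineBytes)
        : lineMask(~static_cast<Addr>(lineBytes - 1)) {
        assert(lineBytes != 0 && (lineBytes & (lineBytes - 1)) == 0);
    }

    // Record that a speculative load has read from this address.
    void noteSpeculativeLoad(Addr addr) { specLines.insert(addr & lineMask); }

    // Forget all tracked lines, e.g. once the loads have committed.
    void clear() { specLines.clear(); }

    // True if a probe or eviction for this address touches a line that a
    // tracked speculative load has read, so the CPU may need to squash.
    bool probeNeedsSquash(Addr addr) const {
        return specLines.count(addr & lineMask) != 0;
    }

  private:
    Addr lineMask;
    std::unordered_set<Addr> specLines;
};
```

With 64-byte lines, a speculative load at 0x1008 makes a probe for 0x1030 relevant (same line), while a probe for 0x1040 is filtered out.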
Diff: Revision 2 (+59 -13)
Diff: Revision 3 (+59 -13)
Diff: Revision 4 (+59 -13)
Diff: Revision 5 (+56 -14)
Nilay, thanks for pushing this patch along. This is a very important feature for gem5 and I'm glad we have you working on it.

First, to answer your questions:
- We can certainly avoid deadlock, but exactly how we do it depends on the interactions between the O3 CPU and Ruby. For the most part, it is up to the O3 model to avoid deadlock. I've heard through the grapevine that you are thinking about implementing the first, simplest option I suggested in my previous email: the one where the O3 model doesn't issue stores to Ruby until they reach the head of the store buffer. I think that is an excellent choice, and it avoids having to worry about deadlock for stores, since they are only issued to the memory system once they become non-speculative. In contrast, I'm sure the O3 model will issue speculative loads to Ruby, and if the O3 CPU relies on speculative loads to succeed, we will encounter deadlock. However, as long as the O3 model eventually issues a load non-speculatively, I'm pretty sure we can guarantee forward progress. Make sense?
- Testing at the CPU model is a great question. Do you know if the O3 model can read in a trace? If so, I would suggest a solution similar to the trace solution I suggested before to test Ruby. Basically, you need the trace entries to include a fixed delay so that you can enforce certain reorderings. I would use those fixed delay values to manipulate the delay in the mandatory queue.

A couple of questions/comments:
- Why do you say that "My understanding is that this should ensure an SC execution, as long as Ruby can support SC. But I think Ruby does not support any memory model currently"? Ruby implements a cache coherence protocol, which is a component of a memory model, but is not in itself a memory model. Ruby alone can't support any particular memory model. However, I believe that by forwarding probes and evictions to the CPU, Ruby can help support SC, TSO, or any other memory model. It is up to the CPU to act appropriately to achieve a certain model.
- I would modify the action name "cc_squash_speculation" to something like "forward_eviction_to_cpu". It is really up to the CPU and memory model to determine whether speculation should be squashed. We should not try to imply that Ruby is designed to support a specific memory model or CPU type.
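The store-issue policy discussed above (stores reach the memory system only once they are non-speculative and at the head of the store buffer) might look roughly like the sketch below. `ToyStoreBuffer` and its methods are hypothetical names chosen for illustration; this is not the O3 model's actual store queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

using Addr = std::uint64_t;

// Toy store buffer: stores enter speculatively in program order, commit
// in program order, and may be issued to memory only from the head and
// only after they have committed.
class ToyStoreBuffer {
  public:
    // A store is allocated while still speculative.
    void push(Addr addr) { stores.push_back(addr); }

    // The oldest not-yet-committed store becomes non-speculative.
    void commitOldest() {
        assert(committed < stores.size());
        ++committed;
    }

    // Try to issue the head store. Returns false (and leaves the buffer
    // untouched) while the head store is still speculative.
    bool issueHead(Addr &out) {
        if (committed == 0)
            return false;
        out = stores.front();
        stores.pop_front();
        --committed;
        return true;
    }

  private:
    std::deque<Addr> stores;    // oldest store at the front
    std::size_t committed = 0;  // committed stores form a prefix of the queue
};
```

Because nothing speculative ever leaves this buffer, a squash only has to discard uncommitted entries and never has to undo a store that the memory system has already seen.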
Diff: Revision 6 (+139 -40)
-
src/mem/protocol/MESI_CMP_directory-L1cache.sm (Diff revision 6) -
Change name to send_evictions
-
src/mem/protocol/MI_example-cache.sm (Diff revision 6) -
Same here
-
src/mem/protocol/MOESI_CMP_directory-L1cache.sm (Diff revision 6) -
and here
-
src/mem/protocol/MOESI_CMP_token-L1cache.sm (Diff revision 6) -
here
-
src/mem/protocol/MOESI_hammer-cache.sm (Diff revision 6) -
here
-
src/mem/protocol/RubySlicc_Types.sm (Diff revision 6) -
Change name to evictionCallback
-
src/mem/ruby/system/RubyPort.cc (Diff revision 6) -
ruby_eviction_callback
-
src/mem/ruby/system/Sequencer.hh (Diff revision 6) -
Again evictionCallback
Diff: Revision 7 (+139 -40)
-
configs/example/se.py (Diff revision 7) -
Why are you changing this? This looks really wrong.
Diff: Revision 8 (+131 -32)
Just one minor question about the packet you create. Other than that, this looks good.
-
src/mem/ruby/system/RubyPort.cc (Diff revision 8) -
Should this use a different MemCmd than ReadExReq?
Diff: Revision 9 (+131 -32)
Diff: Revision 10 (+131 -32)
Diff: Revision 11 (+131 -32)
