ruby: Remove the RubyCache/CacheMemory latency
Review Request #2841 - Created May 21, 2015 and submitted
| Information | |
|---|---|
| Joel Hestness | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 10933:cf3a413d2c38
---------------------------
ruby: Remove the RubyCache/CacheMemory latency

The RubyCache (CacheMemory) latency parameter is only used for top-level caches instantiated for Ruby coherence protocols. However, the top-level cache hit latency is assessed by the Sequencer as accesses flow through to the cache hierarchy. Further, protocol state machines should be enforcing these cache hit latencies, but RubyCaches do not expose their latency to any existing state machines through the SLICC/C++ interface. Thus, the RubyCache latency parameter is superfluous for all caches. This is confusing for users.

As a step toward pushing L0/L1 cache hit latency into the top-level cache controllers, move their latencies out of the RubyCache declarations and over to their Sequencers. Eventually, these Sequencer parameters should be exposed as parameters to the top-level cache controllers, which should assess the latency.

NOTE: Assessing these latencies in the cache controllers will require modifying each to eliminate instantaneous Ruby hit callbacks in transitions that finish accesses, which is likely a large undertaking.
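As a rough illustration of the behavior described above, the following standalone Python sketch models a Sequencer that assesses the top-level cache hit latency when it hands a request to the cache hierarchy. It is not gem5 code: `ToySequencer`, `enqueue_cycle`, and the parameter names `icache_hit_latency` and `dcache_hit_latency` are assumptions chosen for illustration, and latencies are plain integers standing in for cycles.

```python
from dataclasses import dataclass


@dataclass
class ToySequencer:
    """Hypothetical stand-in for a Ruby Sequencer; NOT the gem5 class."""

    # Assumed parameter names; values are in cycles.
    icache_hit_latency: int = 1
    dcache_hit_latency: int = 1

    def enqueue_cycle(self, now: int, is_ifetch: bool) -> int:
        """Return the cycle at which the request reaches the cache controller.

        The hit latency is charged here, by the sequencer, rather than by
        the cache object itself.
        """
        if is_ifetch:
            latency = self.icache_hit_latency
        else:
            latency = self.dcache_hit_latency
        return now + latency


seq = ToySequencer(icache_hit_latency=2, dcache_hit_latency=3)
ifetch_ready = seq.enqueue_cycle(100, is_ifetch=True)
data_ready = seq.enqueue_cycle(100, is_ifetch=False)
print(ifetch_ready, data_ready)
```

In this toy model the cache object carries no latency of its own, which mirrors the argument that a separate RubyCache latency parameter is superfluous.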
Ran small tests with all of the different protocols to verify appropriate performance changes.
Please consider this patch as a substitute for http://reviews.gem5.org/r/2796/
I am not familiar with the Ruby caches, but on the classic side of things we have recently added cache parameters for misses, forwarding, snooping, etc. I am mostly surprised to see that Ruby only has a single parameter for the cache latency, and now even that one is being removed/moved. Is there not a case for keeping them similar and encapsulating the cache-related timings by leaving them as a responsibility of the cache?
Have you looked at the implications of cache hit latencies greater than 1? In our experience, a hit latency greater than 1 causes consistent pipeline bubbles and significant performance degradation. The O3 model does not perform well and you do not achieve real system behavior.
Since you went through the effort to remove the latency parameter, let's completely remove it rather than add new L1 icache and dcache hit latency parameters. Hardcode the mandatory queue enqueue latency to 1. That is effectively what our internal memory systems do.
-
src/mem/ruby/system/Sequencer.py (Diff revision 1) -
Can you give these parameters a default of 1 and add a comment that says "setting these values to a value greater than one will result in pipeline bubbles and negatively impact O3 performance"?
Change Summary:
Added a default latency of 1 per Brad's suggestion, and removed the latency setting from protocol config files (i.e. closer to migrating these latencies completely into controllers).
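A minimal sketch of what a default of 1 with the requested caveat could look like, written as standalone Python rather than as a gem5 SimObject declaration. The class name `HitLatencyParams`, the field names, and the runtime warning are all assumptions for illustration; the review only asked for a default value and a comment.

```python
import warnings
from dataclasses import dataclass


@dataclass
class HitLatencyParams:
    """Hypothetical parameter holder; NOT the gem5 Sequencer declaration."""

    # Setting these values to a value greater than one will result in
    # pipeline bubbles and negatively impact O3 performance.
    icache_hit_latency: int = 1
    dcache_hit_latency: int = 1

    def __post_init__(self) -> None:
        for name in ("icache_hit_latency", "dcache_hit_latency"):
            value = getattr(self, name)
            if value < 1:
                raise ValueError(f"{name} must be at least 1 cycle, got {value}")
            if value > 1:
                # The warning is an illustrative extra, not part of the patch.
                warnings.warn(f"{name}={value} may cause pipeline bubbles")


defaults = HitLatencyParams()
print(defaults.icache_hit_latency, defaults.dcache_hit_latency)
```

With defaults in place, a protocol config file that does not mention these latencies still gets a one-cycle hit latency, which is why the per-protocol settings could be dropped.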
Diff: Revision 2 (+53 -71)
-
src/mem/ruby/system/Sequencer.cc (Diff revision 2) -
Redundant. Cycles is an unsigned type.
-
src/mem/ruby/system/Sequencer.cc (Diff revision 2) -
response?
No need to change anything, but surely there is more to it than response.
In the classic cache we have forward latency, lookup latency, response latency. How does Ruby differentiate the latency from request -> response, request -> request, snoop request -> snoop response, snoop request -> snoop request, and response -> response, snoop response -> snoop response? Perhaps something to think more about?
This looks great to me. Please check this in after our check in on 7/31.
