ruby: set default latency for ruby caches

Review Request #2796 - Created May 11, 2015 and updated

Information
Submitter: Tony Gutierrez
Repository: gem5
Branch: default
Reviewers: Default
Changeset 10853:1ce6a4401f97
---------------------------
ruby: set default latency for ruby caches

Set to 0 since many protocols do not use the parameter.

   
Ship it!
Posted (May 12, 2015, 9:14 a.m.)
Ship It!
Posted (May 12, 2015, 9:22 a.m.)

This change seems to stray from its apparent aim:

First, this patch seems to be aimed at correcting some common confusion: the RubyCache latency parameter is strange because the cache itself does not enforce the latency. Instead, the protocol controllers enforce cache access latencies, which can be defined either as part of the cache or as parameters to the controller. The current organization disaggregates the accountability for enforcing reasonable cache read/write latencies, and makes it very confusing and difficult to validate memory access times.

However, simply defaulting the RubyCache latency to 0 doesn't fix any confusion about protocol memory access time validation, and it actually encourages poorer use of the latency parameter. Right now, when I open any configs/ruby/*.py protocol file, I see a latency declaration in a cache with a comment saying the latency is ignored (except sometimes for fast-path hits...?). Why declare a cache latency if it won't be used?! This patch could eliminate the need to declare those cache latencies up front, but the cache declarations themselves must remain. That means the patch would leave numerous cache declarations with confusing, unused latency settings even though cache latencies now default to 0 cycles (which, I suppose, could be taken to indicate that the latency is unused, but again, what should happen if I change it?). Worse, new protocols might not define a cache's unused latencies at all, leaving the user baffled about why changing a cache's latency has no effect on simulated system performance.
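
To make the confusion concrete, here is a paraphrased sketch of the pattern (the class and parameter names are illustrative, not quoted from any particular protocol file, and vary by protocol):

    # Paraphrased status quo in a configs/ruby/*.py protocol file: the cache
    # declares a latency the protocol never reads, while the controller
    # carries its own, separately enforced latency parameters.
    l1_cache = L1Cache(size="32kB", assoc=8,
                       latency=2)                   # ignored by this protocol
    l1_cntrl = L1Cache_Controller(version=0,
                                  L1cache=l1_cache,
                                  request_latency=2,     # these are enforced
                                  response_latency=2)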

We should be aiming to either use or lose the RubyCache latency parameter, not settle somewhere in between. Here are the two extremes:
1) All RubyCache latencies must be defined and enforced by their respective protocols. In config files, this would involve passing the RubyCache's latency variable as a parameter to the controllers that need it. In the protocol files, controllers could then enforce these latencies by referencing the cache's latency when enqueuing to a buffer. This would actually make memory access times depend on the cache latency parameter (though in practice it's hard to ensure protocols do this). Both options are sketched after this list.

2) Get rid of the RubyCache latency parameter altogether. This would require protocols to parameterize any cache latencies (most already do, though sometimes aggregated with other latencies, which is cumbersome and very fragile). This would side-step user confusion about the RubyCache latency variable.
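
A rough sketch of what these two options might look like in a protocol config (again with illustrative, hypothetical parameter names):

    # Option (1), sketched: declare the latency once on the RubyCache and
    # pass that same value to the controller, which enforces it on enqueues.
    l1_cache = L1Cache(size="32kB", assoc=8, latency=4)
    l1_cntrl = L1Cache_Controller(version=0, L1cache=l1_cache,
                                  request_latency=l1_cache.latency)

    # Option (2), sketched: drop RubyCache.latency entirely and keep one
    # explicit latency knob on the controller itself.
    l1_cache = L1Cache(size="32kB", assoc=8)        # no latency field at all
    l1_cntrl = L1Cache_Controller(version=0, L1cache=l1_cache,
                                  request_latency=4)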

The former option is probably better: we've added tag and data array resource conflicts to the RubyCache, so it is already partially responsible for enforcing latencies. Further, it would be easier to find and change a cache's latency than to dig through a controller file to see how latencies are used and then set some aggregated parameter. Finally, the RubyCache's latency is saved to config.ini, so it would be easier to reference prior simulations for comparison and validation.

  1. Joel, I understand your concern, but I believe this is at least a step in the right direction. In the end, I believe the second option you describe is where we want to be. In real systems, cache latency is more complicated than a single parameter. This patch at least eliminates the need to set the latency parameter in protocols that want to be smarter; a later patch can remove it altogether.

  2. I feel strongly that this patch should not be committed. It doesn't offer any new functionality, and worse, it encourages protocol authors to obfuscate RubyCache latencies by no longer requiring them to think explicitly about latencies when declaring caches. At least with the code as-is, we end up with explicit comments in the protocol configs about latencies not being used. I disagree that a default latency is the smarter move.

    This problem needs to be fixed in a more robust way.

  3. This is a simple one-line (actually just a few characters) change that allows new protocols to be better. Is it really worth a strong objection just because it doesn't fix the latency problems in older protocols? The effort you've put into pushing back on this change could have been spent fixing the older protocols.

  4. I've spent many hours performance-validating coherence protocols, and the most painful problems are always obfuscated latencies. So yes, I feel this is a significant problem worth spending time on.

    To move this review forward, I tried using this patch, and it doesn't really work: when a memory access is issued to a Ruby Sequencer, the Sequencer pushes the request into the cache controller's mandatory queue using the latency defined in CacheMemory. If you set an L1 cache's latency to 0, the Sequencer fails a non-zero latency assertion before the enqueue in Sequencer::issueRequest(). Thus, a protocol author needs to set cache latencies explicitly anyway, and with this patch they would instead get an obtuse assertion failure. Again, at least with the code as-is, instantiating a RubyCache in Python signals that the latency must be set.
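
    For illustration only, here is a minimal, self-contained model (plain Python, not gem5 source) of the failure mode just described, assuming the Sequencer asserts a non-zero latency before the mandatory-queue enqueue:

        # Minimal model of the latency check in Sequencer::issueRequest():
        # the latency comes from the CacheMemory object, and a zero value
        # trips the assertion before the mandatory-queue enqueue.
        def issue_request(cache_latency_cycles):
            assert cache_latency_cycles > 0, "cache latency must be non-zero"
            return "enqueue to mandatory queue in %d cycles" \
                % cache_latency_cycles

        print(issue_request(4))  # explicit non-zero latency: fine
        print(issue_request(0))  # the patched 0-cycle default: AssertionError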

    Anyway, I've started looking into the right way to fix this, and I'll put together a patch. This review request should not be committed though.

Posted (May 12, 2015, 1:57 p.m.)

What's the reason for not removing the latency parameter altogether in this patch? From what I can tell it's used in exactly one place: the parameter is taken from the L1 cache (or whatever has the mandatory queue) and used to add latency when the memory request is initially injected into Ruby. If you are going to default to 0, I see no reason not to just remove the confusion altogether.

As far as I know, no protocol uses this parameter (correctly). Am I mistaken here?

I really hate changes that force all of the protocol-specific python config files to change, but in this case I think it's worth it.

  1. I cannot speak for all protocols, so I'd prefer not to make that change. If you and Joel are motivated to get rid of it, then please go ahead and post that patch.

Ship it!
Posted (May 13, 2015, 7:38 a.m.)
Ship It!