ruby: provide a backing store
Review Request #2466 - Created Oct. 21, 2014 and submitted
| Information | |
|---|---|
| Nilay Vaish | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 10504:e0ff770ec065
---------------------------
ruby: provide a backing store

Ruby's functional accesses are not guaranteed to succeed as of now. While this is not a problem for the protocols that are currently in the mainline repo, it seems that coherence protocols for GPUs rely on a backing store to supply the correct data. The aim of this patch is to make this backing store configurable, i.e. it comes into play only when the option --access-backing-store is passed.

The backing store has been there since M5 and GEMS were integrated. The only difference is that earlier the system used to maintain the backing store and Ruby's copy was write-only. Sometime last year, we moved to data being supplied by Ruby in SE mode simulations. And now we have patches on the reviewboard which remove Ruby's copy of memory altogether and rely completely on the system's memory to supply data.

This patch adds back a SimpleMemory member to RubySystem. This member is used only if the option access-backing-store is set to true. By default, the memory would not be accessed.
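As a rough illustration of the "off by default, enabled by an option" behaviour described above, here is a minimal Python sketch. It is not the patch itself: `ToyRubySystem`, `build_system`, and the use of a `bytearray` in place of SimpleMemory are assumptions made for the example; only the option name `--access-backing-store` comes from the description.

```python
import argparse


class ToyRubySystem:
    """Toy stand-in for RubySystem (hypothetical, not gem5 code)."""

    def __init__(self, mem_size, access_backing_store=False):
        self.access_backing_store = access_backing_store
        # Assumption for this sketch: the second copy of memory is only
        # allocated when the option is set; a bytearray plays the role
        # of the SimpleMemory member.
        self.backing_store = bytearray(mem_size) if access_backing_store else None


def build_system(argv):
    """Parse the toy command line and build a system from it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--access-backing-store", action="store_true",
                        default=False,
                        help="supply data from a backing store")
    options = parser.parse_args(argv)
    return ToyRubySystem(mem_size=1024,
                         access_backing_store=options.access_backing_store)
```

Calling `build_system([])` yields a system without a backing store, while `build_system(["--access-backing-store"])` yields one with it.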
Issue Summary
5
1
4
0
| Description | From | Last Updated | Status |
|---|---|---|---|
| Why is this needed when the system already keeps track of all memory? | Andreas Hansson | Oct. 22, 2014, 3:31 p.m. | Open |
Posted (Oct. 22, 2014, 3:21 p.m.)
Thanks Nilay. I have a couple of comments below.
-
src/mem/ruby/system/RubyPort.cc (Diff revision 1) -
This comment should be updated.
-
src/mem/ruby/system/System.hh (Diff revision 1) -
Making this a static variable seems dangerous. With this change, one could not use a ruby protocol that needs a backing store (second copy of phys mem) in multi-system simulation. Can you make it non-static instead?
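The concern about static state can be illustrated with a small Python analogy. The review does not show which variable is meant, so the flag name and both classes below are hypothetical; a class-level attribute stands in for a C++ static member.

```python
class SharedFlagSystem:
    # Class-level attribute: one value shared by every instance,
    # loosely analogous to a C++ static member.
    access_backing_store = False

    def __init__(self, access_backing_store):
        # Writing through the class overwrites the setting for all systems.
        SharedFlagSystem.access_backing_store = access_backing_store


class PerSystemFlag:
    def __init__(self, access_backing_store):
        # Instance attribute: each system keeps its own setting.
        self.access_backing_store = access_backing_store


# Two systems in one simulation, only the first wants a backing store.
a = SharedFlagSystem(True)
b = SharedFlagSystem(False)   # silently turns the flag off for `a` as well
shared_result = a.access_backing_store

c = PerSystemFlag(True)
d = PerSystemFlag(False)
per_instance_result = (c.access_backing_store, d.access_backing_store)
```

With the shared flag, the last system constructed decides the setting for all of them; with the per-instance flag, each system keeps the value it was given.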
Posted (Oct. 22, 2014, 3:31 p.m.)
-
src/mem/ruby/system/RubyPort.cc (Diff revision 1) -
Why has this changed?
-
src/mem/ruby/system/RubyPort.cc (Diff revision 1) -
The flow control here seems strange, first turn it into a response, then pass it to the memory?
-
src/mem/ruby/system/RubySystem.py (Diff revision 1) -
Why is this needed when the system already keeps track of all memory?
I do not understand what the patch is doing, and why we do not rely on doing things as before with the System being responsible for the memories and the backing stores. This memory does not even have to be connected to anything in the physical sense. Could you provide some background as to the code, and why the code that is there at the moment does not fit the bill?
Posted (Oct. 22, 2014, 8:49 p.m.)
I'd like to weigh in on this. I, too, am confused about the need for this change and would be very conflicted about changing/removing Ruby's functionally-coherent store:
tl;dr: For everyone's sake, I describe (my understanding of) the uses of Ruby's functionally-coherent backing store. Then explain why I feel that changing/removing it fails to make progress toward addressing the real issues with Ruby functional accesses.
I haven't seen a thorough explanation of this anywhere despite these review requests proposing substantial changes, so here's a write-up of my understanding of the Ruby functionally-coherent store:
First, so we're on the same page: Ruby can maintain different versions of data in the caches, and the coherence/validity of that data depends on the data's state and the coherence protocol's interpretation of that state. The states and their implied data validity can be very different across different coherence protocols, as can the use of cache controller queues and interconnect message buffers for managing state and data. Ruby's functionally-coherent data store (what's on the chopping block here) decouples the state+transition design of a protocol from the data handling by providing a place to store the known current (coherent) version of the data that is readily accessible from any controller.
The functionally-coherent store is very useful for developing new coherence protocols: When developing, the protocol author often aims to get the apparent states and transitions correct, but the data handling part can be a bit messier. For example, something as simple as forgetting to copy data from a cache block to an MSHR in a protocol transition makes it tricky to debug where incorrect data may have arisen. For this reason, it is useful to have an always functionally-coherent copy of the data that allows the developer to just grab the right data at the end of a memory access. This functionality is currently available by setting the RubyPort/Sequencer access_phys_mem parameter to True. In src/mem/ruby/system/RubyPort.cc, the RubyPort::MemSlavePort::hitCallback() function is called at the end of every memory access and decides, based on access_phys_mem, whether to read data from the functionally-coherent store. This functionality effectively decouples the development of a protocol's data correctness from its state-handling correctness.
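To make the decoupling concrete, here is a toy Python model of that decision. It is only a sketch of the idea, not the code in RubyPort.cc: `ToyPort`, its methods, and the dict-based store are hypothetical; only the `access_phys_mem` name is taken from the text above.

```python
class ToyPort:
    """Toy model of a port that may consult a functionally-coherent store."""

    def __init__(self, store, access_phys_mem):
        self.store = store                  # dict: address -> value
        self.access_phys_mem = access_phys_mem

    def write(self, addr, value):
        # Keep the functionally-coherent copy current on every store.
        if self.access_phys_mem:
            self.store[addr] = value

    def hit_callback_read(self, addr, protocol_data):
        # Called when a read completes: either trust the data the
        # protocol delivered, or fetch the known-coherent copy instead.
        if self.access_phys_mem:
            return self.store[addr]
        return protocol_data


# A buggy protocol transition forgot to copy the block into the MSHR,
# so the protocol hands back stale data (0) instead of the written 7.
stale_protocol_data = 0

with_store = ToyPort(store={}, access_phys_mem=True)
with_store.write(0x40, 7)
masked = with_store.hit_callback_read(0x40, stale_protocol_data)

without_store = ToyPort(store={}, access_phys_mem=False)
without_store.write(0x40, 7)
exposed = without_store.hit_callback_read(0x40, stale_protocol_data)
```

In this model the data-handling bug is invisible to the requester when `access_phys_mem` is set, which is what lets a developer work on states and transitions first.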
While it is useful in protocol development, the functionally-coherent store can actually provide MUCH more (I think): truly functional memory access, which Ruby doesn't fully support currently. gem5 is built following a principle that it should support functional memory accesses as a way to allow programmers to functionally implement interesting new simulator behaviors before deciding whether they should be implemented more completely. Ruby's functionally-coherent store can provide this functionality by ensuring that, regardless of the state of data in the Ruby caches AND regardless of the coherence protocol, we can always access the coherent version of the data when access_phys_mem = True for all requesting RubyPorts (at this point, I'll note that for gem5-gpu, we maintain a tiny Ruby patch to get this to appear to work completely and I've heard through the grapevine that AMD has done something similar).
So, what I'm saying is that Ruby doesn't actually fully support functional memory accesses (but can with the backing store)... Ruby functional memory access support would be hard to implement, so no one has taken the plunge. Due to the variability in what various coherence protocols try to implement, it is often difficult to decide how to handle functional data accesses. For example, when data is being moved between caches (i.e. it resides in MSHRs, controller queues, or the interconnect), it is not always obvious where the current (coherent) version of the data resides. In such a situation, Ruby currently doesn't have a way to guarantee correctness of the functional access, so it gives up and exits simulation with a fatal() in RubyPort::MemSlavePort::recvFunctional() (this sucks). The code that tries to figure this out is in the Ruby system functionalRead or functionalWrite functions, which perform heavy-duty look-ups for the data across all the cache controllers, and these look-ups can fail to return/update functionally-coherent data under many different conditions.
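The failure mode described above can be sketched in Python as follows. This is a deliberately simplified model and not Ruby's actual functionalRead logic: the state names, `FunctionalAccessError`, and the dict-based controllers are all assumptions for illustration.

```python
class FunctionalAccessError(RuntimeError):
    """Raised where this toy model cannot locate coherent data."""


# Hypothetical transient state names: the block is in flight somewhere.
TRANSIENT_STATES = {"IS", "IM"}


def functional_read(controllers, memory, addr, backing_store=None):
    """Look for a coherent copy of addr across all cache controllers.

    controllers: list of dicts mapping address -> (state, data)
    memory:      dict standing in for off-chip memory
    """
    if backing_store is not None:
        # A functionally-coherent store always has the right answer.
        return backing_store[addr]

    stable_copies = []
    for ctrl in controllers:
        entry = ctrl.get(addr)
        if entry is None:
            continue
        state, data = entry
        if state in TRANSIENT_STATES:
            # Data may sit in an MSHR, a queue, or the interconnect,
            # so no cached copy can be trusted.
            raise FunctionalAccessError(
                "block 0x%x is in transient state %s" % (addr, state))
        stable_copies.append(data)

    if stable_copies:
        return stable_copies[0]
    return memory[addr]   # not cached anywhere: read off-chip memory
```

A read of a block held in a stable state, or not cached at all, succeeds in this model; a block caught mid-transfer raises unless a backing store is supplied.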
It's worth noting that since gem5's syscall emulation uses functional accesses in a few cases, anyone using Ruby with SE mode and those system calls is actually getting lucky by not running into the functional access fatal().
My (our) stake in Ruby functional accesses:
This is largely informed by my experience with functional memory accesses for rapid development of gem5-gpu. In gem5-gpu, we rely fairly heavily on functional accesses as a way to implement a slim and flexible CUDA runtime library. For example, when a CUDA benchmark starts executing, we functionally read the binary out of the simulated system's memory to hand over to GPGPU-Sim, which needs the GPU code to simulate the GPU cores. In the substantial majority of cases, these sorts of functional accesses don't trigger Ruby's functional access fatal(), because they are situations that Ruby can handle (e.g. reading from cache data in a shared state or from off-chip memory). However, we inevitably run into the functional access fatal() here-and-there, just because we're testing so many different things. Also, while we could eliminate a fair number of functional accesses, there are a few places in our CUDA runtime that would still require functional accesses. Finally, I am aware of at least 3 different Ruby coherence protocols that rely on the functionally-coherent backing store: the primary protocol included with gem5-gpu ("VI_hammer"), and 2 others that have been developed by gem5-gpu users (note: gem5-gpu recently passed 100 total mailing list subscribers and 300+ downloads - Woohoo!).
My take:
Sure, I feel that it's very important for Ruby to completely support functional accesses going forward, which would suggest that we could eliminate the functionally-coherent store. However, I also feel that the ability to use a functionally-coherent backing store MUST stay. I suspect it will be very desirable to implement thin runtime interfaces for accelerators, and such runtimes are likely to use functional accesses. With the potential desire to test new coherence protocols to join these heterogeneous cores, it is likely that Ruby will be involved and need to handle the functional accesses. This suggests we should invest some effort in making Ruby functional accesses more robust.
Even if Ruby completely supports functional accesses, a coherence protocol developer should NOT be required to get data handling correct when trying to implement or hack on a protocol. It's hard enough to get the protocol states and transitions correct while making sure you're not inadvertently introducing race conditions. So, I feel that the functionally-coherent backing store should remain at least as an optional feature to ease the developer's process.
I don't feel that this patch or the proposal to remove the backing store are particularly mindful of either of these issues.
Review request changed
Updated (Oct. 23, 2014, 9:03 a.m.)
Summary:
Description:
Diff: Revision 2 (+28 -22)
