Changes to the gem5 memory-system (release-0.2)
Review Request #817 - Created Aug. 5, 2011 and discarded
| Information | |
|---|---|
| Andreas Hansson | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
| ali, gblack, nate, stever | |
------------------------------------------------------------------------------
What is the goal
The goal is to make it easier to use gem5 for communication-centric
modelling by adopting a communication framework similar to that of the
TLM-2.0 transaction level modelling standard from the Open SystemC
Initiative (OSCI). Just like TLM-2.0, the basic idea behind the
changes is to facilitate modelling of inter-module communication
through a set of well-defined module interfaces, e.g. a memory-mapped
interface, and a cache-maintenance interface.
The major difference with release 0.1 of the extensions is that all
modules now implement the 4-phase handshakes. Thus, this serves as a
good indication of what the modules will look and feel like.
After this submission we are starting to put incremental patches in
place to get this into the main repository. In other words, the sooner
we get feedback the better. Once again thanks for the comments on the
first round.
------------------------------------------------------------------------------
What are the key improvements
- Master and slave interfaces (SystemC sc_export), distinguishing the
different roles in inter-module communication. A master (TLM-2.0
initiator) is a MemObject that initiates new transactions, and a
slave (TLM-2.0 target) is a module that responds to transactions
initiated by master modules. The same module can act both as a
master and a slave, and this would typically be the case for a model
of a bridge, a router, or a bus. See src/mem/module_interface.hh
- Master and slave port interfaces that are used to access the
corresponding module interfaces (SystemC sc_port). Together with the
module interfaces the port interfaces form the basis for having
ports and modules with different protocols. For example, a
memory-mapped master port would be used to call functions
implemented by a memory-mapped slave interface. See
src/mem/port_interface.{hh,cc}
- Ports that only convey structural information and leave the syntax
(payload type) and semantics to a specific port interface. The
ports themselves only contain information about their owners
(structurally) and basic knowledge of their connectivity. The actual
semantics and syntax of the communication between a master and a
slave port is determined by their port interfaces and the
corresponding exported module interfaces. See src/mem/port.{hh,cc}
- The specific type of packet is determined by the protocol of the
interfaces enabling diversity in payload types. See
src/mem/protocol.{hh,cc} and src/mem/packet.{hh,cc}
- Standard 4-phase handshakes (TLM-2.0 approximately timed) for
request/response avoid complex receive/retry interaction between
modules in timing mode. Every MemObject is now using the new 4-phase
semantics.
- Ports do not implement any module-specific functionality but merely
call functions on their interface classes.
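The split between module interfaces, port interfaces and ports described
above can be sketched roughly as follows. This is a minimal illustration
and not code from the patch: every class and function name below
(SlaveInterface, MasterPortInterface, SimpleMemory, access,
writeThenRead) and the fields of MemMapPacket are assumptions made for
the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical payload for a memory-mapped protocol.
struct MemMapPacket {
    std::uint64_t addr;
    std::uint8_t data;
    bool isWrite;
};

// Module interface: what a slave has to provide for a given protocol.
template <typename PKT>
class SlaveInterface {
  public:
    virtual ~SlaveInterface() = default;
    virtual void access(PKT &pkt) = 0;
};

// Port: structural information only (a name and a peer).
class Port {
  public:
    explicit Port(std::string name) : _name(std::move(name)) {}
    void bind(Port &peer) { _peer = &peer; peer._peer = this; }
    bool isBound() const { return _peer != nullptr; }
    const std::string &name() const { return _name; }
  private:
    std::string _name;
    Port *_peer = nullptr;
};

// Port interface: how a master reaches the slave's module interface.
template <typename PKT>
class MasterPortInterface {
  public:
    void connect(SlaveInterface<PKT> &slave) { _slave = &slave; }
    void access(PKT &pkt) {
        assert(_slave != nullptr);  // must be connected before use
        _slave->access(pkt);
    }
  private:
    SlaveInterface<PKT> *_slave = nullptr;
};

// A trivial memory-mapped slave module (hypothetical).
class SimpleMemory : public SlaveInterface<MemMapPacket> {
  public:
    void access(MemMapPacket &pkt) override {
        if (pkt.isWrite)
            _store[pkt.addr] = pkt.data;
        else
            pkt.data = _store[pkt.addr];
    }
  private:
    std::map<std::uint64_t, std::uint8_t> _store;
};

// Write a byte through the master side, then read it back.
inline std::uint8_t writeThenRead(std::uint64_t addr, std::uint8_t value) {
    SimpleMemory mem;
    Port cpuPort("cpu.dcache_port"), memPort("mem.port");
    MasterPortInterface<MemMapPacket> cpuIf;
    cpuPort.bind(memPort);  // structural binding
    cpuIf.connect(mem);     // protocol-specific binding
    assert(cpuPort.isBound() && memPort.isBound());
    MemMapPacket wr{addr, value, true};
    cpuIf.access(wr);
    MemMapPacket rd{addr, 0, false};
    cpuIf.access(rd);
    return rd.data;
}
```

The point of the sketch is only that Port knows nothing about packets,
while the payload type is fixed by the template argument of the two
interface classes.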
------------------------------------------------------------------------------
How does it work
A master port has a protocol-specific master-port interface, and is
associated with a master-module interface. An example of this is the
data and instruction port of a CPU. The master-port interface calls
functions on the connected slave-module interface. In the case of the
CPU, this could be e.g. memory, or a cache, both implementing the
memory-mapped slave interface and making it visible through a
memory-mapped slave port. The Python instantiation is currently based
on the position of the ports (left or right of the equality sign) to
determine if they are masters or slaves, but this should eventually
be specified in the Module.py using subclasses of Port. Similarly, the
cache ports are currently determined by looking at the name of the
port. This should also be extended to use an enum or literal in the
Module.py. The binding is checked at instantiation time when the C++
objects are created and their ports connected. As part of the
instantiation a structural diagram of the system is also created as a
Graphviz dot file. Run it through "dot -Tsvg" and have a look in your
browser. The diagram clearly shows what ports are connected, what
protocol and role they have, and as a tooltip it also shows the
address ranges of all memory-mapped slave ports.
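As a rough idea of what such a structural dump involves, the sketch
below emits one Graphviz edge per master/slave connection, labelled with
the protocol. It is not the code from the patch: the Connection struct,
the toDot function and the port names used with it are assumptions made
for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one master-to-slave port connection.
struct Connection {
    std::string master;
    std::string slave;
    std::string protocol;
};

// Render the connections as a Graphviz digraph, one edge per connection.
inline std::string toDot(const std::vector<Connection> &conns) {
    std::ostringstream os;
    os << "digraph system {\n";
    for (const auto &c : conns) {
        // A connection needs both endpoints to be drawn.
        assert(!c.master.empty() && !c.slave.empty());
        os << "  \"" << c.master << "\" -> \"" << c.slave
           << "\" [label=\"" << c.protocol << "\"];\n";
    }
    os << "}\n";
    return os.str();
}
```

The resulting text can be written to a file and passed through
"dot -Tsvg" as described above.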
Currently the 4-phase handshakes are used by all the MemObjects in the
system. However, the bridges are still present to facilitate the
integration work. See src/mem/bridge_classic_to_4phase.{hh,cc} and
src/mem/bridge_4phase_to_classic.{hh,cc}. The 4-phase is completely
replacing the old receive/retry handshakes, but there are situations
with unaligned accesses that still have to be addressed (we only
looked at Alpha and so got away without worrying).
Similar to the classic gem5 memory-system, a packet points to a
request and its associated data. In the typical case, a memory-mapped
request packet is created by a master, such as a DMA or a CPU. Once
beginReq is called, the sending (master) module should not change the
packet until its response is returned through a beginResp (where
applicable). An intermediate component, such as a bus or cache, may
create forwarded cache-maintenance packets from the original
memory-mapped request. Thus, one request gives rise to a chain of
requests and responses. Currently the lifetime and rules governing
those packets (and their request and data pointers) are work in
progress (see e.g. coherent_bus.cc). The original request/response may
be deallocated before the snoop request/response and vice versa. Smart
pointers and reference counting might be a viable solution, or
alternatively a more intelligent snoop controller in the
buses. In addition, the different packets (currently memory-mapped
and cache-maintenance) should be stripped of as much common
functionality as possible, reducing the memory-mapped packets to the
bare essentials.
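To make the reference-counting option concrete, here is a minimal
sketch using std::shared_ptr. It is an illustration of the idea only,
not what the patch does: the Request and Packet layouts and the
makeSnoop and ownersAfterOriginalFreed helpers are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical request descriptor shared by several packets.
struct Request {
    std::uint64_t addr;
    unsigned size;
};

// Hypothetical packet that co-owns its request and data.
struct Packet {
    std::shared_ptr<Request> req;
    std::shared_ptr<std::vector<std::uint8_t>> data;
};

// A bus or cache derives a snoop packet that shares request and data
// with the original packet instead of borrowing raw pointers.
inline Packet makeSnoop(const Packet &orig) {
    return Packet{orig.req, orig.data};
}

// Free the original packet first and report how many owners the
// request still has afterwards.
inline long ownersAfterOriginalFreed() {
    auto orig = std::make_unique<Packet>(Packet{
        std::make_shared<Request>(Request{0x80, 64}),
        std::make_shared<std::vector<std::uint8_t>>(64, 0)});
    Packet snoop = makeSnoop(*orig);
    assert(snoop.req.use_count() == 2);  // original + snoop
    orig.reset();                        // original goes away first
    assert(snoop.req->addr == 0x80);     // request is still valid
    return snoop.req.use_count();
}
```

With shared ownership the order in which the original and the snoop
packets are deallocated no longer matters, at the cost of a reference
count update on every copy.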
In response to questions from the reviews of release 0.1, the 4-phase
does not mandate a response handshake (it is possible to have only a
request begin/end). Also note that the untimed cache maintenance
protocol does not use the 4-phase at all. Thus, if there is need for
something else it is possible.
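The ordering of the four phases, including the case without a response
handshake, can be sketched as a small state machine. This is a
simplified illustration, not the patch's implementation: beginReq and
beginResp are named in the text above, while endReq, endResp, the
Handshake class and the strict ordering it enforces are assumptions.

```cpp
#include <cassert>

// Phases of one transaction as seen by a simple protocol checker.
enum class Phase { Idle, ReqBegun, ReqEnded, RespBegun, Done };

class Handshake {
  public:
    // expectsResponse is false for transactions that only use the
    // request begin/end pair.
    explicit Handshake(bool expectsResponse)
        : _expectsResponse(expectsResponse) {}

    bool beginReq() { return advance(Phase::Idle, Phase::ReqBegun); }
    bool endReq() {
        return advance(Phase::ReqBegun,
                       _expectsResponse ? Phase::ReqEnded : Phase::Done);
    }
    bool beginResp() { return advance(Phase::ReqEnded, Phase::RespBegun); }
    bool endResp() { return advance(Phase::RespBegun, Phase::Done); }
    bool done() const { return _phase == Phase::Done; }

  private:
    // Move to the next phase only if the call arrives in order.
    bool advance(Phase from, Phase to) {
        if (_phase != from)
            return false;
        _phase = to;
        return true;
    }

    Phase _phase = Phase::Idle;
    bool _expectsResponse;
};

// A full request/response transaction, with one out-of-order call.
inline bool fullTransactionCompletes() {
    Handshake h(true);
    bool ok = h.beginReq();
    ok = ok && !h.beginResp();  // response before endReq is rejected
    ok = ok && h.endReq() && h.beginResp() && h.endResp();
    return ok && h.done();
}

// A transaction that needs no response handshake.
inline bool requestOnlyCompletes() {
    Handshake h(false);
    return h.beginReq() && h.endReq() && h.done();
}
```

Each call returns false instead of asserting, so a module could use
such a checker to detect a peer that calls the phases out of order.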
------------------------------------------------------------------------------
What is the intention with this patch
This patch is not showing all the changes made to the repository, but
in contrast to release 0.1 this includes essentially all source
changes, and also diffs with respect to the revision the day we
branched (22 February 2011).
- The underlying infrastructure
o Module Interface (what does a master/slave have to provide)
o Port Interface (how is it accessed through the ports)
o Port (how are the structural ports and logical interfaces bound together)
o Protocol
- The basic building blocks
o MemObject (maintain collections of ports and do look ups of names)
o Packet Queue (similar to the Payload Event Queue in TLM-2.0, and
closely related to the SimpleTimingPort)
- The models themselves
o NonCoherentBus, CoherentBus
o I/O Device (show a simple memory-mapped slave)
o PhysicalMemory (same as above)
o Bridge (show the benefits of the protocol separation and clear port roles)
o CPU (demonstrate the shift from functionality in the ports to
interfaces of the modules)
o Bridge classic to/from 4-phase (highlighting the difference between
the semantics)
- An example of their use
o Tsunami system (show the port connections and the structure)
o Caches
o CPUs
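For readers unfamiliar with the Packet Queue building block listed
above, the sketch below shows the general idea of a time-ordered queue
of pending packets. It is only an illustration under assumed names
(Tick as a 64-bit count, QueuedPacket, schedule, drain) and does not
reproduce the class in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

using Tick = std::uint64_t;

// Hypothetical entry: a packet id and the tick at which to send it.
struct QueuedPacket {
    Tick when;
    int id;
};

class PacketQueue {
  public:
    // Queue a packet for delivery at the given tick.
    void schedule(int id, Tick when) { _queue.push({when, id}); }

    // Remove and return, earliest first, all packets due at or
    // before 'now'.
    std::vector<int> drain(Tick now) {
        std::vector<int> due;
        while (!_queue.empty() && _queue.top().when <= now) {
            due.push_back(_queue.top().id);
            _queue.pop();
        }
        return due;
    }

    bool empty() const { return _queue.empty(); }

  private:
    // Order the heap so that the earliest tick is on top.
    struct Later {
        bool operator()(const QueuedPacket &a,
                        const QueuedPacket &b) const {
            return a.when > b.when;
        }
    };
    std::priority_queue<QueuedPacket, std::vector<QueuedPacket>, Later>
        _queue;
};
```

A module would call schedule() when it decides how long a packet takes
to process, and drain() from the event that fires at the head packet's
tick. Packets scheduled for the same tick come out in an unspecified
order in this sketch.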
The goal is to get feedback and suggestions on anything from the
actual design and how it is implemented, to the coding style and code
comments. This is also an opportunity for everyone to influence and
steer the changes and the integration into the main gem5
repository. With this second review we also hope to share the
remaining trajectory for integration into the repository, chopping the
contributions up in incremental patches. This work is about to start,
so let us know as soon as possible if you have questions or concerns.
------------------------------------------------------------------------------
Testing and verification
In order to work effectively we have limited the regression to only
include the quick Alpha tests. For these tests, the appropriate
updates have been made to connect the additional ports, and define the
role and protocol for the ports in question. Due to the changes in
timing, small deviations (plus or minus a few percent) in statistics have
been observed for a number of tests. We have considered this to be
within reasonable limits and updated the reference behaviour.
Posted (Aug. 27, 2011, 3:37 p.m.)
-
src/cpu/o3/cpu.cc (Diff revision 1) -
Probably be better to inline the constructors and description methods in the class declaration to save retyping the class names. I probably wouldn't object to putting the process() definitions there too since they're short.
-
src/cpu/simple/timing.cc (Diff revision 1) -
I don't really like having to identify which port something is coming in on by matching on the numeric ID... the old model with the embedded Port subclasses avoided this, and it would be nice not to lose that.
-
src/dev/alpha/Tsunami.py (Diff revision 1) -
Does the LHS/RHS distinction really matter, as long as we have a master on one side and a slave on the other?
-
src/mem/cache/base.hh (Diff revision 1) -
Somehow these methods got shifted enough that diff thinks they were deleted and added... I'm not sure why, but it means I can't tell for sure if they were changed or not.
-
src/mem/coherent_bus.cc (Diff revision 1) -
When does this function get called? It looks like the bus only deals with CacheMntPackets and not MemMapPackets, so I don't see where this comes in.
-
src/mem/fs_translating_proxy.hh (Diff revision 1) -
It'd be nice to integrate this with SETranslatingProxy... outside the scope of fixing up the Port interface, I know, but it would still be nice in the long run. There's a lot of duplication (and I believe needless interface inconsistency) between these two objects.
-
src/mem/mem_object.hh (Diff revision 1) -
In the long run we don't really want the base MemObject to have to have methods for every protocol any derived object will ever support (particularly when we go to Ruby), so we'll have to figure out a way around this.
-
src/mem/module_interface.hh (Diff revision 1) -
To match gem5 style, this typedef should be to something like Packet or Pkt rather than pkt_t. Packet would be nice, but I can see where that would cause obvious complications with the existing Packet class. Maybe the existing Packet class should be renamed.
-
src/mem/module_interface.hh (Diff revision 1) -
Shouldn't this derive from MasterInterface<PKT> rather than MasterInterface<MemMapPacket>? Similarly for TimedSlaveInterface below.
-
src/mem/module_interface.hh (Diff revision 1) -
I know the "debug" name derives from SystemC TLM usage, but I think the m5 label "functional" is more descriptive (since it's not just debugging), and has more continuity with the old code. We should at least have a larger discussion about this before making a permanent decision.
-
src/mem/packet.hh (Diff revision 1) -
Do we really want to require that responses follow the same path in reverse that the request did? Also, we already have the senderState pointer which effectively allowed the set of modules a packet traversed to build a stack (as a linked list) in order to backtrack a response. That's a little more realistic to me too as it requires the routing nodes to maintain the state (as they would in a real system). We should discuss this more.
-
src/mem/packet.hh (Diff revision 1) -
I like separating out the base packet type from the fields specific to the current cache coherence protocol... I'm not sure this is the right split though (and I'm guessing you weren't even trying to get it right, just taking a first stab). Also, it's not clear why the cache packets aren't a subclass of the base memory access packet, since caches will want to pass through the simpler packets for uncached accesses, right? Also I'm not a fan of the MemMapPacket name; maybe MemAccPacket?
-
src/mem/port_interface.hh (Diff revision 1) -
I don't get the benefit of separating Port and PortInterface... it seems like there's always a 1:1 mapping between the types. Can you give an example where this separation is useful?
-
src/python/m5/simulate.py (Diff revision 1) -
yea, just do this whole thing in python...
-
src/python/swig/pyobject.cc (Diff revision 1) -
We need to find a way to support new protocols without having to modify this code...
-
src/sim/sim_object.hh (Diff revision 1) -
Do we really need this new callback? I see where it's used to load the kernel in System, but it seems like that should be done in an initState() callback. See http://gem5.org/SimObjects#Stages_of_initialization for more details.
-
src/sim/sim_object.cc (Diff revision 1) -
I like all this dot stuff, but I think it should be done in python and not in C++. I think that would get rid of the need to track the path etc. in C++ as well.
-
tests/configs/o3-nocaches-timing.py (Diff revision 1) -
FYI, we've had problems with O3 w/o caches in the past because the ifetch can get starved when the dcache is replaying loads (or some kind of I vs D starvation like that).
-
tests/configs/tsunami-o3.py (Diff revision 1) -
Were you thinking of adding a "socket pair" facility to simplify bi-directional connections?
Hi Andreas,

Sorry for taking so long to get to this... it's an impressive (and daunting) amount of code, and it's been hard for me to find an otherwise unoccupied block of time long enough to really get into it. Overall, I think it's great. My main concerns are:

- I really prefer the current "embedded Port object" model we have for associating standard port interfaces with specific calls on the associated MemObject. I don't really like how the multiple-inheritance-based scheme forces you to take all packets of a given type through the same method. Do you see a problem with sticking with the old model?
- As I mentioned above, I don't see the benefit of separating Port and PortInterface classes; an example would help.
- How do you handle the case where caches A and B, both connected to the same bus, both take a write miss to a read-only copy of line X and issue upgrade requests simultaneously via beginReq()? Say A goes first. You can't just hold B off and process its upgrade packet later, you have to somehow give B the opportunity to re-issue the request, since by the time it's B's turn, A will have invalidated B's block, forcing B to issue a read-exclusive instead of an upgrade. I didn't see that in there, but I didn't look really hard.
- The whole python integration and setup needs some work (which you already said)... I'd really like this interface to carry forward into Ruby, which means we need to be able to support lots of protocols without having to explicitly name them all in some central place. Unfortunately I don't see any obvious way to do this, but we'll need to work one out.

Steve
