Changes to the gem5 memory-system (release-0.2)

Information
Submitter:	Andreas Hansson
Repository:	gem5
Branch:	default
Bugs:
Depends On:
Reviewers
Groups:	Default
People:	ali, gblack, nate, stever

Description

Changes to the gem5 memory-system (release-0.2)

------------------------------------------------------------------------------

What is the goal

The goal is to make it easier to use gem5 for communication-centric
modelling by adopting a communication framework similar to that of the
TLM-2.0 transaction level modelling standard from the Open SystemC
Initiative (OSCI). Just like TLM-2.0, the basic idea behind the
changes is to facilitate modelling of inter-module communication
through a set of well-defined module interfaces, e.g. a memory-mapped
interface, and a cache-maintenance interface.

The major difference with release 0.1 of the extensions is that all
modules now implement the 4-phase handshakes. Thus, this serves as a
good indication of what the modules will look and feel like.

After this submission we are starting to put incremental patches in
place to get this into the main repository. In other words, the sooner
we get feedback the better. Once again thanks for the comments on the
first round.

------------------------------------------------------------------------------

What are the key improvements

- Master and slave interfaces (SystemC sc_export), distinguishing the
  different roles in inter-module communication. A master (TLM-2.0
  initiator) is a MemObject that initiates new transactions, and a
  slave (TLM-2.0 target) is a module that responds to transactions
  initiated by master modules. The same module can act both as a
  master and a slave, and this would typically be the case for a model
  of a bridge, a router, or a bus. See src/mem/module_interface.hh

- Master and slave port interfaces that are used to access the
  corresponding module interfaces (SystemC sc_port). Together with the
  module interfaces the port interfaces form the basis for having
  ports and modules with different protocols. For example, a
  memory-mapped master port would be used to call functions
  implemented by a memory-mapped slave interface. See
  src/mem/port_interface.{hh,cc}

- Ports that only convey structural information and leave the syntax
  (payload type) and semantics to a specific port interface.  The
  ports themselves only contain information about their owners
  (structurally) and basic knowledge of their connectivity. The actual
  semantics and syntax of the communication between a master and a
  slave port is determined by their port interfaces and the
  corresponding exported module interfaces. See src/mem/port.{hh,cc}

- The specific type of packet is determined by the protocol of the
   interfaces enabling diversity in payload types. See
   src/mem/protocol.{hh,cc} and src/mem/packet.{hh,cc}

- Standard 4-phase handshakes (TLM-2.0 approximately timed) for
  request/response avoid complex receive/retry interaction between
  modules in timing mode. Every MemObject is now using the new 4-phase
  semantics.

- Ports do not implement any module-specific functionality but merely
  call functions on their interface classes.
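
To make the separation between structure and interface concrete, the
following is a minimal sketch written in Python for brevity (the patch
itself is C++). All class and method names here (Port.bind,
MemMapMasterPortInterface, recv_atomic, SimpleMemory, and so on) are
hypothetical and are not taken from src/mem/module_interface.hh,
src/mem/port_interface.{hh,cc} or src/mem/port.{hh,cc}. For simplicity
the sketch assumes that the structural owner of a port is also the
object exporting the module interface.

```python
class MemMapSlaveInterface:
    """Module interface: what a memory-mapped slave must provide (hypothetical)."""

    def recv_atomic(self, pkt, port_id):
        raise NotImplementedError


class Port:
    """Structural only: knows its owner, its ID and its peer."""

    def __init__(self, name, owner, port_id):
        self.name, self.owner, self.port_id = name, owner, port_id
        self.peer = None

    def bind(self, other):
        self.peer, other.peer = other, self


class MemMapMasterPortInterface:
    """Port interface: protocol-specific calls made through a master port."""

    def __init__(self, port):
        self.port = port

    def send_atomic(self, pkt):
        slave_port = self.port.peer
        # Forward to the module interface exported by the slave's owner.
        return slave_port.owner.recv_atomic(pkt, slave_port.port_id)


class SimpleMemory(MemMapSlaveInterface):
    def __init__(self):
        self.data = {}

    def recv_atomic(self, pkt, port_id):
        if pkt["cmd"] == "write":
            self.data[pkt["addr"]] = pkt["value"]
            return None
        return self.data.get(pkt["addr"], 0)


class Cpu:
    pass


mem = SimpleMemory()
cpu = Cpu()
cpu_port = Port("cpu.dcache_port", cpu, 0)
mem_port = Port("mem.port", mem, 0)
cpu_port.bind(mem_port)

# The port carries no behaviour; the port interface does the protocol call.
dport = MemMapMasterPortInterface(cpu_port)
dport.send_atomic({"cmd": "write", "addr": 0x100, "value": 42})
read_back = dport.send_atomic({"cmd": "read", "addr": 0x100})
```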

------------------------------------------------------------------------------

How does it work

A master port has a protocol-specific master-port interface, and is
associated with a master-module interface. An example of this is the
data and instruction port of a CPU. The master-port interface calls
functions on the connected slave-module interface. In the case of the
CPU, this could be e.g. memory, or a cache, both implementing the
memory-mapped slave interface and making it visible through a
memory-mapped slave port. The Python instantiation is currently based
on the position of the ports (left or right of the equality sign) to
determine if they are masters or slaves, but this should eventually
be specified in the Module.py using subclasses of Port. Similarly, the
cache ports are currently determined by looking at the name of the
port. This should also be extended to use an enum or literal in the
Module.py. The binding is checked at instantiation time when the C++
objects are created and their ports connected. As part of the
instantiation a structural diagram of the system is also created as a
Graphviz dot file. Run it through "dot -Tsvg" and have a look in your
browser.  The diagram clearly shows what ports are connected, what
protocol and role they have, and as a tooltip it also shows the
address ranges of all memory-mapped slave ports.
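
As an illustration of the kind of file described above, the sketch
below builds Graphviz dot text from a list of port connections. It is
written in Python, whereas the patch generates the diagram in C++; the
function to_dot, the connection tuples, the port names and the address
range are all made up for the example, and the real output may be laid
out quite differently.

```python
def to_dot(connections):
    """Render (master, slave, protocol, addr_ranges) tuples as Graphviz dot text."""
    lines = ["digraph system {"]
    for master, slave, protocol, ranges in connections:
        # Address ranges of the slave port end up in the edge tooltip.
        tip = ", ".join(f"[{lo:#x}:{hi:#x}]" for lo, hi in ranges)
        lines.append(
            f'  "{master}" -> "{slave}" [label="{protocol}", tooltip="{tip}"];'
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical connections: master port, slave port, protocol, slave ranges.
connections = [
    ("cpu.dcache_port", "membus.port", "mem-mapped", []),
    ("membus.port", "physmem.port", "mem-mapped", [(0x0, 0x7FFFFFF)]),
]
dot_text = to_dot(connections)
```

Writing dot_text to a file and running it through "dot -Tsvg" gives
an SVG in which hovering over an edge shows the tooltip.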

Currently the 4-phase handshakes are used by all the MemObjects in the
system.  However, the bridges are still present to facilitate the
integration work. See src/mem/bridge_classic_to_4phase.{hh,cc} and
src/mem/bridge_4phase_to_classic.{hh,cc}. The 4-phase is completely
replacing the old receive/retry handshakes, but there are situations
with unaligned accesses that still have to be addressed (we only
looked at Alpha and so got away without worrying).

Similar to the classic gem5 memory-system, a packet points to a
request and its associated data. In the typical case, a memory-mapped
request packet is created by a master, such as a DMA or a CPU. Once
beginReq is called, the sending (master) module should not change the
packet until its response is returned through a beginResp (where
applicable). An intermediate component, such as a bus or cache, may
create forwarded cache-maintenance packets from the original
memory-mapped request. Thus, one request gives rise to a chain of
requests and responses. Currently the lifetime and rules governing
those packets (and their request and data pointers) are work in
progress (see e.g. coherent_bus.cc). The original request/response may
be deallocated before the snoop request/response and vice versa. Smart
pointers and reference counting might be a viable solution, or
alternatively a more intelligent snoop controller in the
buses. In addition, the different packets (currently memory-mapped
and cache-maintenance) should be stripped of as much common
functionality as possible, reducing the memory-mapped packets to a
bare essential.

In response to questions from the reviews of release 0.1, the 4-phase
does not mandate a response handshake (it is possible to have only a
request begin/end). Also note that the untimed cache maintenance
protocol does not use the 4-phase at all. Thus, if there is need for
something else it is possible.
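
The ordering of the four phases, and the case where the response
handshake is left out, can be sketched as follows. This is a Python
illustration only: timing and the event queue are omitted, the method
names are snake_case stand-ins for beginReq/endReq/beginResp/endResp,
and the needs_response field is a hypothetical flag, not part of the
patch.

```python
class Slave:
    def __init__(self, log):
        self.log = log
        self.master = None

    def begin_req(self, pkt):
        self.log.append(("beginReq", pkt["id"]))
        # Accept the request; in a timed model this happens some time later.
        self.master.end_req(pkt)
        if pkt["needs_response"]:
            self.master.begin_resp(pkt)

    def end_resp(self, pkt):
        self.log.append(("endResp", pkt["id"]))


class Master:
    def __init__(self, slave, log):
        self.slave, self.log = slave, log
        slave.master = self

    def send(self, pkt):
        self.slave.begin_req(pkt)

    def end_req(self, pkt):
        self.log.append(("endReq", pkt["id"]))

    def begin_resp(self, pkt):
        self.log.append(("beginResp", pkt["id"]))
        # Acknowledge the response, closing the fourth phase.
        self.slave.end_resp(pkt)


log = []
master = Master(Slave(log), log)
master.send({"id": 1, "needs_response": True})   # all four phases
master.send({"id": 2, "needs_response": False})  # request begin/end only
```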

------------------------------------------------------------------------------

What is the intention with this patch

This patch is not showing all the changes made to the repository, but
in contrast to the release 0.1 this includes essentially all source
changes, and also diffs with respect to the revision the day we
branched (22 February 2011).

- The underlying infrastructure
  o Module Interface (what does a master/slave have to provide)
  o Port Interface (how is it accessed through the ports)
  o Port (how are the structural ports and logical interfaces bound together)
  o Protocol

- The basic building blocks
  o MemObject (maintain collections of ports and do look ups of names)
  o Packet Queue (similar to the Payload Event Queue in TLM-2.0, and
    closely related to the SimpleTimingPort)

- The models themselves
  o NonCoherentBus, CoherentBus
  o I/O Device (show a simple memory-mapped slave)
  o PhysicalMemory (same as above)
  o Bridge (show the benefits of the protocol separation and clear port roles)
  o CPU (demonstrate the shift from functionality in the ports to
    interfaces of the modules)
  o Bridge classic to/from 4-phase (highlighting the difference between
    the semantics)

- An example of their use
  o Tsunami system (show the port connections and the structure)
  o Caches
  o CPUs

The goal is to get feedback and suggestions on anything from the
actual design and how it is implemented, to the coding style and code
comments. This is also an opportunity for everyone to influence and
steer the changes and the integration into the main gem5
repository. With this second review we also hope to share the
remaining trajectory for integration into the repository, chopping the
contributions up in incremental patches. This work is about to start,
so let us know as soon as possible if you have questions or concerns.

------------------------------------------------------------------------------

Testing and verification

In order to work effectively we have limited the regression to only
include the quick Alpha tests. For these tests, the appropriate
updates have been made to connect the additional ports, and define the
role and protocol for the ports in question. Due to the changes in
timing, small deviations (plus or minus a few percent) in statistics have
been observed for a number of tests. We have considered this to be
within reasonable limits and updated the reference behaviour.

Testing Done

src/cpu/o3/cpu.cc (Diff revision 1)

Probably be better to inline the constructors and description methods in the class declaration to save retyping the class names.  I probably wouldn't object to putting the process() definitions there too since they're short.

Andreas Hansson Sept. 6, 2011, 2:17 a.m.
```
Sounds good
```

src/cpu/simple/timing.cc (Diff revision 1)

I don't really like having to identify which port something is coming in on by matching on the numeric ID... the old model with the embedded Port subclasses avoided this, and it would be nice not to lose that.

Andreas Hansson Sept. 6, 2011, 2:18 a.m.

Each port gets a pointer to both the parent MemObject (the structural owner), and the ModuleInterface that implements the behaviour. In the common case this would be the same object, and then the ports can be distinguished based on the ID. This fits very well with the buses for example. However, if someone wants to have "handlers" other than the MemObject itself that is possible, and these could be unique for the port in question. This gets very close to the old model of having behaviours associated with each port. Does that sound like a fit for what you want...best of both worlds?

As an option, TLM-2.0 enables the user to override the callback function through a function pointer which also enables the diversity that you are looking for. I would personally prefer to keep it as simple as possible though (and also enable the compiler to do a better job).
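
A minimal Python sketch of the two options described in this reply: a port that hands packets to its owner together with a port ID, and a port that is given a separate handler object. All names (Port.deliver, handler, IcacheHandler) are hypothetical and only illustrate the idea; they are not the classes in the patch.

```python
class Port:
    def __init__(self, port_id, owner, handler=None):
        self.port_id = port_id
        self.owner = owner               # structural owner (the MemObject)
        self.handler = handler or owner  # object implementing the behaviour

    def deliver(self, pkt):
        return self.handler.recv(pkt, self.port_id)


class Bus:
    """Common case: the owner is also the handler; ports are told apart by ID."""

    def recv(self, pkt, port_id):
        return f"bus got {pkt} on port {port_id}"


class IcacheHandler:
    """Dedicated handler, close to the old embedded Port subclass model."""

    def recv(self, pkt, port_id):
        return f"ifetch response {pkt}"


class Cpu:
    def recv(self, pkt, port_id):
        return f"data response {pkt}"


bus = Bus()
bus_ports = [Port(i, bus) for i in range(2)]

cpu = Cpu()
icache_port = Port(0, cpu, handler=IcacheHandler())
dcache_port = Port(1, cpu)
```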

src/dev/alpha/Tsunami.py (Diff revision 1)

Does the LHS/RHS distinction really matter, as long as we have a master on one side and a slave on the other?

Andreas Hansson Sept. 6, 2011, 1:14 a.m.

The port role and protocol is now specified in the Python Port class and the LHS/RHS no longer matters.

src/mem/cache/base.hh (Diff revision 1)

Somehow these methods got shifted enough that diff thinks they were deleted and added... I'm not sure why, but it means I can't tell for sure if they were changed or not.

Andreas Hansson Sept. 6, 2011, 2:21 a.m.

The getAddrRanges for the bus has changed since it no longer does any cyclic avoidance. For the rest the reporting of ranges is unchanged.

src/mem/coherent_bus.cc (Diff revision 1)

When does this function get called?  It looks like the bus only deals with CacheMntPackets and not MemMapPackets, so I don't see where this comes in.

Andreas Hansson Sept. 6, 2011, 1:16 a.m.

This function gets called when a coherent bus receives a memory-mapped request, and overrides the corresponding (virtual) function from the non-coherent bus. The coherent bus must forward the request to all snoopers and thus cannot rely on the inherited behaviour.
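
The override relationship can be sketched in Python as follows. The class names mirror NonCoherentBus and CoherentBus, but the method names (recv_mem_map, route, snoop, access), the packet dictionaries, and the choice to skip the requesting cache when snooping are assumptions made for the example, not the code in coherent_bus.cc.

```python
class NonCoherentBus:
    def __init__(self):
        self.slaves = []  # list of (lo, hi, target) address ranges

    def route(self, addr):
        for lo, hi, target in self.slaves:
            if lo <= addr <= hi:
                return target
        raise LookupError(f"no slave for address {addr:#x}")

    def recv_mem_map(self, pkt, src):
        # Inherited behaviour: route on the address only.
        return self.route(pkt["addr"]).access(pkt)


class CoherentBus(NonCoherentBus):
    def __init__(self):
        super().__init__()
        self.snoopers = []

    def recv_mem_map(self, pkt, src):
        # Forward a cache-maintenance packet to every other snooper first.
        for snooper in self.snoopers:
            if snooper is not src:
                snooper.snoop({"addr": pkt["addr"], "origin": pkt})
        return super().recv_mem_map(pkt, src)


class Memory:
    def access(self, pkt):
        return "mem"


class Cache:
    def __init__(self):
        self.snooped = []

    def snoop(self, pkt):
        self.snooped.append(pkt["addr"])


bus = CoherentBus()
bus.slaves.append((0x0, 0xFFFF, Memory()))
c0, c1 = Cache(), Cache()
bus.snoopers += [c0, c1]
result = bus.recv_mem_map({"addr": 0x40}, src=c0)
```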

src/mem/fs_translating_proxy.hh (Diff revision 1)

It'd be nice to integrate this with SETranslatingProxy... outside the scope of fixing up the Port interface, I know, but it would still be nice in the long run.  There's a lot of duplication (and I believe needless interface inconsistency) between these two objects.

Andreas Hansson Sept. 6, 2011, 2:22 a.m.
```
On the todo list
```

src/mem/mem_object.hh (Diff revision 1)

In the long run we don't really want the base MemObject to have to have methods for every protocol any derived object will ever support (particularly when we go to Ruby), so we'll have to figure out a way around this.

Andreas Hansson Sept. 6, 2011, 1:20 a.m.

An option is to let the MemObject handle the common gem5 cases (probably not too many), and then go via an intermediate subclass for any additional protocols.

A more generic solution is to templatise the MemObject and also enable multiple inheritance for the modules that have ports with multiple protocols.

It would also be great if the port information can be communicated in the SWIGed parameters so that all port creation can be done in the constructor. Do you know how/if this is possible?

src/mem/module_interface.hh (Diff revision 1)

To match gem5 style, this typedef should be to something like Packet or Pkt rather than pkt_t.  Packet would be nice, but I can see where that would cause obvious complications with the existing Packet class.  Maybe the existing Packet class should be renamed.

Andreas Hansson Sept. 6, 2011, 2:23 a.m.

I did not realise there was a specific style for the typedefs and merely opted for the TLM-2.0 way of naming the types.

The existing packet should no longer be used, and only the derived classes should appear in any code. Maybe it can be renamed to BasePacket and then we can use Packet for the typedefs. The only danger with reusing the name is confusion for the users. Ideas?

src/mem/module_interface.hh (Diff revision 1)

Shouldn't this derive from MasterInterface<PKT> rather than MasterInterface<MemMapPacket>?  Similarly for TimedSlaveInterface below.

Andreas Hansson Sept. 6, 2011, 1:22 a.m.
```
Indeed. Fixed!
```

src/mem/module_interface.hh (Diff revision 1)

I know the "debug" name derives from SystemC TLM usage, but I think the m5 label "functional" is more descriptive (since it's not just debugging), and has more continuity with the old code.  We should at least have a larger discussion about this before making a permanent decision.

Andreas Hansson Sept. 6, 2011, 1:30 a.m.

From the perspective of the memory system, a "functional" access is rather confusing, since it does rather the opposite and does not model the functionality of the components. recvDebug corresponds more or less exactly with the semantics of transport_dbg in TLM-2.0: "A debug access must be performed without any of the delays, waits, event notifications, or side effects associated with a regular transaction. The debug interface is, therefore, non-intrusive."

...but indeed it's up for discussion.

Ali Saidi Sept. 6, 2011, 2:52 a.m.

Just one comment to add here.... I was very much against this originally, but over the last year as I've described and ultimately confused enough people with our naming I think it might be the right decision. So, yes, we should consider if we want to have a global s/func/debug/g.

src/mem/packet.hh (Diff revision 1)

Do we really want to require that responses follow the same path in reverse that the request did?  Also, we already have the senderState pointer which effectively allowed the set of modules a packet traversed to build a stack (as a linked list) in order to backtrack a response.  That's a little more realistic to me too as it requires the routing nodes to maintain the state (as they would in a real system).  We should discuss this more.

Andreas Hansson Sept. 6, 2011, 1:32 a.m.

If the routing nodes are indeed to maintain state (which would be most realistic), then surely a response must come back the same path, or am I missing something?

Are there cases today where this is not the case? I would be keen to know if there is a situation when more complicated scenarios arise.
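
For reference, the senderState mechanism mentioned in the comment above can be sketched as a linked-list stack that each routing node pushes onto on the request path and pops from on the response path. This is a Python illustration under assumptions: the SenderState fields, the push/pop helpers and the node names are hypothetical, not the existing Packet API.

```python
class SenderState:
    """One stack frame: who forwarded the packet, plus the previous frame."""

    def __init__(self, node, port_id, prev):
        self.node, self.port_id, self.prev = node, port_id, prev


class Packet:
    def __init__(self, addr):
        self.addr = addr
        self.sender_state = None

    def push(self, node, port_id):
        self.sender_state = SenderState(node, port_id, self.sender_state)

    def pop(self):
        frame = self.sender_state
        self.sender_state = frame.prev
        return frame


pkt = Packet(0x1000)
# Request path: each routing node records the port the request arrived on.
pkt.push("l2bus", 3)
pkt.push("membus", 0)

# Response path: unwind the stack to retrace the request path in reverse.
return_path = []
while pkt.sender_state is not None:
    frame = pkt.pop()
    return_path.append((frame.node, frame.port_id))
```

With such a stack, the response necessarily retraces the request path in reverse order, which is the behaviour discussed in this thread.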

src/mem/packet.hh (Diff revision 1)

I like separating out the base packet type from the fields specific to the current cache coherence protocol... I'm not sure this is the right split though (and I'm guessing you weren't even trying to get it right, just taking a first stab).  Also, it's not clear why the cache packets aren't a subclass of the base memory access packet, since caches will want to pass through the simpler packets for uncached accesses, right?  Also I'm not a fan of the MemMapPacket name; maybe MemAccPacket?

Andreas Hansson Sept. 6, 2011, 2:24 a.m.

The split was indeed a very crude stab, and more help is needed from a real gem5 cache guru to get this right.

Having cache packets using memory-mapped packets as a subclass might also be useful. I will give this some more thought.

The naming is a tricky question, and Memory-Mapped essentially stems from TLM-2.0 and the fact that these packets are used by all the basic memory-mapped components, i.e. CPUs, non-coherent buses, bridges, and peripherals (not just memories). Do you not think MemAcc is too restrictive a name?

src/mem/port_interface.hh (Diff revision 1)

I don't get the benefit of separating Port and PortInterface... it seems like there's always a 1:1 mapping between the types.  Can you give an example where this separation is useful?

Andreas Hansson Sept. 6, 2011, 2:33 a.m.

A Port is a structural entity that connects two MemObjects, independent of their interfaces. The functionality of a port is thus that of structural composition of the system. The interfaces govern what you can do using the port, which differs depending on the protocol. There is still a better separation to be made, but for example the "getAddrRange" is only something you can ask a memory-mapped slave port (as expected). Similarly, e.g. a Ruby port could have a completely different set of functions if desired.

src/python/m5/simulate.py (Diff revision 1)

yea, just do this whole thing in python...

Andreas Hansson Sept. 6, 2011, 1:49 a.m.

The reason for doing this in c++ is mainly to enable printing of the address ranges. At the time it was implemented there was also no port type information in the Python Port class.

src/python/swig/pyobject.cc (Diff revision 1)

We need to find a way to support new protocols without having to modify this code...

Andreas Hansson Sept. 6, 2011, 1:54 a.m.

If we really want to add more protocols in the "core" gem5 it is not too difficult to add another if/else/switch. If someone wants to add them on the side it becomes a bit more challenging. Is someone aware of a good design pattern for this?

src/sim/sim_object.hh (Diff revision 1)

Do we really need this new callback?  I see where it's used to load the kernel in System, but it seems like that should be done in an initState() callback.  See http://gem5.org/SimObjects#Stages_of_initialization for more details.

Andreas Hansson Sept. 6, 2011, 1:55 a.m.
```
I think you are absolutely right. Tests in progress.
```

src/sim/sim_object.cc (Diff revision 1)

I like all this dot stuff, but I think it should be done in python and not in C++.  I think that would get rid of the name to track the path etc. in C++ as well.

Andreas Hansson Sept. 6, 2011, 1:55 a.m.

The added benefit of the C++ is the address map, block size etc. Most important here is the ability to print the address ranges.

tests/configs/o3-nocaches-timing.py (Diff revision 1)

FYI, we've had problems with O3 w/o caches in the past because the ifetch can get starved when the dcache is replaying loads (or some kind of I vs D starvation like that).

Andreas Hansson Sept. 6, 2011, 2:33 a.m.

Thanks for the warning. It seems to work...and we needed a simple test case.

tests/configs/tsunami-o3.py (Diff revision 1)

Were you thinking of adding a "socket pair" facility to simplify bi-directional connections?

Andreas Hansson Sept. 6, 2011, 2:34 a.m.

A "port" in gem5 is already corresponding to a bi-directional TLM-2.0 "socket" in the sense that it has a forward and a backward path, with requests going in one direction, and responses in the other.

I would suggest to stick with the TLM-2.0 like notion and not create "super ports" that consist of two reversed ports.

Does this seem reasonable?

Hi Andreas,

Sorry for taking so long to get to this... it's an impressive (and daunting) amount of code, and it's been hard for me to find an otherwise unoccupied block of time long enough to really get into it.

Overall, I think it's great. My main concerns are:

- I really prefer the current "embedded Port object" model we have for associating standard port interfaces with specific calls on the associated MemObject. I don't really like how the multiple-inheritance-based scheme forces you to take all packets of a given type through the same method. Do you see a problem with sticking with the old model?

- As I mentioned above, I don't see the benefit of separating Port and PortInterface classes; an example would help.

- How do you handle the case where caches A and B, both connected to the same bus, both take a write miss to a read-only copy of line X and issue upgrade requests simultaneously via beginReq()? Say A goes first. You can't just hold B off and process its upgrade packet later, you have to somehow give B the opportunity to re-issue the request, since by the time it's B's turn, A will have invalidated B's block, forcing B to issue a read-exclusive instead of an upgrade. I didn't see that in there, but I didn't look really hard.

- The whole python integration and setup needs some work (which you already said)... I'd really like this interface to carry forward into Ruby, which means we need to be able to support lots of protocols without having to explicitly name them all in some central place. Unfortunately I don't see any obvious way to do this, but we'll need to work one out.

Steve

Andreas Hansson Sept. 6, 2011, 2:42 a.m.

Thanks for the extensive feedback. To comment on your main concerns:

- The embedded Port object would become an embedded interface object associated with a port, but it is still possible through the separation of structure and interface in the port creation.

- I wish I had a good answer, and also a good test case. Is there any way I can create this scenario? The requests will indeed be raised simultaneously, but only marked in service once accepted. I would greatly appreciate a more in-depth discussion on this issue and the order of begin/endReq and the buses' forwarding of snoops.

- Would it not be a solution to have the three/four protocols of the core gem5 enumerated, or do you think even this is too much? There are ways around having the knowledge in the MemObject, but it introduces an additional level of inheritance (or multiple inheritance). We should be able to work something out, and it would be great to see this carried forward into Ruby.

This change has been discarded.

Status: Discarded