dev: Add m5 op to toggle synchronization for dist-gem5.
Review Request #3595 - Created Aug. 4, 2016 and submitted
| Information | |
|---|---|
| Michael LeBeane | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 11561:1414a40eb1e2 --------------------------- dev: Add m5 op to toggle synchronization for dist-gem5. This patch adds the ability for an application to request dist-gem5 to begin/ end synchronization using an m5 op. When toggling on sync, all nodes agree on the next sync point based on the maximum of all nodes' ticks. CPUs are suspended until the sync point to avoid sending network messages until sync has been enabled. Toggling off sync acts like a global execution barrier, where all CPUs are disabled until every node reaches the toggle off point. This avoids tricky situations such as one node hitting a toggle off followed by a toggle on before the other nodes hit the first toggle off.
Any comments on this patch?
I like the idea of starting the sync on a pseudo op very much. I cannot see the usefulness of stopping the sync after it started though. Do you have a use case in mind?
I have a few comments below (mainly minor nits).
However, I think there is also a basic issue with the CPU suspend approach. A CPU can wake up whenever there is an interrupt or even a snoop request. That should be taken care of somehow I assume.
-
src/dev/net/dist_iface.hh (Diff revision 1) -
The type should be bool.
Minor nit: the name could be changed to something like "syncStartOnPseudoOp" to reflect the meaning better.
-
src/dev/net/dist_iface.hh (Diff revision 1) -
Minor nit: could you rename this method to "toggleSync()" or similar? That would better describe the functionality. The word "switch" ususally refers to the network switch model in the dist-gem5 context so it is a bit confusing here.
-
src/dev/net/dist_iface.cc (Diff revision 1) -
This change effectively disables the use of the start_tick parameter. It is because 'nextAt' is overwritten in SyncEvent::start() by the max tick of all participating gem5 - which will be 0 at the first global barrier (if we do not start the sync on pseudo op).
However, I think the pseudo op is a much better way to defer the sync start. Could you just remove the 'start_tick' option altogether?
-
src/dev/net/dist_iface.cc (Diff revision 1) -
Cannot another active thread hit another stop-sync pesudo op while this thread is still suspended? Maybe all active threads should be suspended here?
Ship it! Thanks for the fixes!
