multi-gem5: add support for multi gem5 runs
Review Request #2826 - Created May 15, 2015 and submitted
| Information | |
|---|---|
| Curtis Dunham | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Multi gem5 is an extension to gem5 that enables parallel simulation of a
distributed system (e.g. simulation of a pool of machines
connected by Ethernet links). A multi gem5 run consists of separate gem5
processes running in parallel (potentially on different hosts/slots on
a cluster). Each gem5 process executes the simulation of a component of the
simulated distributed system (e.g. a multi-core board with an Ethernet NIC).

The patch implements the "distributed" Ethernet link device
(src/dev/multi_etherlink.[hh,cc]). This device will send/receive
(simulated) Ethernet packets to/from peer gem5 processes. The interface
to talk to the peer gem5 processes is defined in src/dev/multi_iface.hh and
in tcp_iface.hh.

There is also a central message server process (util/multi/tcp_server.[hh,cc])
which acts like an Ethernet switch and transfers messages among the gem5 peers.

A multi gem5 simulation can be kicked off by the util/multi/gem5-multi.sh
wrapper script.

Checkpoint support will follow in a subsequent patch.
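
To make the role of the central message server concrete, here is a minimal in-memory sketch of a relay that passes each message to the other attached peers. It is not the code in util/multi/tcp_server.[hh,cc]: the class name `ToyMessageServer`, its methods, and the broadcast-to-all-other-peers policy are assumptions made for illustration, and the real server talks to the gem5 processes over TCP sockets.

```python
from collections import deque


class ToyMessageServer:
    """Hypothetical in-memory stand-in for a central message server."""

    def __init__(self):
        # Peer name -> queue of (sender, frame) tuples waiting for that peer.
        self.inboxes = {}

    def attach(self, peer):
        """Register a peer (one simulated node) with the server."""
        self.inboxes[peer] = deque()

    def send(self, sender, frame):
        """Relay a frame to every attached peer except the sender.

        Assumption: plain broadcast, with no per-address filtering.
        """
        for peer, inbox in self.inboxes.items():
            if peer != sender:
                inbox.append((sender, frame))

    def receive(self, peer):
        """Return the oldest pending (sender, frame) for a peer, or None."""
        inbox = self.inboxes[peer]
        return inbox.popleft() if inbox else None
```

For example, after attaching `node0`, `node1` and `node2`, a call to `send("node0", frame)` leaves one pending message for `node1` and one for `node2`, and none for `node0`.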
Before I try to get a better understanding of the code, can you explain
the purpose of the multi_etherlink object? Can you name an actual entity this
object is trying to emulate?
There are several style issues. The writer should read gem5.org/Coding_Style.
I am still trying to understand all the synchronization code. So it will take
me some time before I am able to review that code.
-
src/dev/etherpkt.hh (Diff revision 3) -
store not strore in these two lines.
-
src/dev/multi_etherlink.hh (Diff revision 3) -
double? The simulator does not understand fractional ticks.
-
src/dev/multi_etherlink.hh (Diff revision 3) -
Please don't call it rate. In the abstract, ticks per byte is also a rate, but I think the general meaning of rate is some amount of work carried out per unit time. You probably want to work with the bandwidth of the link.
-
src/dev/multi_etherlink.cc (Diff revision 3) -
I think there is some confusion between what the python parameter represents and what the C++ code expects for this ticksPerByte variable.
-
src/dev/multi_etherlink.cc (Diff revision 3) -
If the checkpoint patch from Andreas Sandberg goes in first, you would need to fix the serialize functions.
-
src/dev/multi_iface.cc (Diff revision 3) -
Indentation is off.
-
util/multi/gem5-multi.sh (Diff revision 3) -
number not naumber.
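
The comments above on multi_etherlink (fractional ticks, "rate" versus bandwidth, and what ticksPerByte should hold) can be illustrated with a small sketch that starts from a link bandwidth and derives whole-tick delays. It is not the patch's code: the function names are hypothetical, and it assumes a tick frequency of 10^12 ticks per second (1 tick = 1 ps) and a bandwidth given in bits per second.

```python
# Assumption: 10**12 ticks per simulated second, i.e. 1 tick = 1 ps.
TICKS_PER_SECOND = 10**12


def ticks_per_byte(bandwidth_bps):
    """Whole ticks needed to send one byte, rounded up.

    Integer arithmetic only, so no fractional tick is ever produced.
    """
    return -(-8 * TICKS_PER_SECOND // bandwidth_bps)


def transmit_ticks(num_bytes, bandwidth_bps):
    """Whole ticks needed to send num_bytes, rounded up once at the end.

    Rounding once per packet avoids accumulating a per-byte rounding error.
    """
    return -(-(num_bytes * 8 * TICKS_PER_SECOND) // bandwidth_bps)
```

With these assumptions, a 1 Gbit/s link costs 8000 ticks per byte. For a bandwidth that does not divide evenly, such as 3 Gbit/s, rounding per byte and rounding per packet give different totals, which is one reason to keep the bandwidth as the parameter and derive delays from it.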
Firstly, thanks for this patch, this is really nice work.
I have only done a cursory review of this patch so I'm still looking over the code in more detail, but I thought I'd share some of my initial thoughts to get the conversation going on this since it seems to have stagnated.
Reiterating Nilay's point: there are a lot of style issues that need to be fixed.
It seems like this would be useful for large-scale systems, but could you give some idea how easily one could derive from the multi link/iface objects for use with a multi-threaded approach, thereby avoiding socket-based communication? E.g., if I wanted to model small/medium scale distributed systems consisting of ~10s of nodes on a single host machine. It would be nice if multi-gem5 and a multi-threaded approach were unified and built off the same base classes.
For the TCP server, have you thought about an event-based approach, i.e., libevent or libev as opposed to using poll()?
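
As a rough illustration of the event-based style this question refers to, the sketch below registers a callback per socket and lets a selector dispatch readiness events. It uses Python's standard `selectors` module as a stand-in for libevent or libev, and it is not how the patch's TCP server is written; the helper names `make_reader` and `run_once` are hypothetical.

```python
import selectors
import socket


def make_reader(sel, received):
    """Build a callback that collects incoming data from a ready socket."""
    def on_readable(sock):
        data = sock.recv(4096)
        if data:
            received.append(data)
        else:
            # Empty read means the peer closed the connection.
            sel.unregister(sock)
            sock.close()
    return on_readable


def run_once(sel, timeout=1.0):
    """Wait for ready sockets and invoke the callback stored with each."""
    events = sel.select(timeout)
    for key, _mask in events:
        key.data(key.fileobj)
    return len(events)
```

A caller would register each peer socket with `sel.register(sock, selectors.EVENT_READ, make_reader(sel, received))` and call `run_once(sel)` in a loop. The per-socket logic lives in the callback rather than in one large loop over a descriptor array.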
Ship It!
