dist,dev: add an ethernet switch model
Review Request #3230 - Created Nov. 19, 2015 and submitted
| Information | |
|---|---|
| Mohammad Alian | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 11292:12200987356d
---------------------------
dist,dev: add an ethernet switch model
several testing done with different benchmarks and different switch sizes
Issue Summary
Testing Done: |
|
|---|
-
src/dev/Ethernet.py (Diff revision 1) -
Is this needed? Why not rely on the count of the vector port?
-
src/dev/etherswitch.hh (Diff revision 1) -
Should this not appear at the top?
-
src/dev/etherswitch.hh (Diff revision 1) -
A few of these methods should probably be const. Also, some comments would be great.
-
src/dev/etherswitch.hh (Diff revision 1) -
Also in this file several methods should be const, and the level of comments is very poor.
-
src/dev/etherswitch.hh (Diff revision 1) -
unordered_map?
-
src/dev/etherswitch.hh (Diff revision 1) -
a pointer to a std::vector?
-
src/dev/etherswitch.hh (Diff revision 1) -
const?
-
src/dev/etherswitch.cc (Diff revision 1) -
Same as the previous file
-
src/dev/etherswitch.cc (Diff revision 1) -
there is a count associated with the vector port, why not use that?
-
src/dev/etherswitch.cc (Diff revision 1) -
This is not a very nice way to design things...
It would be much better if this module was event based. Is there a reason why it is not?
-
src/dev/Ethernet.py (Diff revision 1) -
Does 1Gbps as the default make sense? I think that is a little dated, particularly in the case of the systems multi/pd gem5 is meant to model. 10Gbps would probably be a more reasonable default.
-
src/dev/etherswitch.hh (Diff revision 1) -
Don't make this virtual, in fact don't make any of EtherSwitch's methods virtual. It should probably be a final class since any other switch wouldn't be a type of EtherSwitch.
If we ever did decide to have some other sort of switch, e.g., InfiniBand, we'd need to develop a base Switch class, and derive EtherSwitch, InfiniBandSwitch etc. from that.
-
src/dev/etherswitch.hh (Diff revision 1) -
const, also make sure const-correctness is observed throughout.
-
src/dev/etherswitch.hh (Diff revision 1) -
reference
-
src/dev/etherswitch.hh (Diff revision 1) -
return packet != nullptr; better expresses what you want here
-
src/dev/etherswitch.hh (Diff revision 1) -
Names won't change. Make them const.
-
src/dev/etherswitch.cc (Diff revision 1) -
Use nullptr throughout
Summary: |
|
||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Description: |
|
||||||||||||||||||||||||
Diff: |
Revision 2 (+878) |
Summary: |
|
|||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Description: |
|
|||||||||||||||
Diff: |
Revision 3 (+878) |
Description: |
|
||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 4 (+878) |
-
src/dev/net/etherswitch.cc (Diff revision 4) -
It really irks me that we schedule this every clock period, even if there is nothing to do.
Even if we don't make the model event based, could we at least make sure that it only ticks if there is something to do?
Thanks for fixing this up; it looks good to me now. It would also be nice to eventually make the interconnect event-based, as suggested, but I think that can wait for a future patch.
Sorry it took a while for me to get to this... overall this is nice but I'm a little confused on the high-level design, and what the FabricLinks are adding, and how flow-control is working via the FabricLinks. For example, let's say everything is idle, and then two sources (A and B) simultaneously transmit to the same destination (C). If I'm understanding the code correctly, since the A->C and B->C FabricLinks are both idle, both the A and B sender interfaces will transmit their packets along these respective links, but then when the links go to deliver to C's receive interface, one will win and the other will find the interface busy and drop the packet, right? If so, then that means we drop a packet despite having plenty of buffering available. If not, then what am I missing? Thanks!
-
src/dev/net/etherswitch.hh (Diff revision 4) -
Is this license necessary? I believe this extra paragraph only applies to ARM contributions, so there's no reason to include it here. (Same goes for .cc file of course.)
-
src/dev/net/etherswitch.hh (Diff revision 4) -
I think the int-to-bool conversion should be explicit here
-
src/dev/net/etherswitch.hh (Diff revision 4) -
seems asymmetric that there's an inc but no dec
-
src/dev/net/etherswitch.hh (Diff revision 4) -
is this ever not the same as broadcastPeers.size()?
-
src/dev/net/etherswitch.cc (Diff revision 4) -
can these declarations be moved down to the initializations?
-
src/dev/net/etherswitch.cc (Diff revision 4) -
why not 'for (auto sender : tmpInterfaces)'?
-
src/dev/net/etherswitch.cc (Diff revision 4) -
why the memcpy calls?
-
src/dev/net/etherswitch.cc (Diff revision 4) -
seems extreme to do a hash lookup on both ports... every interface already has a number, you really just need lookupDestPort() to return an integer, then have a std::vector<> with each interface that returns the appropriate link from this source to that destination
-
src/dev/net/etherswitch.cc (Diff revision 4) -
would lastUseTime be more accurate than arrivalTime? Why are we tracking this if it's not being used?
-
src/dev/net/etherswitch.cc (Diff revision 4) -
if you're going to be reordering things all the time like this, why not use a list instead of a vector so it's O(1) and not O(n)?
-
src/dev/net/etherswitch.cc (Diff revision 4) -
I don't understand the "try again later"... is it the case that we're trying to do a second broadcast while the first is still in progress? If so, what happens to the second broadcast packet, does it get dropped? And does this mean that the initial broadcast only progresses when another broadcast comes along?
-
src/dev/net/etherswitch.cc (Diff revision 4) -
seems like it would be a lot simpler to just initialize peers to interfaces on the initial broadcast, and then this function could have a single loop over peers that's used in either case
So here's an alternative design that would be a lot simpler and I think just as accurate (modulo the caveat about external link scheduling at the end):
Each Interface has
- an outputFifo (no inputFifo) with <timestamp, packet> pairs
- an outputFifoOccupancy variable tracking the number of bytes in outputFifo
- an enqueue() methodThen on each packet received, in recvPacket(), basically put what is now inside the loop in forwardingEngine(), only much stripped down:
// could make this more complicated, but // probably no need done_tick = curTick() + switchDelay; receiver = lookupDestPort(); if (!receiver etc.) { for (auto it : interfaces) if (it != this) it->enqueue(pkt, done_tick); } else { receiver->enqueue(pkt, done_tick); }and then the enqueue looks like
void
Interface::enqueue(pkt, done_tick)
{
int new_occ = outputFifoOccupancy + pkt->size();if (new_occ > outputFifoSize) { // output buffer full, drop packet // do we need to delete the packet here? return; } // assuming per-interface transmission events, // if outputFifo is empty then we need to schedule // an event at done_tick to send this packet // out the external link if (outputFifo.empty()) xmitEvent.schedule(done_tick); outputFifo.push_back(pkt, done_tick); outputFifoOccupancy = new_occ;}
and then the transmit event just:
- pulls a packet off the front of outputFifo
- sends it on the connected external link
- updates outputFifoOccupancy
- if outputFifo is not empty, reschedules the xmit event based on the done_tick at the head of the fifo, the current tick, and when the external link is next freeI have to look at the code a little more to remember how to handle the outgoing link congestion... if I try to transmit on the link and it's busy, do I get a callback when it's free like I do with memory ports? Can I query the link to find out when it's free next, or do I need to track that directly? So that'll be a minor complication, but really only effects when the transmission events are scheduled.
Does this make sense? It would get rid of FabricLink, forwardingEngine() and transmissionEngine() (and thus all the tick events), the inputFifos, and the partial retries of broadcasts, but yet I don't think we would lose any appreciable modeling accuracy.
Description: |
|
||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 5 (+395) |
Nice, thanks!! I really appreciate the effort to revamp this. It sure seems a lot simpler to me. I agree, it would be good to hear from others (particularly Tony) before committing though.
Meanwhile just a couple little things below...
-
src/dev/net/etherswitch.cc (Diff revision 5) -
I still don't understand why these memcpy calls are needed... it looks like the EthAddr constructor makes a copy too, so I'd think you could just call e.g.
Net::EthAddr destMacAddr(packet->data);
and if that doesn't work, we should probably tweak the EthAddr constructor so it does... -
src/dev/net/etherswitch.cc (Diff revision 5) -
would be good to encapsulate this delay calculation in a helper function, esp. if we're going to use it twice.
-
src/dev/net/etherswitch.cc (Diff revision 5) -
do we end up charging the delay twice, since we already accounted for it in enqueue()? The delay for the outgoing link should be calculated by the link itself, right?
-
src/dev/net/etherswitch.cc (Diff revision 5) -
could use std::make_pair to simplify this a bit
Description: |
|
||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 6 (+392) |
Description: |
|
||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 7 (+392) |
I think this turned out really well. Thanks for the feedback, Steve, and thanks, Mohamma, for addressing the feedback. I maintain my ship it, and again, I can ship this for you. I will ship it this weekend if nobody objects.
Ship It!
