ruby: Check MessageBuffer space in garnet NetworkInterface
Review Request #3753 - Created Dec. 8, 2016 and submitted
| Information | |
|---|---|
| Matthew Poremba | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
| tushar | |
Changeset 11792:9ee13c6fa290 --------------------------- ruby: Check MessageBuffer space in garnet NetworkInterface Garnet's NetworkInterface does not consider the size of MessageBuffers when ejecting a Message from the network. Add a size check for the MessageBuffer and only enqueue if space is available. If space is not available, the message if placed in a queue and the credit is held. A callback from the MessageBuffer is implemented to wake the NetworkInterface. If there are messages in the stalled queue, they are processed first, in a FIFO manner and if succesfully ejected, the credit is finally sent back upstream. The maximum size of the stall queue is equal to the number of valid VNETs with MessageBuffers attached.
This is an excellent patch! Something that has been long due.
(BTW does the same problem occur in simple network as well?)
-
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 1) -
This is a great change! Anyone wanting to add more stats can do that here.
-
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 1) -
Stylistic comment:
Can we move this piece of code to a new function which returns true/false and sets the value of unstalledMessage?Otherwise the wakeup function is very long now and performs the actual action of reading from the link only at the very end.
-
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 1) -
These three lines should become a new send_credit function.
Credit *credit_flit = new Credit(stallFlit->get_vc(), true, curCycle());
outCreditQueue->insert(credit_flit);
outCreditLink->scheduleEventAbsolute(clockEdge(Cycles(1)));The function can be called here and later.
Having two pieces of code where credits are being created and sent can lead to a lot of bugs whenever anyone makes changes and forgets to make changes at both places (have seen that a lot from experience) :) -
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 1) -
Looks like only HEAD_TAIL and TAIL_ flits set extLinkAvail to false and are consumed here, while other flits are consumed later in the code. It will be a little confusing for a newbie looking at the code.
-
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 1) -
Adding to the point above -
My worry is that someone new looking at the code might get a little confused about the fact that the consumeLink happens twice. And that the HEAD and HEAD_TAIL are sometimes consumed here, and sometimes later in the code.
See my suggestion below on some code restructuring.
-
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 1) -
Looks like the link not consumed if there are stalled messages.
But this would mean that the router will keep inserting flits into the link which may not get consumed, effectively modeling a large queue inside the link which is incorrect.
The NI always consumes the link. This is because the router checked for credits before sending to the NI.Here is my suggestion for this piece of code:
if (inNetLink->isReady(curCycle()) {
flit *t_flit = inNetLink->consumeLink();// Now check if message buffer has slots.
// If it does, then { (i) insert msg into msg buffer, (ii) call send_credit function. }
// else { insert msg into stall queue.}// Implement a send_credit function that can be called from here or when the stall queue is read.
...
The extLinkAvail piece of code is not needed. It gets subsumed here.
Change Summary:
Update patch to address issues
Description: |
|
||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 2 (+147 -34) |
-
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 2) -
If a flit is waiting for the output buffer to become free, this will get counted in its network delay component in the current implementation.
It should be counted in its queueing delay component I think ...
Queueing delay is the delay at the message buffers and NICs, network delay is from src NI to dest NI and includes delay in routers and links. -
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 2) -
See my comment below in the function definition regarding the name of this variable ...
-
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 2) -
We should consume the link irrespective of whether the state of the stall queue. If we do not consume, effectively we are modeling a quueue inside the network link.
I can see why you added this condition though - you don't want more than one enqueue into the message buffer in the same cycle.
If the stall queue already enqueued something, you are making the one from the link wait.How about the following:
bool unstalledMessage = checkStallQueue();
if (inNetLink->isReady(curCycle()) {
flit *t_flit = inNetLink->consumeLink();...
if (outNode_ptr[vnet]->areNSlotsAvailable(1, curTime) && !unstalledMessage) {
// enqueue into protocol buffer
}
else {
// enqueue into stall queue
} -
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 2) -
The name of this variable is a little confusing.
Hmm I think I can see why you named it like that. If a message that was stalled gets unstalled, then this variable becomes true.
But if the stall queue is empty, unstalledMessage is returned false, which means stalledMessage is true .. which is not quite right.
Change Summary:
change logic so that network links are not modeled as a queue.
Description: |
|
||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 3 (+168 -37) |
Any thoughts on the latest diff?
-
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 3) -
Shall the NI reschedule itself if all stall flits in the m_stall_queue fail ejecting to outNode_ptr? It seems possible that none of the outNode_ptr is able to accecpt flits, while the router is not sending any flit to the NI.
Change Summary:
address a corner case where MessageBuffer may not wakeup NI.
Description: |
|
||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 4 (+174 -37) |
Ship It!
Ship It!
-
src/mem/ruby/network/garnet2.0/NetworkInterface.cc (Diff revision 4) -
Hey Matt,
There is a minor change I have been wanting to push in which you can add as part of this patch:We count network_delay as dequeue_time - enqueue_time.
A minor issue is that the "enqueue_time" includes an additional cycle at the source NIC when the packet was created.
For example:
Suppose we have a 1-flit packet that we have to send one hop from Node 0 to Node 1.
We create in Cycle 1.Cycle 1: flit enqueued by NIC 0
Cycle 2: flit in link_NIC0-Router0
Cycle 3: flit in Router 0
Cycle 4: flit in link_Router0-Router1
Cycle 5: flit in Router 1
Cycle 6: flit in link_Router1-NIC1
Cycle 7: flit dequeued by NIC 1.Network delay will be calculated as 7-1 = 6 cycles.
But the actual "network" delay was 5 cycles (cycle 2 to cycle 6).The additional 1-cycle in NIC0 is going to be part of the src_queueing_delay so it is being accounted for.
Does this make sense?
If yes, can you update network delay to
dequeue_time - enqueue_time - 1 ?And then ship it.
Thanks,
Tushar
Description: |
|
||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 5 (+174 -37) |
Ship It!
