ruby: PerfectSwitch add assured access arbitration
Review Request #3773 - Created Dec. 23, 2016 and updated
| Information | |
|---|---|
| Joel Hestness | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 11802:ca5c5b982ea5
---------------------------
ruby: PerfectSwitch add assured access arbitration

When operating near bandwidth saturation and using finite cache hierarchy buffering, the round-robin arbitration in the PerfectSwitch caused low-ID input buffers to gain access to the switch more frequently than other input buffers that might contain requests. This resulted from the priority cycling starting on input buffers with no pending requests and cycling around to the low-ID buffers with pending requests. Part of the problem was that input-to-output port allocation was done on-the-fly while cycling through input ports.

To fix this, refactor the PerfectSwitch to remove on-the-fly arbitration, and better delineate port allocation from switch traversal. Then, implement cycling-priority assured access arbitration using output port request batches to ensure that all input ports are given the same priority when buffers are full.

This fix reduces GPU core progress asymmetry from >3x down to <12%, which is in line with hardware.
Extensive testing and use in gem5-gpu. Used the GPU to saturate cache hierarchy
bandwidth, and tracked threadblock progress to witness the asymmetry. Repeated
this testing after the fix and saw greatly reduced asymmetry. In these
small tests, simulator run time also improves slightly due to the reduced amount of
work performed by PerfectSwitch arbitration. I have also run thousands of
simulations with this patch to verify that the changes work for a wide
range of simulated system behaviors.
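The bias described in the changeset summary can be reproduced in a toy model. The sketch below is not the PerfectSwitch code from this patch: the function names, the single output port, the one-grant-per-cycle rule, and the always-pending inputs are all assumptions made for illustration. It contrasts a rotating-priority round robin with a batch-based assured access scheme, in which a batch of requesting inputs is formed and every member is served once before the next batch is formed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model: one output port, one grant per cycle, and a fixed
// set of input ports that always have a message pending (saturated buffers).
using Grants = std::vector<int>;

// Rotating-priority round robin: the starting input advances every cycle,
// and the first pending input at or after the start wins the grant.
Grants rotatingPriority(const std::vector<bool> &pending, std::size_t cycles)
{
    const std::size_t n = pending.size();
    assert(n > 0);
    Grants grants(n, 0);
    for (std::size_t c = 0; c < cycles; ++c) {
        const std::size_t start = c % n;
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t in = (start + i) % n;
            if (pending[in]) {
                ++grants[in];
                break;  // only one grant per cycle
            }
        }
    }
    return grants;
}

// Batch-based assured access: snapshot the pending inputs into a batch and
// serve each member exactly once before forming the next batch.
Grants assuredAccess(const std::vector<bool> &pending, std::size_t cycles)
{
    const std::size_t n = pending.size();
    assert(n > 0);
    Grants grants(n, 0);
    std::vector<std::size_t> batch;
    std::size_t next = 0;  // position within the current batch
    for (std::size_t c = 0; c < cycles; ++c) {
        if (next == batch.size()) {
            // Current batch is exhausted (or empty): form a new one.
            batch.clear();
            for (std::size_t in = 0; in < n; ++in) {
                if (pending[in])
                    batch.push_back(in);
            }
            next = 0;
        }
        if (batch.empty())
            continue;  // nothing to serve this cycle
        ++grants[batch[next++]];
    }
    return grants;
}
```

With eight inputs of which only inputs 0 through 3 are pending, eight cycles of `rotatingPriority` give input 0 five grants and inputs 1 through 3 one grant each, because every cycle that starts on an empty input wraps around to input 0. `assuredAccess` gives each of the four pending inputs two grants over the same eight cycles.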
Diff: Revision 2 (+318 -160)
Overall this patch looks really good. I'm sure it helps out GPU simulations quite a bit. I do have a few questions/comments I would like answered/addressed before I give it a ship it.
- src/mem/ruby/network/simple/PerfectSwitch.hh (Diff revision 2)
  In your comment, please explain why this is a three-dimensional vector rather than just a two-dimensional one (vnet x input port). Based on the current comment, I would have thought you only had to maintain this bit vector for each vnet's input port, rather than for each vnet input/output combination.
- src/mem/ruby/network/simple/PerfectSwitch.cc (Diff revision 2)
  Minor question, but wouldn't a 'return' be more appropriate than a 'break'?
- src/mem/ruby/network/simple/PerfectSwitch.cc (Diff revision 2)
  Is it possible to pull this loop into a separate function? This is quite a long, complicated while loop. It would be nice to break it up and make it more readable.
Diff: Revision 3 (+321 -149)
