ruby: PerfectSwitch add assured access arbitration

Review Request #3773 - Created Dec. 23, 2016 and updated

Information
Submitter: Joel Hestness
Repository: gem5
Branch: default
Reviewers
Groups: Default
Changeset 11802:ca5c5b982ea5
---------------------------
ruby: PerfectSwitch add assured access arbitration

When operating near bandwidth saturation and using finite cache hierarchy
buffering, the round-robin arbitration in the PerfectSwitch caused low ID
input buffers to gain access to the switch more frequently than other input
buffers that might contain requests. This resulted from the priority cycling
starting on input buffers with no pending requests and cycling around to the
low ID buffers with pending requests. Part of the problem was that
input-to-output port allocation was done on-the-fly while cycling through
input ports.
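
The bias described above can be reproduced with a toy model. This is only a sketch under stated assumptions, not the PerfectSwitch code: there is a single output port, one grant per cycle, the priority start pointer advances by one input every cycle, and the busy inputs always have a request pending. The function name and parameters are hypothetical.

```python
def round_robin_grants(num_inputs, busy_inputs, cycles):
    """Count grants per input when the priority start pointer advances each cycle.

    Toy model (assumption): one output port, one grant per cycle, and every
    input in busy_inputs always has a pending request.
    """
    grants = {i: 0 for i in sorted(busy_inputs)}
    for cycle in range(cycles):
        start = cycle % num_inputs  # priority pointer for this cycle
        # Scan from the pointer and grant the first input that has a request.
        for offset in range(num_inputs):
            candidate = (start + offset) % num_inputs
            if candidate in busy_inputs:
                grants[candidate] += 1
                break
    return grants


# Inputs 0-2 are busy, inputs 3-7 are idle. Whenever the pointer starts on an
# idle input, the scan wraps around and lands on input 0.
print(round_robin_grants(num_inputs=8, busy_inputs={0, 1, 2}, cycles=800))
# -> {0: 600, 1: 100, 2: 100}
```

In this model input 0 wins six of every eight cycles because five of the eight pointer positions fall on idle inputs and wrap around to it.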

To fix this, refactor the PerfectSwitch to remove on-the-fly arbitration, and
better delineate port allocation from switch traversal. Then, implement
cycling-priority assured access arbitration using output port request batches
to ensure that all input ports are given the same priority when buffers are
full.
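
The batching idea can be sketched as follows. This is a minimal illustration of assured access arbitration in general, not the patch itself: it assumes a single output port, one grant per cycle, and inputs that always have a request pending. How the patch actually forms and drains its output port request batches is not shown in this description, so the function name, parameters, and batch-forming rule here are hypothetical.

```python
from collections import deque


def assured_access_grants(num_inputs, busy_inputs, cycles):
    """Count grants per input under batch-based (assured access) arbitration.

    Toy model (assumption): one output port, one grant per cycle, and every
    input in busy_inputs always has a pending request.
    """
    grants = {i: 0 for i in sorted(busy_inputs)}
    batch = deque()  # inputs admitted to the current batch, in service order
    start = 0        # cycling priority pointer used when a batch is formed
    for _ in range(cycles):
        if not batch:
            # Form a new batch from every input with a pending request,
            # ordered from the current priority pointer.
            order = [(start + k) % num_inputs for k in range(num_inputs)]
            batch.extend(i for i in order if i in busy_inputs)
            start = (start + 1) % num_inputs
        if batch:
            # Serve one batch member per cycle; no input is re-admitted
            # until the whole batch has drained.
            grants[batch.popleft()] += 1
    return grants


# Inputs 0-2 are busy and inputs 3-7 are idle.
print(assured_access_grants(num_inputs=8, busy_inputs={0, 1, 2}, cycles=900))
# -> {0: 300, 1: 300, 2: 300}
```

Because each requester is served exactly once per batch, the position of the priority pointer only changes the order of service within a batch, not how often each input is served.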

This fix reduces GPU core progress asymmetry from >3x down to <12%, which is
in line with hardware.

Extensive testing and use in gem5-gpu. Used the GPU to saturate cache
hierarchy bandwidth, and tracked threadblock progress to observe the
asymmetry. Repeating this testing after the fix showed greatly reduced
asymmetry. In these small tests, simulator run time also improves slightly
because PerfectSwitch arbitration performs less work. Thousands of
simulations have also been run with this patch to verify that the changes
work for a wide range of simulated system behaviors.

Review request changed
Updated (Jan. 24, 2017, 10:49 p.m.)

Description:

- Changeset 11786:93f0e3b78f2d
+ Changeset 11802:ca5c5b982ea5


Diff:

Revision 3 (+321 -149)


Ship it!
Posted (Jan. 25, 2017, 10:02 a.m.)