Review Board 2.0.15


Models the request and response bandwidth between O3 and L1 (cache read and write ports)

Review Request #1422 - Created Sept. 14, 2012 and updated Jan. 18, 2013

Information
Submitter: Amin Farmahini
Repository: gem5
Reviewers: Default
I made some changes to O3 to model the bandwidth between O3 and L1. By bandwidth I mean the number of requests and responses sent or received each cycle (not the amount of data transferred). I limit both the number of requests sent by O3 and the number of responses received by O3.

For REQUESTS:
I keep separate counters for read requests (loads), write requests (stores), and shared requests (read/write).
LOADS: O3 limits the number of read requests sent each cycle to the number of defined cache read ports.
STORES: Similarly, O3 limits the number of write requests sent each cycle to the number of defined cache write ports.
Also, shared ports can be used for either read or write requests.
Note that no matter how many ports are defined, we have only a single actual cache port used for all read and write requests. So just like the current gem5 code, only one dcachePort is defined in the code. However, I limit the number of requests to the number of cache ports defined in parameters.
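The request-side accounting described above could be sketched roughly as follows (a minimal illustration only; the struct and member names are hypothetical, not the actual gem5 classes): dedicated read and write ports are consumed first, and shared ports absorb the overflow of either kind.

```cpp
// Hypothetical sketch of per-cycle cache-port accounting (not gem5 code).
struct CachePortTracker {
    unsigned readPorts, writePorts, sharedPorts;   // configured limits
    unsigned readsUsed = 0, writesUsed = 0, sharedUsed = 0;

    CachePortTracker(unsigned r, unsigned w, unsigned s)
        : readPorts(r), writePorts(w), sharedPorts(s) {}

    // Try to claim a port for a load this cycle.
    bool tryLoad() {
        if (readsUsed < readPorts) { ++readsUsed; return true; }
        if (sharedUsed < sharedPorts) { ++sharedUsed; return true; }
        return false;  // out of ports: defer the request to the next cycle
    }

    // Try to claim a port for a store this cycle.
    bool tryStore() {
        if (writesUsed < writePorts) { ++writesUsed; return true; }
        if (sharedUsed < sharedPorts) { ++sharedUsed; return true; }
        return false;
    }

    // Called at the start of every CPU cycle.
    void resetCycle() { readsUsed = writesUsed = sharedUsed = 0; }
};
```

Since only one dcachePort actually exists, such a tracker would only gate how many sendTimingReq attempts are made per cycle, not add real ports.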

For RESPONSES:
If there are not enough cache ports, response packets are buffered in the port (cache side) and are sent to the processor the next cycle.
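The response-side buffering could look something like this sketch (again, the names are illustrative and not the gem5 API): responses beyond the per-cycle limit are queued on the cache side and drained first on the following cycle.

```cpp
#include <deque>

// Hypothetical sketch of cache-side response buffering (not gem5 code).
struct Packet { int id; };

struct RespBufferPort {
    unsigned portsPerCycle;        // configured response bandwidth
    unsigned sentThisCycle = 0;
    std::deque<Packet> deferred;   // responses waiting for a free port

    explicit RespBufferPort(unsigned n) : portsPerCycle(n) {}

    // Cache hands a response to the port; deliver it now if a port is
    // free this cycle, otherwise buffer it for a later cycle.
    bool sendResp(const Packet &pkt) {
        if (sentThisCycle < portsPerCycle) {
            ++sentThisCycle;
            return true;           // delivered to the CPU this cycle
        }
        deferred.push_back(pkt);
        return false;              // buffered on the cache side
    }

    // At the start of the next cycle, drain as many buffered responses
    // as the ports allow before accepting new ones; returns how many
    // buffered responses were delivered.
    unsigned advanceCycle() {
        sentThisCycle = 0;
        unsigned drained = 0;
        while (!deferred.empty() && sentThisCycle < portsPerCycle) {
            deferred.pop_front();
            ++sentThisCycle;
            ++drained;
        }
        return drained;
    }
};
```

Buffering on the cache side avoids blocking the whole port, at the cost of responses observing one extra cycle of latency under contention.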

I don't believe what I implemented is the best way to model cache ports here, so your feedback would be appreciated.
Testing done: a few small benchmarks, run only in SE mode with the classic memory system.

Issue Summary (26 total: 26 open, 0 resolved, 0 dropped)
Description From Last Updated Status
Is this needed for the patch and if so, could you be more descriptive? Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
The name does not really distinguish it from the existing getDeferredMemInstToExecute. Could we either use the same name or somehow ... Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
Similar to previous remark Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
Once again the name is a bit odd compared to deferredMemInsts Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
I would prefer to keep the assignment outside the condition Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
It looks like what you want is a deque and not a list? front and pop_front would be more descriptive Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
please use an unsigned type Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
const? Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
const? Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
const? Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
const Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
const Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
When is a port "used"? To me it seems like a port that tried and did not yet succeed somehow ... Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
Is the check really needed? Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
&& ? Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
&& ? Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
Not really related to this patch Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
Useful, but not really related to this patch Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
Whitespace Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
Useful, but preferably in a patch on its own Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
I'd suggest making the debug improvements a separate patch Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
Were there not even more fields added to the class? Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
We failed and still delete all these bits and pieces. I don't follow the logic here. Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
We should do this as a separate patch Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
It is very unfortunate that this all ends up in the packet queue that is used in far more places ... Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
I think this is a really dodgy one. In gem5 we have a general problem that there is no notify/update ... Andreas Hansson Jan. 23, 2013, 1:37 a.m. Open
Review request changed
Updated (Jan. 18, 2013, 3:49 p.m.)

Description:

Changed from:

I made some changes to O3 to model the bandwidth between O3 and L1. By bandwidth I mean the number of requests and responses sent or received each cycle (not the amount of data transferred). Before putting a patch online, I would like to get your feedback on it.

Here is what I did to model cache ports for O3. I limit both the number of requests sent by O3 and the number of responses received by O3.

For REQUESTS:
I have a separate read requests (loads) counter and a separate write requests (stores) counter.
LOADS: O3 limits the number of read requests sent each cycle to the number of defined cache read ports.
STORES: Similarly, O3 limits the number of write requests sent each cycle to the number of defined cache write ports.
Note that no matter how many ports are defined, we have only a single actual cache port used for all read and write requests. So just like the current gem5 code, only one dcachePort is defined in the code. However, I limit the number of requests to the number of cache ports defined in parameters.

For RESPONSES:
LOADS: I limit the number of load responses received each cycle to the number of cache read ports. Once O3 reaches its load response limit, the next load response in this cycle is rejected and the cache port is blocked until the next cycle. Note that blocking the cache port also inhibits O3 from receiving write responses, which is not correct, but I don't have any control over blocking read and write responses separately.
STORES: same as above

I tried to avoid details such as split packets, pending packets, etc., to give a brief overview. I don't believe what I implemented is the best way to model cache ports here, so your feedback would be appreciated. What Steve mentioned seems a more concrete way to implement it, but my lack of knowledge about the bus code in gem5 pushed me toward modifying the O3 code.

Changed to:

I made some changes to O3 to model the bandwidth between O3 and L1. By bandwidth I mean the number of requests and responses sent or received each cycle (not the amount of data transferred). I limit both the number of requests sent by O3 and the number of responses received by O3.

For REQUESTS:
I keep separate counters for read requests (loads), write requests (stores), and shared requests (read/write).
LOADS: O3 limits the number of read requests sent each cycle to the number of defined cache read ports.
STORES: Similarly, O3 limits the number of write requests sent each cycle to the number of defined cache write ports.
Also, shared ports can be used for either read or write requests.
Note that no matter how many ports are defined, we have only a single actual cache port used for all read and write requests. So just like the current gem5 code, only one dcachePort is defined in the code. However, I limit the number of requests to the number of cache ports defined in parameters.

For RESPONSES:
If there are not enough cache ports, response packets are buffered in the port (cache side) and are sent to the processor the next cycle.

I don't believe what I implemented is the best way to model cache ports here, so your feedback would be appreciated.

Posted (Jan. 23, 2013, 1:37 a.m.)
Let me start by saying that I really support the effort and think we should enable modelling of this resource constraint. That said, there are a number of issues that I see with how it is approached at the moment. I hope we can find a solution to all of them. The most important concerns are the changes to the PacketQueue and the notion of the "incoming cycle". See the code comments for more details.

Thanks again for all the hard work.
src/cpu/o3/iew_impl.hh (Diff revision 3)
 
 
Is this needed for the patch and if so, could you be more descriptive?
src/cpu/o3/inst_queue.hh (Diff revision 3)
 
 
The name does not really distinguish it from the existing getDeferredMemInstToExecute. Could we either use the same name or somehow make the TLB vs port distinction more clear?
src/cpu/o3/inst_queue.hh (Diff revision 3)
 
 
Similar to previous remark
src/cpu/o3/inst_queue.hh (Diff revision 3)
 
 
Once again the name is a bit odd compared to deferredMemInsts
src/cpu/o3/inst_queue_impl.hh (Diff revision 3)
 
 
I would prefer to keep the assignment outside the condition
src/cpu/o3/inst_queue_impl.hh (Diff revision 3)
 
 
It looks like what you want is a deque and not a list?

front and pop_front would be more descriptive
src/cpu/o3/lsq.hh (Diff revision 3)
 
 
please use an unsigned type
src/cpu/o3/lsq.hh (Diff revision 3)
 
 
const?
src/cpu/o3/lsq.hh (Diff revision 3)
 
 
const?
src/cpu/o3/lsq.hh (Diff revision 3)
 
 
const?
src/cpu/o3/lsq.hh (Diff revision 3)
 
 
const
src/cpu/o3/lsq.hh (Diff revision 3)
 
 
const
src/cpu/o3/lsq_impl.hh (Diff revision 3)
 
 
When is a port "used"?

To me it seems like a port that tried and did not yet succeed somehow is used. Perhaps it's only confusion on my side
src/cpu/o3/lsq_impl.hh (Diff revision 3)
 
 
Is the check really needed?
src/cpu/o3/lsq_impl.hh (Diff revision 3)
 
 
&& ?
src/cpu/o3/lsq_impl.hh (Diff revision 3)
 
 
&& ?
src/cpu/o3/lsq_impl.hh (Diff revision 3)
 
 
Not really related to this patch
src/cpu/o3/lsq_impl.hh (Diff revision 3)
 
 
Useful, but not really related to this patch
src/cpu/o3/lsq_impl.hh (Diff revision 3)
 
 
Whitespace
src/cpu/o3/lsq_impl.hh (Diff revision 3)
 
 
Useful, but preferably in a patch on its own
src/cpu/o3/lsq_unit_impl.hh (Diff revision 3)
 
 
I'd suggest making the debug improvements a separate patch
src/cpu/o3/lsq_unit_impl.hh (Diff revision 3)
 
 
Were there not even more fields added to the class?
src/cpu/o3/lsq_unit_impl.hh (Diff revision 3)
 
 
We failed and still delete all these bits and pieces. I don't follow the logic here.
src/mem/cache/base.cc (Diff revision 3)
 
 
We should do this as a separate patch
src/mem/packet_queue.hh (Diff revision 3)
 
 
It is very unfortunate that this all ends up in the packet queue that is used in far more places than where it is needed.

Somehow I wish we could contain this in the CPU, and the current way of putting the responsibility on the PacketQueue seems odd to me.
src/sim/clocked_object.hh (Diff revision 3)
 
 
I think this is a really dodgy one. In gem5 we have a general problem that there is no notify/update distinction like in the SystemC or any RTL simulator. Thus, what happens at time T is visible in 0 time and the order of evaluating events at time T can have big repercussions.

In your case, the difference between the CPU going first and the cache going first is causing the problem as far as I can tell. I think we should try and solve this problem rather than work around it like this. I don't know what the general feeling is, but I fear that event priorities might not even be able to solve it. I am afraid I don't have any easy solution...
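The ordering hazard described in this last comment can be boiled down to a toy example (purely illustrative, not gem5 code): two events fire at the same tick, one producing a value the other consumes, and with no notify/update phase separation the observable outcome depends entirely on which event the scheduler happens to run first.

```cpp
#include <functional>
#include <vector>

// Toy demonstration of same-tick event-ordering sensitivity (not gem5 code).
// The "cache" event frees a port; the "CPU" event sends only if a port is
// free. Returns how many sends succeeded at this tick.
int runSameTick(bool cacheFirst) {
    int portsFreed = 0;
    int sendsDone = 0;

    auto cacheEvent = [&] { portsFreed = 1; };
    auto cpuEvent   = [&] { if (portsFreed > 0) sendsDone = 1; };

    // The scheduler's ordering choice is the only difference between runs.
    std::vector<std::function<void()>> tick =
        cacheFirst ? std::vector<std::function<void()>>{cacheEvent, cpuEvent}
                   : std::vector<std::function<void()>>{cpuEvent, cacheEvent};
    for (auto &ev : tick)
        ev();
    return sendsDone;
}
```

A two-phase (evaluate/update) scheme, as in SystemC or RTL simulators, would make both orderings observe the pre-tick state and give one deterministic answer.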