Models the request and response bandwidth between O3 and L1 (cache read and write ports)
Review Request #1422 - Created Sept. 14, 2012 and updated
| Information | |
|---|---|
| Submitter | Amin Farmahini |
| Repository | gem5 |
| Reviewers | Default |
I made some changes to O3 to model the bandwidth between O3 and L1. By bandwidth I mean the number of requests and responses sent or received each cycle (not the amount of data transferred). I limit both the number of requests sent by O3 and the number of responses received by O3.

For requests, I keep separate counters for read requests (loads), write requests (stores), and shared requests (read/write):
- Loads: O3 limits the number of read requests sent each cycle to the number of defined cache read ports.
- Stores: similarly, O3 limits the number of write requests sent each cycle to the number of defined cache write ports.
- Shared ports can be used for either read or write requests.

Note that no matter how many ports are defined, there is only a single actual cache port used for all read and write requests. So, just like the current gem5 code, only one dcachePort is defined in the code. However, I limit the number of requests to the number of cache ports defined in the parameters.

For responses: if there are not enough cache ports, response packets are buffered in the port (cache side) and are sent to the processor the next cycle.

I don't believe what I implemented is the best way to model cache ports here, so your feedback would be appreciated.
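The per-cycle request limiting described above can be sketched as a small budget tracker. This is an illustrative standalone sketch, not gem5's actual code: the class name `CachePortBudget` and its methods are hypothetical, and the real patch threads this logic through the LSQ rather than a separate class.

```cpp
#include <cassert>

// Hypothetical sketch of per-cycle request limiting: loads prefer
// dedicated read ports, stores prefer dedicated write ports, and
// either may fall back to a shared read/write port.
class CachePortBudget {
  public:
    CachePortBudget(unsigned readPorts, unsigned writePorts,
                    unsigned sharedPorts)
        : numReadPorts(readPorts), numWritePorts(writePorts),
          numSharedPorts(sharedPorts) { resetCycle(); }

    // Called at the start of every CPU cycle to free all ports.
    void resetCycle() { readsUsed = writesUsed = sharedUsed = 0; }

    // A load may issue if a dedicated read port or a shared port is free.
    bool tryIssueLoad() {
        if (readsUsed < numReadPorts) { ++readsUsed; return true; }
        if (sharedUsed < numSharedPorts) { ++sharedUsed; return true; }
        return false;  // no port left: defer the load to the next cycle
    }

    // A store may issue if a dedicated write port or a shared port is free.
    bool tryIssueStore() {
        if (writesUsed < numWritePorts) { ++writesUsed; return true; }
        if (sharedUsed < numSharedPorts) { ++sharedUsed; return true; }
        return false;  // no port left: defer the store to the next cycle
    }

  private:
    const unsigned numReadPorts, numWritePorts, numSharedPorts;
    unsigned readsUsed, writesUsed, sharedUsed;
};
```

Note that this only counts issue slots; as the description says, all requests still flow through the single dcachePort.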
Testing done: a few small benchmarks, run only in SE mode with the classic memory system.
Issue Summary
Status: Discarded
Change Summary:
oops. diff not working
At first glance this seems like a pretty useful fix. Thanks, Amin! Anyone else have thoughts? Andreas, how does this interact with your changes?
Change Summary:
A major update to the way response packets are limited. If there are not enough cache ports, response packets are buffered in the port (cache side) and are sent to the processor next cycle.
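The buffering scheme described in this update can be sketched as a bounded-drain queue on the cache side. This is an assumption-laden illustration, not gem5's PacketQueue: the class name `ResponseBuffer` and the use of plain packet ids are invented for the example.

```cpp
#include <cassert>
#include <deque>

// Hypothetical sketch: responses arriving from the cache are queued,
// and the CPU drains at most one response per cache port per cycle.
// Excess responses wait in the buffer until the next cycle.
class ResponseBuffer {
  public:
    explicit ResponseBuffer(unsigned ports) : numPorts(ports) {}

    // Cache side: a response packet (represented by an id) arrives.
    void recvResponse(int pktId) { pending.push_back(pktId); }

    // CPU side, once per cycle: deliver up to numPorts responses.
    // Returns how many responses were actually handed to the processor.
    unsigned deliverCycle() {
        unsigned delivered = 0;
        while (delivered < numPorts && !pending.empty()) {
            pending.pop_front();  // hand the oldest packet to the CPU
            ++delivered;
        }
        return delivered;
    }

    bool empty() const { return pending.empty(); }

  private:
    const unsigned numPorts;          // cache ports available per cycle
    std::deque<int> pending;          // responses waiting for a free port
};
```

With two ports and three outstanding responses, two are delivered in the first cycle and the third spills into the next, which matches the "buffered in the port and sent next cycle" behavior described above.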
Diff: Revision 3 (+567 -50)
Let me start by saying that I really support the effort and think we should enable modelling of this resource constraint. That said, there are a number of issues that I see with how it is approached at the moment. I hope we can find a solution to all of them. The most important concerns are the changes to the PacketQueue and the notion of the "incoming cycle". See the code comments for more details. Thanks again for all the hard work.
- src/cpu/o3/iew_impl.hh (Diff revision 3): Is this needed for the patch and, if so, could you be more descriptive?
- src/cpu/o3/inst_queue.hh (Diff revision 3): The name does not really distinguish it from the existing getDeferredMemInstToExecute. Could we either use the same name or somehow make the TLB vs. port distinction more clear?
- src/cpu/o3/inst_queue.hh (Diff revision 3): Similar to the previous remark.
- src/cpu/o3/inst_queue.hh (Diff revision 3): Once again, the name is a bit odd compared to deferredMemInsts.
- src/cpu/o3/inst_queue_impl.hh (Diff revision 3): I would prefer to keep the assignment outside the condition.
- src/cpu/o3/inst_queue_impl.hh (Diff revision 3): It looks like what you want is a deque rather than a list? front() and pop_front() would be more descriptive.
- src/cpu/o3/lsq.hh (Diff revision 3): Please use an unsigned type.
- src/cpu/o3/lsq.hh (Diff revision 3): const?
- src/cpu/o3/lsq.hh (Diff revision 3): const?
- src/cpu/o3/lsq.hh (Diff revision 3): const?
- src/cpu/o3/lsq.hh (Diff revision 3): const?
- src/cpu/o3/lsq.hh (Diff revision 3): const
- src/cpu/o3/lsq.hh (Diff revision 3): const
- src/cpu/o3/lsq_impl.hh (Diff revision 3): When is a port "used"? To me it seems like a port that tried and did not yet succeed is somehow counted as used. Perhaps it's only confusion on my side.
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Is the check really needed?
- src/cpu/o3/lsq_impl.hh (Diff revision 3): && ?
- src/cpu/o3/lsq_impl.hh (Diff revision 3): && ?
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Not really related to this patch.
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Useful, but not really related to this patch.
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Whitespace.
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Useful, but preferably in a patch of its own.
- src/cpu/o3/lsq_unit_impl.hh (Diff revision 3): I'd suggest making the debug improvements a separate patch.
- src/cpu/o3/lsq_unit_impl.hh (Diff revision 3): Were there not even more fields added to the class?
- src/cpu/o3/lsq_unit_impl.hh (Diff revision 3): We failed, and yet we still delete all these bits and pieces. I don't follow the logic here.
- src/mem/cache/base.cc (Diff revision 3): We should do this as a separate patch.
- src/mem/packet_queue.hh (Diff revision 3): It is very unfortunate that this all ends up in the packet queue, which is used in far more places than where it is needed. Somehow I wish we could contain this in the CPU; the current way of putting the responsibility on the PacketQueue seems odd to me.
- src/sim/clocked_object.hh (Diff revision 3): I think this is a really dodgy one. In gem5 we have a general problem that there is no notify/update distinction like in SystemC or any RTL simulator. Thus, what happens at time T is visible in zero time, and the order of evaluating events at time T can have big repercussions. In your case, the difference between the CPU going first and the cache going first is causing the problem, as far as I can tell. I think we should try and solve this problem rather than work around it like this. I don't know what the general feeling is, but I fear that event priorities might not even be able to solve it. I am afraid I don't have any easy solution...
