Models the request and response bandwidth between O3 and L1 (cache read and write ports)
Review Request #1422 - Created Sept. 14, 2012 and updated
| Information | |
|---|---|
| Submitter | Amin Farmahini |
| Repository | gem5 |
| Reviewers | Default |
I made some changes to O3 to model the bandwidth between O3 and L1. By bandwidth I mean the number of requests and responses sent or received each cycle (not the amount of data transferred). I limit both the number of requests sent by O3 and the number of responses received by O3.

For requests, I keep separate counters for read requests (loads), write requests (stores), and shared requests (read/write):
- Loads: O3 limits the number of read requests sent each cycle to the number of defined cache read ports.
- Stores: similarly, O3 limits the number of write requests sent each cycle to the number of defined cache write ports.
- Shared ports can be used for either read or write requests.

Note that no matter how many ports are defined, there is only a single actual cache port used for all read and write requests. So, just like the current gem5 code, only one dcachePort is defined in the code. However, I limit the number of requests to the number of cache ports defined in the parameters.

For responses: if there are not enough cache ports, response packets are buffered in the port (cache side) and are sent to the processor the next cycle.

I don't believe what I implemented is the best way to model cache ports here, so your feedback would be appreciated.
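The per-cycle request limiting described above can be sketched as a small budget tracker. This is an illustrative standalone sketch, not gem5's actual code: the class name `CachePortBudget` and its methods are hypothetical, and the real patch threads this logic through the LSQ rather than a separate class.

```cpp
#include <cassert>

// Hypothetical sketch of per-cycle request limiting: loads prefer
// dedicated read ports, stores prefer dedicated write ports, and
// either may fall back to a shared read/write port.
class CachePortBudget {
  public:
    CachePortBudget(unsigned readPorts, unsigned writePorts,
                    unsigned sharedPorts)
        : numReadPorts(readPorts), numWritePorts(writePorts),
          numSharedPorts(sharedPorts) { resetCycle(); }

    // Called at the start of every CPU cycle to free all ports.
    void resetCycle() { readsUsed = writesUsed = sharedUsed = 0; }

    // A load may issue if a dedicated read port or a shared port is free.
    bool tryIssueLoad() {
        if (readsUsed < numReadPorts) { ++readsUsed; return true; }
        if (sharedUsed < numSharedPorts) { ++sharedUsed; return true; }
        return false;  // no port left: defer the load to the next cycle
    }

    // A store may issue if a dedicated write port or a shared port is free.
    bool tryIssueStore() {
        if (writesUsed < numWritePorts) { ++writesUsed; return true; }
        if (sharedUsed < numSharedPorts) { ++sharedUsed; return true; }
        return false;  // no port left: defer the store to the next cycle
    }

  private:
    const unsigned numReadPorts, numWritePorts, numSharedPorts;
    unsigned readsUsed, writesUsed, sharedUsed;
};
```

Note that this only counts issue slots; as the description says, all requests still flow through the single dcachePort.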
Testing done: a few small benchmarks, run only in SE mode with the classic memory system.
Issue Summary
Status: Discarded
Change Summary:
oops. diff not working
At first glance this seems like a pretty useful fix. Thanks, Amin! Anyone else have thoughts? Andreas, how does this interact with your changes?
Change Summary:
A major update to the way response packets are limited. If there are not enough cache ports, response packets are buffered in the port (cache side) and are sent to the processor next cycle.
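The buffering scheme described in this update can be sketched as a bounded-drain queue on the cache side. This is an assumption-laden illustration, not gem5's PacketQueue: the class name `ResponseBuffer` and the use of plain packet ids are invented for the example.

```cpp
#include <cassert>
#include <deque>

// Hypothetical sketch: responses arriving from the cache are queued,
// and the CPU drains at most one response per cache port per cycle.
// Excess responses wait in the buffer until the next cycle.
class ResponseBuffer {
  public:
    explicit ResponseBuffer(unsigned ports) : numPorts(ports) {}

    // Cache side: a response packet (represented by an id) arrives.
    void recvResponse(int pktId) { pending.push_back(pktId); }

    // CPU side, once per cycle: deliver up to numPorts responses.
    // Returns how many responses were actually handed to the processor.
    unsigned deliverCycle() {
        unsigned delivered = 0;
        while (delivered < numPorts && !pending.empty()) {
            pending.pop_front();  // hand the oldest packet to the CPU
            ++delivered;
        }
        return delivered;
    }

    bool empty() const { return pending.empty(); }

  private:
    const unsigned numPorts;          // cache ports available per cycle
    std::deque<int> pending;          // responses waiting for a free port
};
```

With two ports and three outstanding responses, two are delivered in the first cycle and the third spills into the next, which matches the "buffered in the port and sent next cycle" behavior described above.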
Diff: Revision 3 (+567 -50)
Let me start by saying that I really support the effort and think we should enable modelling of this resource constraint. That said, there are a number of issues that I see with how it is approached at the moment. I hope we can find a solution to all of them. The most important concerns are the changes to the PacketQueue and the notion of the "incoming cycle". See the code comments for more details. Thanks again for all the hard work.
- src/cpu/o3/iew_impl.hh (Diff revision 3): Is this needed for the patch and, if so, could you be more descriptive?
- src/cpu/o3/inst_queue.hh (Diff revision 3): The name does not really distinguish it from the existing getDeferredMemInstToExecute. Could we either use the same name or somehow make the TLB vs. port distinction more clear?
- src/cpu/o3/inst_queue.hh (Diff revision 3): Similar to the previous remark.
- src/cpu/o3/inst_queue.hh (Diff revision 3): Once again, the name is a bit odd compared to deferredMemInsts.
- src/cpu/o3/inst_queue_impl.hh (Diff revision 3): I would prefer to keep the assignment outside the condition.
- src/cpu/o3/inst_queue_impl.hh (Diff revision 3): It looks like what you want is a deque rather than a list? front() and pop_front() would be more descriptive.
- src/cpu/o3/lsq.hh (Diff revision 3): Please use an unsigned type.
- src/cpu/o3/lsq.hh (Diff revision 3): const?
- src/cpu/o3/lsq.hh (Diff revision 3): const?
- src/cpu/o3/lsq.hh (Diff revision 3): const?
- src/cpu/o3/lsq.hh (Diff revision 3): const?
- src/cpu/o3/lsq.hh (Diff revision 3): const
- src/cpu/o3/lsq.hh (Diff revision 3): const
- src/cpu/o3/lsq_impl.hh (Diff revision 3): When is a port "used"? To me it seems like a port that tried and did not yet succeed is somehow counted as used. Perhaps it's only confusion on my side.
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Is the check really needed?
- src/cpu/o3/lsq_impl.hh (Diff revision 3): && ?
- src/cpu/o3/lsq_impl.hh (Diff revision 3): && ?
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Not really related to this patch.
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Useful, but not really related to this patch.
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Whitespace.
- src/cpu/o3/lsq_impl.hh (Diff revision 3): Useful, but preferably in a patch of its own.
- src/cpu/o3/lsq_unit_impl.hh (Diff revision 3): I'd suggest making the debug improvements a separate patch.
- src/cpu/o3/lsq_unit_impl.hh (Diff revision 3): Were there not even more fields added to the class?
- src/cpu/o3/lsq_unit_impl.hh (Diff revision 3): We failed, and yet we still delete all these bits and pieces. I don't follow the logic here.
- src/mem/cache/base.cc (Diff revision 3): We should do this as a separate patch.
- src/mem/packet_queue.hh (Diff revision 3): It is very unfortunate that this all ends up in the packet queue, which is used in far more places than where it is needed. Somehow I wish we could contain this in the CPU; the current way of putting the responsibility on the PacketQueue seems odd to me.
- src/sim/clocked_object.hh (Diff revision 3): I think this is a really dodgy one. In gem5 we have a general problem that there is no notify/update distinction like in SystemC or any RTL simulator. Thus, what happens at time T is visible in zero time, and the order of evaluating events at time T can have big repercussions. In your case, the difference between the CPU going first and the cache going first is causing the problem, as far as I can tell. I think we should try and solve this problem rather than work around it like this. I don't know what the general feeling is, but I fear that event priorities might not even be able to solve it. I am afraid I don't have any easy solution...
