BranchPred: Take the branch predictor out of O3CPU and make it a stand-alone
Review Request #47 - Created July 9, 2010 and submitted
| Information | |
|---|---|
| Timothy Jones | |
| gem5 | |
| Reviewers | |
| Default | |
BranchPred: Take the branch predictor out of O3CPU and make it a stand-alone SimObject. This then allows the same branch predictor to be shared amongst several CPUs. This patch is unfinished. I would like to take the branch predictor out of the inorder CPU as well, but want comments on whether this is the best approach to take first.
Posted (July 10, 2010, 2:46 a.m.)
Overall I'm in favor of what you're doing here... I think most of my comments are more about general cleanup of the bpred object unrelated to your relocation of it. But I've never really looked closely at the code before and it seems like this is a good opportunity to clean it up. As far as I can tell, the templating is only used to provide a MaxThreads value. I'd like to get rid of the template entirely (and the _impl.hh function and the empty base class), just pass max threads in as a parameter, and dynamically allocate the couple of arrays that depend on it.
-
src/cpu/o3/fetch_impl.hh (Diff revision 1) -
Why do we need to clear the state here? Do we need this function at all?
-
src/cpu/o3/fetch_impl.hh (Diff revision 1) -
I don't see the point in this function either... I'd just get rid of the empty function and this call.
-
src/cpu/pred/bpred_unit.hh (Diff revision 1) -
It doesn't look like this function or BTBLookup or BTBUpdate get called anywhere... let's get rid of them. Even if they do get called, I don't see the point of wrapping the indirection.
-
src/cpu/pred/builder.cc (Diff revision 1) -
I think this function should go in bpred_unit.cc and this file should go away.
Posted (July 10, 2010, 2:59 a.m.)
OK, looking a little closer I see that Impl is also used to get the DynInst type... but many of the methods called on DynInst (isUncondControl(), isCall(), isReturn()) are also StaticInst methods, and some of the others (like readPC()) are just used for DPRINTFs (and how is readPC() different from the PC that's passed in?). The only exception I see is seqNum. I would suggest passing in a StaticInstPtr instead of a DynInstPtr, and add another argument for seqNum. This makes the predictor much more independent of the CPU model. Then I'm pretty sure you can get rid of the template. I actually got started down this path looking at how you had to add an include of static_inst.hh for the warm() method, and wondering how the bpred got away without knowing what a static inst was prior to this...
Posted (July 10, 2010, 7:50 a.m.)
-
src/cpu/pred/bpred_unit_impl.hh (Diff revision 1) -
Shouldnt this *not* be hardcoded to 16, but instead the true RAS size?
-
src/cpu/pred/bpred_unit_impl.hh (Diff revision 1) -
What happens if the RAS is full? is it right to just overwrite the next entry?
-
src/cpu/pred/bpred_unit_impl.hh (Diff revision 1) -
If you miss on a BTB lookup and you've predicted taken, should you not instead stall (or have the option to stall) the CPU until branch resolution??? (I've got a patch for this option in another tree if people think that's useful)
-
src/cpu/pred/bpred_unit_impl.hh (Diff revision 1) -
I'm not a fan of this localBP vs tournamentBP comparison we do everytime we need to lookup or update the predictor... Why isnt there a base predictor class that the localBP and the tournamentBP derive from? Eventually if someone wanted to substitute the predictor or implement something else (e.g. no predictor at all and always predict false), all you would need to do is pass that as a SimObject(?) to the branch prediction unit and it would be plug-n-play.
Posted (July 11, 2010, 4:58 a.m.)
-
src/cpu/pred/bpred_unit_impl.hh (Diff revision 1) -
What if there is no valid entry at the top here? (the call stack is greater than RAS size)... In the later comment, I discuss options for this...
-
src/cpu/pred/bpred_unit_impl.hh (Diff revision 1) -
But isnt the depth of a stack limited by the size of the RAS anyway? Say we have 16 entries, then the call stack is limited by 16 no? I have 2 concerns here (and for surrounding RAS code): (1) If the RAS does overflow and we want to just keep overwriting, then are we always overwriting the to the right entry? (2) If there is no valid RAS entry (in the case of the call stack being greater than the RAS size), then the branch prediction needs to either return false or simply stall until resolution (similar to the BTB concern I had)
Posted (July 16, 2010, 11:18 a.m.)
With this new revision I have removed the template from the branch predictor and fixed it to use StaticInstPtr instead of a dynamic one. I've removed the functions that were never called and made a couple of the other fixes that were mentioned in the reviews. I've also taken the local and tournament predictors and made them inherited classes of the predictor that override key functions. If everyone is happy with this, I'll work on removing the code from the in-order CPU and making it use this predictor instead.
Posted (July 29, 2010, 1:32 a.m.)
Hi Tim, Sorry for the long delay on this... this looks great to me, with one minor issue: I see you changed the lookup/update method names to BPLookup/BPUpdate etc. The lower-case names are the ones we want to keep though, since they follow the M5 style (http://m5sim.org/wiki/index.php/Coding_Style#Naming); the BPFoo names were violating the style to begin with. I think there were two different interfaces, an external one to the CPU and a more internal one, with different names, and since you are getting rid of that layer of indirection you had to pick one, and you just happened to pick the worse one instead of the better one :-). Steve
