Review Board 2.0.15


cpu: split o3-specific parts out of BaseDynInst

Review Request #529 - Created March 1, 2011 and updated

Information
Korey Sewell
gem5
default
Reviewers
Default
ali, gblack, nate, stever
cpu: split o3-specific parts out of BaseDynInst
The bigger picture goal is that I want to get the InorderDynInst class derived from the
BaseDynInst, since there is no need to replicate a lot of useful code already defined
in BaseDynInst (e.g. microcode identification, etc.) and Inorder can take advantage
of common code that handles microcode and other features that other ISAs need.

But to do this, there are a lot of o3-specific things that are in BaseDynInst, that I pushed to
O3DynInst in this patch. Book-keeping variables that handle the IQ,LSQ,ROB are unnecessary in
the base class but generic variables that will work across CPUs (IsSquashed, IsCompleted, etc.)
are kept in the base class.

The upside is more consistency across the simple models (branch prediction and instruction
identification are all in one common place).

I really wanted to define pure virtual functions for read/write(to memory) and the
set<Int/Float>RegOperand, but virtual functions in a templated class is a no-no and
I couldn't get around that (suggestions?).

Also, I'd rather not use the "this->" pointer all over the place to access member variables of
the templated Base class, but it had to be done.

Other than those quirks, simulator functionality should stay the same as the O3 Model always references
the O3DynInst pointer and the InOrder model doesnt currently make use of the base dyn inst. class.
(but it will be easier to derive from now...)

   
Posted (March 3, 2011, 12:41 p.m.)
Please don't ship this until I have a chance to try it, I just want to make sure it doesn't break ARM_FS/O3.
  1. Sure, I'd welcome a go of things from some other folks to test if I haven't introduced something quirky.
    
    After there is some commentary, I'll make sure to run the full regressions before committing this because as we all know BaseDynInst is a pretty fundamental part of M5.
    
    Also, I'll be posting an update to this diff soon that will make the set<Int/Float>RegOperand pure virtual (I mistakenly thought those were templated member functions in the first go-round). 
  2. Anything happen with your update diff? If you could verify it passes the arm/o3 full system regression I just committed and then I'll give it a go on a bunch more tests. 
  3. Gabe made a good point about the virtual function overhead on the commonly used set*Operand functions and I've just been waffling on whether to even make those pure virtual or not.
    
    However, I'll go ahead and take a hard look at this again , run the regressions, and post an update tomorrow so we can move on with this potential change.
  4. The ARM regressions pass with this patch. Feel free to do further testing.
  5. What about the virtual function overhead?
  6. This patch I posted didn't have any virtual functions in it. I was speculating whether or not we should add the pure virtual functions to solidify the DynInst interface.
    
    I went ahead and added the virtual functions in a separate patch, ran the ARM_FS o3 regressions 2x on the same machine, and then took the inst/second from the generated m5 stat files:
    o3-timing (current): 115133, 115406
    o3-timing (this patch): 115011, 115709
    o3-timing (this patch w/pure virtual functions on BaseDynInst read/set*Operands): 114368,113055
    
    Although running something 2x isn't anything that could be called "thorough", it looks like adding the virtual functions incurs a overhead of about 1K-2K inst/second, while the patch as currently constructed gives about the same performance as the code that doesn't split the o3-specific code out of BaseDynInst. 
    
    Since few people actually work on the internals of the CPU models anyway, it's probably better to go with the patch submitted (pending further ARM-FS testing) than to sacrifice the small amount of performance for an interface that would rarely, if ever, be extended past the Inorder/O3 models.
  7. Can I assume that this patch is OK to push (assuming a final run of all the o3-regression tests pass???)
Posted (March 29, 2011, 3:02 a.m.)
I agree with the sentiment of this change, but I think you moved too much into the O3 class. There's functionality (pointed out below) that you'll need to support in InOrder to be compliant with the interface instructions expect from CPUs, specifically delayed translation and oddly shaped/sized, memory accesses with readBytes/writeBytes. You'll have to support those to run all the ISAs, as would any other CPU using a dyninst in the future. The implementations in the base dyninst are pretty generic, although feel free to point out why they might not work with InOrder.
  1. Gabe,
    I think I agree with your comments. The intent with making the merge is to support some of the features necessary for ISAs like ARM and x86. 
    
    But I had some reservations about keeping the translation and the unaligned memory access code in the BaseDynInst class, because in the InOrder model that stuff is handled separately in the "CacheUnit" resource for InOrder. It's done in a somewhat similar fashion to how the LSQ works in O3. 
    
    However, there are issues say for split accesses whereas in the O3 model you try to make both requests on the same cycle (and fail if you don't), InOrder splits that up into separate requests to the cache allowing for overlap of the split request in high contention scenarios. The separate TLB translation is also done so that if the TLB is blocked/unavailable/etc. then you are not having to wait for 2 mshrs or 2 tlb-"bandwidth slots" to be available. 
    
    With that said, I've been looking at the "CacheUnit" and "LSQ" implementations and now think that there is no reason that the DynInst can't make the request for a write and then the actual receiving object (LSQ or CacheUnit) can buffer the requests until the cache becomes available. The final "trick", so to speak, is for the receiving memory object to be able to tell that when all translations are done and also if the load/store was sent to memory successfully. 
    
    I think the support I need to implement this is there though, so I'm going to update this patch with the generic translation and <read/write>Bytes support back in the Base class. If there are any problems with then getting that to work for InOrder, then I'll bring that up at that point.
src/cpu/base_dyn_inst.hh (Diff revision 1)
 
 
You're going to need to support readBytes and writeBytes style loads and stores to run all the ISAs and to conform to the interfaces the CPU is supposed to provide. These implementations seem pretty generic to me and should work with InOrder, although I don't actually know that much about InOrder.
src/cpu/base_dyn_inst.hh (Diff revision 1)
 
 
You're also going to need to support delayed translation for X86 or ARM. It's important we don't have CPUs diverge in what functionality they provide.
src/cpu/base_dyn_inst.hh (Diff revision 1)
 
 
This stuff looks old and I'm guessing should be deleted one way or the other.
  1. There's a slightly different way that InOrder handles this Result structure so I had planned to revisit this and merge it in after I merged inorder into this style of DynInst object.
src/cpu/base_dyn_inst.hh (Diff revision 1)
 
 
This seems pretty generic and necessary for delayed translations. I think you probably need to update InOrder to support it rather than isolating it to O3.
src/cpu/base_dyn_inst.hh (Diff revision 1)
 
 
Ditto
src/cpu/o3/dyn_inst.hh (Diff revision 1)
 
 
As mentioned above, this "result" stuff may just be old cruft. You should check to see if it's used at all, and if not just let it go away.