Review Board 2.0.15


Decode: Pull instruction decoding out of the StaticInst class into its own.

Review Request #828 - Created Aug. 15, 2011 and submitted

Information
Gabe Black
gem5
Reviewers
Default
ali, gblack, nate, stever
Decode: Pull instruction decoding out of the StaticInst class into its own.

This change pulls the instruction decoding machinery (including caches) out of
the StaticInst class and puts it into its own class. This has a few intrinsic
benefits. First, the StaticInst code, which has gotten to be quite large, gets
simpler. Second, the code that handles decode caching is now separated out
into its own component and can be looked at in isolation, making it easier to
understand. I took the opportunity to restructure the code a bit which will
hopefully also help.

Beyond that, this change also lays some ground work for each ISA to have its
own, potentially stateful decode object. We'd be able to include less
contextualizing information in the ExtMachInst objects since that context
would be applied at the decoder. Also, the decoder could "know" ahead of time
that all the instructions it's going to see are going to be, for instance, 64
bit mode, and it will have one less thing to check when it decodes them.
Because the decode caching mechanism has been separated out, it's now possible
to have multiple caches which correspond to different types of decoding
context. Having one cache for each element of the cross product of different
configurations may become prohibitive, so it may be desirable to clear out the
cache when relatively static state changes and not to have one for each
setting.

Because the decode function is no longer universally accessible as a static
member of the StaticInst class, a new function was added to the ThreadContexts
that returns the applicable decode object.

   
Posted (Sept. 4, 2011, 2:27 a.m.)
Assuming there are no stats changes, it seems fine. 
src/cpu/o3/fetch.hh (Diff revision 1)
 
 
This comment is nearly useless. 
src/cpu/simple/base.cc (Diff revision 1)
 
 
Have you looked at the performance difference here?
Posted (Sept. 6, 2011, 2:33 p.m.)
Hi Gabe,

Overall this looks good.  Just two comments:

1. I think we need to have a global per-ISA decode cache.  I understand that if the decoder becomes stateful then the "front end" decoder object needs to be per context, but I'm very concerned that if we replicate the entire decoder cache for each context (as it looks like you're doing here) that will have a pretty big impact on memory footprint and cache miss rate for large target systems.  Ideally the per-context "front end" decoders could share a global per-ISA "back end" that has most or all of the caching.  Once we go parallel then there will be some interesting tricks to keep the threads coordinated as they update the cache, but I'm willing to deal with that when we get there.

2. Less significantly, the "foo->getDecoderPtr()->decode()" idiom is kind of tiring, especially since the only reason anyone ever calls getDecoderPtr() in the current code is to call decode() on what comes back.  I'd prefer to see the objects that have getDecoderPtr() also have a direct decode() method that encapsulates the idiom, so you can just call (for example) "context->decode()".  It just seems a lot cleaner semantically too; you want to decode the instruction using this context, and the fact that there's a separate embedded decoder object that you need to get a pointer to and then dereference is just an implementation detail that should be hidden as much as possible.

Steve
  1. Oops, somehow I overlooked the 'static' on the DecodeCache object declaration... so never mind about #1.  Sorry.
    
    I do wonder if it would make more sense to have the 'recent' array associated with the Decoder rather than the DecodeCache though, since the locality is most likely per context and not global.  That's more of a future potential optimization than anything that needs to be changed at this point though.
  2. Yeah, I was going to mention that for #1, but you beat me to it.
    
    For #2, the only reason getDecoderPtr is called *now* is to call decode, but the intention is that in the future ISA code could react to state changes that change mode and reconfigure the decoder. That means there still needs to be a getDecoderPtr, and while a specific decode function would be cleaner in a lot of cases, because we have the context code copied over and over, a redundant function adds a lot of extra code without being truly necessary. I'd like to see us get rid of all the at least redundant feeling interface objects, but in the mean time...
  3. Yea, I didn't mean that getDecoderPtr() could be eliminated entirely, and I realize it will be used for other things in the future.  (Have you thought about how this will work, given that the interface for these decoder updates will likely depend on the ISA?  Or do you think you can make an ISA-independent interface?)
    
    Looking at the patch again, there aren't as many explicit calls to getDecoderPtr()->decode() as I thought, so it's not that big a deal.