

O3: Fix itstate prediction and recovery.

Review Request #421 - Created Jan. 12, 2011 and submitted

Information
Ali Saidi
gem5
Reviewers
Default
ali, gblack, nate, stever

Any change of control flow now resets the itstate to 0 mask and 0 condition,
except where the control flow alteration writes into the cpsr register. These
cases, for example a return from an interrupt, require the predecoder to recover
the itstate.

As there is a window of opportunity between the return from an interrupt
changing the control flow at the head of the pipe and the commit of the update
to the CPSR, the predecoder needs to be able to grab the ITstate early. This
is now handled by setting the forcedItState inside a PCstate for the control
flow altering instruction.

That instruction will have the correct mask/cond, but will not have a valid
itstate until advancePC is called (note this happens to advance the execution).
When the new PCstate is copy constructed it gets the itstate cond/mask, and
upon advancing the PC the itstate becomes valid.

Subsequent advancing invalidates the state and zeroes the cond/mask. This is
handled in isolation for the ARM ISA and should have no impact on other ISAs.
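
The lifecycle described above could be sketched roughly as follows. This is not the code from the patch: SketchPCState, pending, itStateValid, setForcedItState and the fixed 4-byte PC step are hypothetical stand-ins, and only the name forcedItState comes from the description.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the ARM PCState described above.
// Only the name forcedItState is taken from the patch description.
struct SketchPCState
{
    uint32_t pc = 0;
    uint8_t forcedItState = 0;  // cond/mask carried for the predecoder
    bool pending = false;       // set by the control flow altering inst
    bool itStateValid = false;  // predecoder may use forcedItState if true

    // Called for an instruction that changes control flow and writes the
    // cpsr, e.g. a return from an interrupt.
    void
    setForcedItState(uint8_t itstate)
    {
        forcedItState = itstate;
        pending = true;
        itStateValid = false;   // not valid until the PC is advanced
    }

    // Stand-in for advancePC; assumes a fixed 4-byte instruction size.
    void
    advance()
    {
        pc += 4;
        if (pending) {
            // First advance after the force: the itstate becomes valid.
            itStateValid = true;
            pending = false;
        } else {
            // Any later advance invalidates it and zeroes the cond/mask.
            itStateValid = false;
            forcedItState = 0;
        }
    }
};
```

In this sketch a copy of the state keeps the cond/mask, the first advance() on the copy makes it valid, and the second advance() clears it again.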

Refer to arch/arm/types.hh and arch/arm/predecoder.cc for the details.

   
Posted (Jan. 16, 2011, 4:44 p.m.)
It's great that this is contained within ARM and won't affect other ISAs. How does it affect performance for ARM? Since this is frequently executed code it would be a good idea to quantify what impact there is and also see if it can be minimized.

For instance, in the instructions that change itstate, why not actually have a nextItstate variable which will get moved into itstate on update? That sounds sort of like what you have, except you could make nextItstate always the nextItstate. If it's 0 it's inert, and if it isn't it gets advanced as per the rules. Then you don't need any special handling as far as whether it's valid, injecting new values, etc.
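
A minimal sketch of that nextItstate idea, to make the suggestion concrete. The struct and member names are hypothetical and not taken from the patch; the advance rule is the architectural IT advance (clear the state when the low three mask bits are zero, otherwise shift the low five bits left by one).

```cpp
#include <cstdint>

// Hypothetical illustration of the nextItstate suggestion.
struct ItTracker
{
    uint8_t itstate = 0;      // state applied to the current instruction
    uint8_t nextItstate = 0;  // state the following instruction will get

    // Architectural IT advance: bits [7:5] hold the base condition and
    // bits [4:0] the mask. A value of 0 stays 0, so it is inert.
    static uint8_t
    advanceIt(uint8_t it)
    {
        if ((it & 0x7) == 0)
            return 0;
        return static_cast<uint8_t>((it & 0xE0) | ((it << 1) & 0x1F));
    }

    // On every PC advance, move nextItstate into itstate and precompute
    // the one after that. No valid flag or special injection is needed.
    void
    advance()
    {
        itstate = nextItstate;
        nextItstate = advanceIt(itstate);
    }
};
```

An instruction that changes the itstate would only write nextItstate. For example, with nextItstate set to 0x1C (the state after an ITT NE), successive advances give itstate 0x1C, then 0x18, then 0.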

Also, control flow is supposed to be illegal inside or into an IT block, right? Why do you need to handle that case in a particular way? As long as the simulator doesn't melt down it should (I'm guessing) be ok to do whatever is easiest.
  1. There hasn't been any appreciable difference in performance. After we get the o3 cpu working with arm we're planning to do some profiling to see where it can be sped up. It is a bit slower than Alpha, but it's the same speed as SPARC. You're correct that flow control is illegal inside an IT block, but interrupts and faults are not. The mechanism exists to handle faults that occur in the middle of an IT block.
    
    
  2. Have you checked with something like the simple timing CPU? This code would still get used there, and there would be a lot less other stuff to mask the overhead. One way or the other since this is contained within ARM (which is a big plus) and is most likely functionally correct, we can do performance tuning and/or simplification later.
src/arch/arm/types.hh (Diff revision 1)
 
 
I'm pretty sure this line is too long.
  1. yup, will fix.
src/arch/arm/types.hh (Diff revision 1)
 
 
 
This one too.
src/arch/arm/types.hh (Diff revision 1)
 
 
Is there a way to get rid of this if by moving it to some code that isn't so frequently executed, or simplifying it so it doesn't have an else and will sometimes not have any work to do?