Review Board 2.0.15


O3: Enhance data address translation by supporting hardware page table walkers.

Review Request #422 - Created Jan. 12, 2011 and submitted

Information
Ali Saidi
gem5
Reviewers
Default
ali, gblack, nate, stever
O3: Enhance data address translation by supporting hardware page table walkers.

Some ISAs (like ARM) relies on hardware page table walkers.  For those ISAs,
when a TLB miss occurs, initiateTranslation() can return with NoFault but with
the translation unfinished.

Instructions experiencing a delayed translation due to a hardware page table
walk are deferred until the translation completes and kept into the IQ.  In
order to keep track of them, the IQ has been augmented with a queue of the
outstanding delayed memory instructions.  When their translation completes,
instructions are re-executed (only their initiateAccess() was already
executed; their DTB translation is now skipped).  The IEW stage has been
modified to support such a 2-pass execution.

   
Posted (Jan. 16, 2011, 4:28 p.m.)
This change seems to have some dead functionality in it. The "delay" member added to translations is never used, the "delay()" pure virtual function is never used and not defined for any other ISAs (which I think will break them all), and the translationDelayed variable is never used. Unnecessary copyright changes should also be rolled back when removing that dead code.

Are there any instructions that actually expect to get a valid fault when performing initiateAcc? I could imagine there might be since it used to be something you could always do for the most part, but getting rid of those instances would help simplify this code I think. The code that prepares a request for the memory system could follow from the finishTranslation function no matter if it happened immediately or after a table walk. Then an instruction would (hopefully) only require one pass since it's work would be done and it would either be ready to commit (a store) or ready for completeAcc (a load). The actual load/store queue would have to wait since the address might not be ready, but that might be a pretty simple extension on top of waiting for initiateAcc to happen.
  1. I pointed out where delay() is used and it's not defined in BaseTLB, it's in Translation. All the regressions pass just fine. If you search for translationDelayed in the diff you'll see it is used too.
    
    Yes, Alpha still returns a fault on initiateAcc() since there is no table walk, the tlb lookup is known immediately. 
    
    This is quite a substantial patch, so I would prefer to commit it close to as is and then work on patch on top of it to rework finishTranslation if it's possible.
  2. Ok, yeah, I see where translationDelayed is used. I see where delay() is called, but I don't see where the value it sets (member variable delay) is actually used for anything. There is a lot of code here, so I might have missed it. My biggest concern about this patch is that it's adding a lot of code and state (relatively speaking) to handle the delayed translation case, and O3 is already really complicated. That may just be what's necessary, though. It would be nice to go into Alpha and adjust its semantics as far as expecting faults in initiateAcc. When I get a chance I'll look at them to see how easy that might be. I'm not sure if or when that would get done and taking advantage of it would require reworking a decent bit of this code, so it's not necessarily something you should wait for.
    
    This is also at least the second time where I've been confused by nested classes or multiple classes in a file and thought I was looking at something I wasn't. I need to be more careful about that.
src/sim/tlb.hh (Diff revision 1)
 
 
Did you compile an ISAs other than ARM? I don't see this being defined for any of them, and since it's pure virtual I don't think they'll work any more.
  1. They will.. It's not the BaseTLB, it's translation which the timing cpus inherit, not the isas.
    
Posted (Jan. 27, 2011, 7:11 a.m.)
Any further comments on this? It doesn't alter the regressions at all, so I would like to commit it, even if we change initiateAcc() to not return a fault in the future.

Ali
  1. As long as it's not too messy to take the two pass mechanism back out, doing that later is ok with me.
src/cpu/simple/timing.hh (Diff revision 1)
 
 
I'm pretty sure I'm going to use this for my dtlb followed by itlb walk issue in the timing simple cpu.
src/cpu/translation.hh (Diff revision 1)
 
 
It's used right here.
  1. That's where it's set, but is there anywhere it's current value is being consumed?
src/sim/tlb.hh (Diff revision 1)
 
 
All the regressions still pass... This isn't the base tlb object, it's the translation object so yes, it's been added to all the cpu models.
Posted (Jan. 28, 2011, 12:14 a.m.)



  
src/arch/arm/tlb.cc (Diff revision 1)
 
 
I think markDelayed() would be a much better name than delay() for this function, since the function itself doesn't do or cause any delaying.
src/cpu/o3/iew_impl.hh (Diff revision 1)
 
 
What's the difference between (Started && !Completed) and Delayed?  It seems like there's only a very small window of time where the former doesn't imply the latter; if you're explicitly doing something in that window where the difference matters, it would be good to comment that.  If you're not, then we probably don't need three different variables.
src/cpu/o3/inst_queue.hh (Diff revision 1)
 
 
typo in comment ("deferred" not "referred")