cpu, x86: Allow the TLB to be warmed up before CPU switch

Information
Submitter:	Jason Lowe-Power
Repository:	gem5
Branch:	default
Bugs:
Depends On:
Reviewers
Groups:	Default
People:

Description

Changeset 11482:51dec612f11b
---------------------------
cpu, x86: Allow the TLB to be warmed up before CPU switch

Previously, before a CPU was switched out, the TLB was always flushed.
Now, we first call takeOverFrom with the TLB, and we flush
the TLB only right before the CPU is switched in.
This changeset also contains the code x86 needs to support
takeOverFrom with the TLB; similar changes may be needed for the ARM
architecture.
With this changeset, the TLB is warm when you switch from atomic to
timing mode.
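The following Python sketch shows the "warm TLB" idea. It assumes a toy TLB that maps virtual page numbers to physical page numbers. `TLB`, `insert`, `lookup`, `flush_all`, and `take_over_from` are hypothetical names, not gem5's C++ `X86ISA::TLB` interface. The incoming TLB drops whatever it held and then copies the outgoing TLB's translations, so early lookups after the switch hit instead of missing.

```python
class TLB:
    """Toy TLB mapping virtual page numbers to physical page numbers (hypothetical)."""

    def __init__(self):
        self.entries = {}

    def insert(self, vpn, ppn):
        self.entries[vpn] = ppn

    def lookup(self, vpn):
        # Returns None on a miss.
        return self.entries.get(vpn)

    def flush_all(self):
        self.entries.clear()

    def take_over_from(self, old):
        # Drop anything this TLB held, then copy the outgoing TLB's
        # translations so the incoming CPU starts warm rather than cold.
        self.flush_all()
        for vpn, ppn in old.entries.items():
            self.insert(vpn, ppn)


atomic_tlb = TLB()
atomic_tlb.insert(0x10, 0x80)      # filled during atomic-mode warmup
timing_tlb = TLB()
timing_tlb.take_over_from(atomic_tlb)
print(hex(timing_tlb.lookup(0x10)))  # 0x80
```

The result is an independent copy: later inserts into `atomic_tlb` do not appear in `timing_tlb`. That independence is the property the discussion below questions.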

Testing Done

Issue Summary

Description	From	Last Updated	Status
If this just (essentially) copy constructs a new TLB from the old, why can't the same TLB be used .. ...	Curtis Dunham	May 26, 2016, 7:43 a.m.	Dropped

src/arch/x86/tlb.cc (Diff revision 1)

If this just (essentially) copy-constructs a new TLB from the old one, why can't the same TLB be used, just reconnecting the SimObject graph? Is it so that some form of generality could, in theory, be applied, or was this simply more expedient?...


Jason Lowe-Power May 26, 2016, 6:26 a.m.

I think I see what you're saying, but I'm not sure how it would be done with the current "takeOverFrom" API. Could you describe in more detail what you're envisioning?

From what I understand, when switching CPUs, each object's "takeOverFrom" method is called with a pointer to the old version of the object. Do you think this should be changed?

Curtis Dunham May 26, 2016, 7:05 a.m.

I'm not sure it should be changed; I'm just calling it into question in this case -- because what is this function doing that wouldn't have the same result as just using the same TLB instance? I suspect it comes down to the way the object graph is constructed, as a matter of gem5 philosophy, and how that has been integrated with the "switcheroo" functionality. I'm not sure it's worth opening that can of worms unless we have a better solution. One thing to consider, though: generally speaking, would we want to do such copying when the std::type_info (the return value of typeid(T)) for the two objects is the same? The terminology differs at the Python/SWIG level, but the same argument applies.
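The Python sketch below shows one reading of this question. Here `type(a) is type(b)` plays the role that comparing `typeid(*a) == typeid(*b)` would play in C++. When the concrete types match, the sketch reuses the existing instance. When they differ, for example with a separate model used only for detailed simulation, it falls back to an explicit state transfer. `TLB`, `DetailedTLB`, and `tlb_for_incoming_cpu` are all hypothetical names.

```python
class TLB:
    """Toy TLB (hypothetical): entries map vpn -> ppn."""

    def __init__(self):
        self.entries = {}

    def take_over_from(self, old):
        # Model-specific state transfer; here simply a copy of the translations.
        self.entries = dict(old.entries)


class DetailedTLB(TLB):
    """Hypothetical alternative TLB model used only for detailed simulation."""


def tlb_for_incoming_cpu(old_tlb, new_tlb):
    if type(old_tlb) is type(new_tlb):
        # Same concrete type (analogous to typeid equality in C++):
        # nothing needs converting, so share the existing instance.
        return old_tlb
    # Different models: translate the state explicitly.
    new_tlb.take_over_from(old_tlb)
    return new_tlb


warm = TLB()
warm.entries[1] = 2
print(tlb_for_incoming_cpu(warm, TLB()) is warm)    # True
print(tlb_for_incoming_cpu(warm, DetailedTLB()).entries)  # {1: 2}
```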

Jason Lowe-Power May 26, 2016, 7:17 a.m.

I guess it's possible that you would have a different TLB model for your detailed simulation vs. warmup. That would be a reason to use the "takeOverFrom" method instead of just switching the pointers or doing an automatic deep copy via type_info (if that's what you're proposing). Though that clearly isn't the case currently.

Can we drop this issue for now?

Curtis Dunham May 26, 2016, 7:43 a.m.

Dropped, thanks for the discussion.

Steve Reinhardt May 26, 2016, 5:38 p.m.

Actually I came to raise the same issue---instead of copying the state, why doesn't the CPU's takeOverFrom() method just copy the TLB pointers? This is basically what we do with caches, IIRC. While in theory you could have different TLB configs between warmup and simulation, to be honest I don't see a situation where that makes sense (just as is the case for caches).

Jason Lowe-Power May 27, 2016, 6:53 a.m.

I agree with your point, Steve. But I still don't see what the "right" thing to do here is. Currently each CPU (e.g., atomic and detailed) has its own itb/dtb, instantiated in BaseCPU.py (not in the config scripts). These pointers are passed all the way down to the thread contexts in each CPU, which are also instantiated separately for each CPU (e.g., atomic and detailed).

I think what would be needed to just "switch the pointers" would be to change the way TLBs are instantiated to make them like the caches with one TLB per (actual) CPU per system instead of one TLB per actual CPU per type (e.g., atomic and detailed). This seems quite invasive. I would really rather not have to make a deep change to how the TLBs are hooked up to the CPUs/threadContexts if I can avoid it. Does this make sense? Do you see a simpler way to update this code?

I'll admit my example of different TLBs for warmup and detailed simulation may have been a bit contrived ;). However, we sometimes do similar things with Ruby for the caches.

Steve Reinhardt May 28, 2016, 10:06 a.m.

I think we're all agreeing that the "ideal" solution (disregarding effort and disruption costs) would be if the config scripts instantiated TLBs the way they did caches, so that it would be straightforward to give different CPUs the same TLB objects. I haven't looked at this very closely, but I don't dispute that it's likely a big invasive change. That said, if we all agree that it's the ideal end state, we shouldn't be too unwilling to just go in and do it. Philosophically, I think one thing that makes code bases crufty is when people are unwilling (or just don't have the time) to make big changes in order to do the "right thing", leaving a bunch of code in a suboptimal state because that was the path of least resistance.

Of course, sometimes you don't have time, and one thing that makes code bases too rigid and eventually useless is when people insist on only doing the optimal thing and are unwilling to accept changes that work and add value but aren't aesthetically pleasing :). So in that spirit, the option I was originally thinking of was that takeOverFrom() could literally overwrite the TLB pointers of the destination CPU rather than copying the state. Thus if you had a sampling pattern that went ABABAB, the TLBs that got originally instantiated with B would never be used, and every switch after the first would end up overwriting the TLB pointers with the value they already had (you could check this and skip the write, but the check is probably more expensive than the write). Your point about "pointers are passed all the way down to the thread contexts" makes me wonder how easy this would be, though, if the pointers are indeed copied and need to be updated in multiple locations.
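The Python sketch below models Steve's pointer-overwriting option. The names `CPU`, `ThreadContext`, `TLB`, and `take_over_from` are illustrative stand-ins; gem5's real objects are C++ SimObjects. Each thread context keeps its own copy of the TLB pointers, so the sketch must rewrite those copies as well as the CPU's fields. This is the "updated in multiple locations" concern raised above. The loop runs the ABABAB sampling pattern.

```python
class TLB:
    def __init__(self, name):
        self.name = name


class ThreadContext:
    def __init__(self, itb, dtb):
        # Each thread context holds its own references, copied from its CPU.
        self.itb, self.dtb = itb, dtb


class CPU:
    def __init__(self, name, num_threads=2):
        self.itb = TLB(name + ".itb")
        self.dtb = TLB(name + ".dtb")
        self.threads = [ThreadContext(self.itb, self.dtb) for _ in range(num_threads)]

    def take_over_from(self, old):
        # Share the outgoing CPU's TLB objects instead of copying their contents.
        self.itb, self.dtb = old.itb, old.dtb
        # The per-thread copies must be overwritten too, or they go stale.
        for tc in self.threads:
            tc.itb, tc.dtb = old.itb, old.dtb


a, b = CPU("A"), CPU("B")
b_original_itb = b.itb
active = a
for incoming in (b, a, b, a, b):  # sampling pattern A B A B A B
    incoming.take_over_from(active)
    active = incoming
print(b.itb.name)  # A.itb
```

After the first switch, every overwrite stores the value that is already there. The TLBs originally created for B (`b_original_itb`) are never referenced again.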

So another possibility comes to mind, which is basically to do the same overwriting, but do it in python between the point where the CPUs are set up and when m5.instantiate() is called. So you let the config script do its normal thing, but then at the last moment before instantiation you do:
```python
B.itb = A.itb
B.dtb = A.dtb
```
and then the TLBs configured for B never even get instantiated, and A and B share the same TLBs from the get-go, without major changes to the config scripts. Would that work?
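The sketch below models this config-time alternative with toy Python classes, not the real m5 objects. `CPUConfig`, `TLBConfig`, and `instantiate()` are hypothetical stand-ins. `instantiate()` mimics only the property that matters here: one object is created per distinct node reachable from the CPUs. Whether gem5's actual instantiation treats an aliased child exactly this way is the open question in the comment above; the sketch shows only the intended effect.

```python
class TLBConfig:
    def __init__(self, name):
        self.name = name


class CPUConfig:
    def __init__(self, name):
        self.name = name
        self.itb = TLBConfig(name + ".itb")
        self.dtb = TLBConfig(name + ".dtb")


def instantiate(cpus):
    """Create (here: record) one object per distinct TLB config reachable from the CPUs."""
    created = {}
    for cpu in cpus:
        for tlb in (cpu.itb, cpu.dtb):
            # Keyed by object identity, so a shared config is created only once.
            created.setdefault(id(tlb), tlb.name)
    return sorted(created.values())


A = CPUConfig("A")  # e.g. the atomic CPU used for warmup
B = CPUConfig("B")  # e.g. the detailed CPU switched in later

# At the last moment before instantiation, alias B's TLBs to A's.
B.itb = A.itb
B.dtb = A.dtb

print(instantiate([A, B]))  # ['A.dtb', 'A.itb']
```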

Looks reasonable, thanks.
