LSQ: Only trigger a memory violation with a load/load if the value changes.
Review Request #823 - Created Aug. 9, 2011 and submitted
| Information | |
|---|---|
| Ali Saidi | |
| gem5 | |
| Reviewers | |
| Default | |
| ali, gblack, nate, stever | |
LSQ: Only trigger a memory violation with a load/load if the value changes. Only create a memory ordering violation when the value could have changed between two subsequent loads, instead of just when loads go out-of-order to the same address. While not very common in the case of Alpha, with an architecture with a hardware table walker this can happen reasonably frequently beacuse a translation will miss and start a table walk and before the CPU re-schedules the faulting instruction another one will pass it to the same address (or cache block depending on the dependency checking). The performance improvement on SPEC benchmarks can be substantial (2-10%).
This patch has been tested with a couple of self-checking hand crafted programs I wrote to stress ordering between two cores. With injected bugs it definitely shows a failure and it doesn't with this implementation (and it caught one issue during development).
Posted (Aug. 31, 2011, 8:31 p.m.)
The way you set up the ReExec fault definitely needs to be fixed before this goes in, but as far as what it's doing it seems basically ok. I don't follow all the details of how you're checking for ordering violations and whether that's right, so I'll leave that to somebody else to check.
-
src/arch/alpha/faults.hh (Diff revision 1) -
For artificial faults, go ahead and either create a fault in the base files in sim, or perhaps in a file in the O3 directory. The later would probably be better since it doesn't leak O3-ey bits into other parts of the simulator. It can inherit from the base fault classes without involving the ISA at all. You don't need a per-ISA function to generate them since they aren't ISA specific. They're really an implementation detail. In a more generic location it would need a more descriptive/specific/collision resistant name.
-
src/arch/alpha/faults.hh (Diff revision 1) -
Excessive spacing.
-
src/arch/mips/faults.cc (Diff revision 1) -
No need for the intermediate variable. Also, I'm not sure what, but you might want to poke some other state since you'll potentially be copying around a structure with a few elements. The trick would be finding something that's safe to poke for all ISAs.
-
src/arch/power/faults.cc (Diff revision 1) -
I don't think you need three of these...
-
src/arch/x86/faults.hh (Diff revision 1) -
Inconsistent indentation. This applies elsewhere as well.
-
src/cpu/o3/lsq_impl.hh (Diff revision 1) -
Not your fault, but else should follow the }
-
src/cpu/o3/lsq_impl.hh (Diff revision 1) -
How do you expect/did this will affect O3 simulator performance?
-
src/cpu/o3/lsq_impl.hh (Diff revision 1) -
This is going to create lots of output if this loop goes around a lot.
-
src/cpu/o3/lsq_unit.hh (Diff revision 1) -
is_load can be determined from inst in checkViolations, can't it? Do we need another parameter?
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
Random whitespace change. Also we probably don't want a straggling blank line at the end of this function.
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
Could the port do this when setPeer is called?
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
spaces around the -
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
This sounds like two sentences squished together.
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
genMachineCheckFault should be the "Oh crap, oh crap, oh crap" fault, not part of normal execution. Ideally it should be eliminated all together, but it definitely shouldn't be built into things like that.
Posted (Sept. 1, 2011, 4:54 a.m.)
-
src/arch/alpha/faults.hh (Diff revision 1) -
I initially tried the fault in sim approach, however that doesn't work because the pcstate object is resolved by the namespace and you can't do any template magic to make that happen. There needs to be seperate classes each defined in a namespace FooISA{ ... } or at least using namespace FooISA. While it could be done by doing using namespace TheISA that gets us further away from being able to support a single binary with multiple ISAs, so I don't like it nearly as much. As for the fault, it's a generic fault that could apply to all the CPU models (inorder, out-of-order, any other model) so I think it's more generic than just the o3 even though it's only used here. The fault could also be used in the future to get rid of some of the more specific squashDueToFoo instructions and just use more generic faults to do the job for you. -
src/arch/alpha/faults.hh (Diff revision 1) -
fixed
-
src/arch/mips/faults.cc (Diff revision 1) -
It's not necessary, but I personally like the style better. Do you have an issue with it (the compiler should optimized it away). I can't grok the second half of your statement.
-
src/arch/power/faults.cc (Diff revision 1) -
dunno how that happened
-
src/arch/x86/faults.hh (Diff revision 1) -
This is the correct indentation per the style guide
-
src/cpu/o3/lsq_impl.hh (Diff revision 1) -
fixed
-
src/cpu/o3/lsq_impl.hh (Diff revision 1) -
Nothing that I can perceive
-
src/cpu/o3/lsq_impl.hh (Diff revision 1) -
moved out of loop, although it's only called once per thread, so it shouldn't be a big deal
-
src/cpu/o3/lsq_unit.hh (Diff revision 1) -
removed and replaced with inst->isLoad()
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
fixed
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
There is a potential case where setPeer() is called, but the port doesn't know it's size until the entire memory system is setup (beacuse the peer doesn't know it's size directly and needs to query a peer object that might not yet be connected. This is a more robust implementation.
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
fixed
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
fixed
-
src/cpu/o3/lsq_unit_impl.hh (Diff revision 1) -
It was used before and it is an oh crap... If we actually commit this instruction we've done something extremely wrong. This is the same way that it was handled previously.
Posted (Sept. 9, 2011, 7:17 p.m.)
This looks good to me. I don't know the LSQ all that well or how your change affects it other than what you have in your description, but I'm willing to assume you got that right. Thanks for moving the fault to sim. I notice you have another fault in ARM which flushes the pipe and also seems fairly generic (modulo the writes == flush assumption). It would be nice if in a future change you moved that to sim too, but leaving it where it is also makes some sense for various reasons.
