o3: Fix occupancy checks for SMT
Review Request #2181 - Created March 4, 2014 and submitted - Latest diff uploaded
| Information | |
|---|---|
| Faissal Sleiman | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 10101:1cb43a0d4ec6
---------------------------
o3: Fix occupancy checks for SMT

A number of calls to isEmpty() and numFreeEntries() should be thread-specific.

In cpu.cc, the fact that tid is /*commented*/ out is a bug. Say the rob has instructions from thread 0 (isEmpty() returns false), and none from thread 1. If we are trying to squash all of thread 1, then readTailInst(thread 1) will be called because rob->isEmpty() returns false. The result is end_it is not in the list and the while statement loops indefinitely back over the cpu's instList.

In iew_impl.hh, all threads are told they have the entire remaining IQ, when each thread actually has a certain allocation. The result is extra stalls at the iew dispatch stage which the rename stage usually takes care of.

In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only contains instructions from thread 0. This returns a dummyInst (which may work since we are trying to squash all instructions, but hardly seems like the right way to do it).

In rob_impl.hh this fix skips the rest of the function more frequently and is more efficient.
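
To make the difference between the global and thread-specific checks concrete, here is a minimal standalone sketch. It is not the gem5 code: ToyROB, ToyIQ, guardGlobal() and guardPerThread() are hypothetical names invented for illustration, and instructions are reduced to integer sequence numbers. Only the idea of having both a whole-structure and a per-thread form of isEmpty() and numFreeEntries() is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

using ThreadID = int;  // stand-in type, assumed for this sketch

// Toy reorder buffer (hypothetical): one instruction list per thread.
class ToyROB {
  public:
    explicit ToyROB(std::size_t numThreads) : instList(numThreads) {}

    void insert(ThreadID tid, int seqNum) { instList[tid].push_back(seqNum); }

    // Global check: true only if no thread holds any instruction.
    bool isEmpty() const {
        for (const auto &insts : instList)
            if (!insts.empty()) return false;
        return true;
    }

    // Thread-specific check: looks at one thread's entries only.
    bool isEmpty(ThreadID tid) const { return instList[tid].empty(); }

    // Only meaningful when the given thread has at least one entry.
    int readTailInst(ThreadID tid) const {
        assert(!instList[tid].empty());
        return instList[tid].back();
    }

  private:
    std::vector<std::list<int>> instList;
};

// Guard as described for the buggy case: the thread ID is ignored.
bool guardGlobal(const ToyROB &rob, ThreadID) { return !rob.isEmpty(); }

// Guard as described for the fix: ask about this thread only.
bool guardPerThread(const ToyROB &rob, ThreadID tid) {
    return !rob.isEmpty(tid);
}

// Toy instruction queue (hypothetical) with a fixed allocation per thread.
class ToyIQ {
  public:
    explicit ToyIQ(std::vector<unsigned> allocation)
        : maxEntries(allocation), count(allocation.size(), 0) {}

    void insert(ThreadID tid) { ++count[tid]; }

    // Free entries summed over all threads.
    unsigned numFreeEntries() const {
        unsigned freeEntries = 0;
        for (std::size_t i = 0; i < maxEntries.size(); ++i)
            freeEntries += maxEntries[i] - count[i];
        return freeEntries;
    }

    // Free entries left in one thread's own allocation.
    unsigned numFreeEntries(ThreadID tid) const {
        return maxEntries[tid] - count[tid];
    }

  private:
    std::vector<unsigned> maxEntries;
    std::vector<unsigned> count;
};
```

With one instruction from thread 0 in a two-thread ToyROB, guardGlobal(rob, 1) lets the caller through to readTailInst(1) even though thread 1 has nothing in flight, while guardPerThread(rob, 1) does not. Likewise, the no-argument numFreeEntries() reports more room than any single thread's allocation actually has.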
Ran the quick Alpha debug regression; the SMT test's stats change as expected.
***** build/ALPHA/tests/debug/quick/se/00.hello/alpha/linux/inorder-timing passed.
***** build/ALPHA/tests/debug/quick/se/00.hello/alpha/linux/o3-timing passed.
***** build/ALPHA/tests/debug/quick/se/00.hello/alpha/linux/simple-atomic passed.
***** build/ALPHA/tests/debug/quick/se/00.hello/alpha/linux/simple-timing passed.
***** build/ALPHA/tests/debug/quick/se/00.hello/alpha/linux/simple-timing-ruby passed.
***** build/ALPHA/tests/debug/quick/se/00.hello/alpha/tru64/o3-timing passed.
***** build/ALPHA/tests/debug/quick/se/00.hello/alpha/tru64/simple-atomic passed.
***** build/ALPHA/tests/debug/quick/se/00.hello/alpha/tru64/simple-timing passed.
***** build/ALPHA/tests/debug/quick/se/00.hello/alpha/tru64/simple-timing-ruby passed.
***** build/ALPHA/tests/debug/quick/se/01.hello-2T-smt/alpha/linux/o3-timing CHANGED!
***** build/ALPHA/tests/debug/quick/se/20.eio-short/alpha/eio/simple-atomic skipped.
***** build/ALPHA/tests/debug/quick/se/20.eio-short/alpha/eio/simple-timing skipped.
***** build/ALPHA/tests/debug/quick/se/30.eio-mp/alpha/eio/simple-atomic-mp skipped.
***** build/ALPHA/tests/debug/quick/se/30.eio-mp/alpha/eio/simple-timing-mp skipped.
***** build/ALPHA/tests/debug/quick/se/50.memtest/alpha/linux/memtest-ruby passed.
***** build/ALPHA/tests/debug/quick/se/60.rubytest/alpha/linux/rubytest-ruby passed.
===== Statistics differences =====
Maximum error magnitude: +9999.000000%
Reference New Value Abs Diff Pct Chg
Key statistics:
host_inst_rate 46987 9998 -36989 -78.72%
host_mem_usage 231368 228376 -2992 -1.29%
sim_insts 12745 12745 0 +0.00%
sim_ops 12745 12745 0 +0.00%
sim_ticks 24229500 24353500 124000 +0.51%
system.cpu.commit.committedInsts::0 6390 6390 0 +0.00%
system.cpu.commit.committedInsts::1 6389 6389 0 +0.00%
system.cpu.commit.committedInsts::total 12779 12779 0 +0.00%
system.cpu.commit.committedOps::0 6390 6390 0 +0.00%
system.cpu.commit.committedOps::1 6389 6389 0 +0.00%
system.cpu.commit.committedOps::total 12779 12779 0 +0.00%
system.cpu.committedInsts::0 6373 6373 0 +0.00%
system.cpu.committedInsts::1 6372 6372 0 +0.00%
system.cpu.committedInsts_total 12745 12745 0 +0.00%
system.cpu.committedOps::0 6373 6373 0 +0.00%
system.cpu.committedOps::1 6372 6372 0 +0.00%
system.cpu.ipc::0 0.131511 0.130841 -0.000670 -0.51%
system.cpu.ipc::1 0.131490 0.130820 -0.000670 -0.51%
system.cpu.ipc_total 0.263000 0.261661 -0.001339 -0.51%
Differences > 0%:
system.cpu.iew.iewLSQFullEvents 0 2 2 +9999.00%
system.cpu.iew.iewIQFullEvents 23 4 -19 -82.61%
system.cpu.iew.lsq.thread1.ignoredResponses 2 3 1 +50.00%
system.physmem.bytesPerActivate::832 2.000 1.000 -1.000 -50.00%
system.cpu.iew.iewBlockCycles 2954 1837 -1117 -37.81%
system.cpu.iew.lsq.thread1.forwLoads 47 64 17 +36.17%
system.cpu.memDep0.conflictingLoads 9 6 -3 -33.33%
system.physmem.bytesPerActivate::320 3.000 2.000 -1.000 -33.33%
system.physmem.bytesPerActivate::448 3.000 4.000 1.000 +33.33%
system.physmem.bytesPerActivate::960 3.000 4.000 1.000 +33.33%
system.cpu.iq.issued_per_cycle::8 22.000 15.000 -7.000 -31.82%
system.cpu.rename.ROBFullEvents 54 71 17 +31.48%
system.physmem.bytesPerActivate::576 4.000 5.000 1.000 +25.00%
system.physmem.bytesPerActivate::640 4.000 5.000 1.000 +25.00%
system.physmem.bytesPerActivate::704 5.000 6.000 1.000 +20.00%
system.physmem.rdQLenPdf::4 15 12 -3 -20.00%
system.cpu.iq.iqSquashedInstsIssued 131 105 -26 -19.85%
system.cpu.rename.serializeStallCycles 1585 1276 -309 -19.50%
system.cpu.rename.BlockCycles 6164 5248 -916 -14.86%
system.cpu.iew.iewUnblockCycles 42 36 -6 -14.29%
[... showing top 20 errors only, additional errors omitted ...]
Missing 2 reference statistics:
system.physmem.bytesPerActivate::1536 2 0.92% 98.16%
system.physmem.bytesPerActivate::896 2 0.92% 94.47%
Found 2 new statistics:
system.cpu.rename.IQFullEvents 39
system.physmem.bytesPerActivate::1664 2 0.94% 98.59%
