proto, probe: Add elastic trace probe to o3 cpu
Review Request #3027 - Created Aug. 11, 2015 and submitted
| Information | |
|---|---|
| Curtis Dunham | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
The elastic trace is a type of probe listener and listens to probe points
in multiple stages of the O3CPU. The notify method is called on a probe
point typically when an instruction successfully progresses through that
stage.As different listener methods mapped to the different probe points execute,
relevant information about the instruction, e.g. timestamps and register
accesses, are captured and stored in temporary InstExecInfo class objects.
When the instruction progresses through the commit stage, the timing and the
dependency information about the instruction is finalised and encapsulated in
a struct called TraceInfo. TraceInfo objects are collected in a list instead
of writing them out to the trace file one a time. This is required as the
trace is processed in chunks to evaluate order dependencies and computational
delay in case an instruction does not have any register dependencies. By this
we achieve a simpler algorithm during replay because every record in the
trace can be hooked onto a record in its past. The instruction dependency
trace is written out as a protobuf format file. A second trace containing
fetch requests at absolute timestamps is written to a separate protobuf
format file.If the instruction is not executed then it is not added to the trace.
The code checks if the instruction had a fault, if it predicated
false and thus previous register values were restored or if it was a
load/store that did not have a request (e.g. when the size of the
request is zero). In all these cases the instruction is set as
executed by the Execute stage and is picked up by the commit probe
listener. But a request is not issued and registers are not written.
So practically, skipping these should not hurt the dependency modelling.If squashing results in squashing younger instructions, it may happen that
the squash probe discards the inst and removes it from the temporary
store but execute stage deals with the instruction in the next cycle which
results in the execute probe seeing this inst as 'new' inst. A sequence
number of the last processed trace record is used to trap these cases and
not add to the temporary store.The elastic instruction trace and fetch request trace can be read in and
played back by the TraceCPU.
I am yet to read elastic_trace.cc.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
So ARM has a TraceCPU.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
flushTraces()?
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
Instead of short int, should this not be PhysRegIndex?
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
I think o3 is not required.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
I might go for vector instead of list. Typically list is not better than a vector since it is misses in the cache all the time.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
Cycles?
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
no o3.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
No o3, we are already in the o3 directory.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
- should be with the variable name.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
Reference to a pointer? Does that not look strange?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
nullptr
This is a very long review, but I think you would some performance improvement if you make the changes I have suggested.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 1) -
Use a vector here. Ultimately delete everything in one go instead of deleting them one by one as you do now.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
Why do we treat this listener differently from others?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
The function name should be changed since it is only dealing with instruction fetches.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
Why would it happen that we receive the execution tick after the instruction has been squashed?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
InstExecInfo *exec_info_ptr;
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
std::size_t instead of double?
Can we even do this? Were not statistical variables write-only?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
It is not clear why would this case ever occur.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
PhysRegIndex instead of short int.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
I would rather get an iterator from find() and then make this check to avoid having to lookup again.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
I don't think this check is required since std::set maintains unique elements.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
Why not misc registers?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
This is index type that you should use in other places as well.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
I am not sure why you use different types in different places. Again, were not statistical values supposed to be write-only?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
First get the iterator, and then perform the check. Then pass on the iterator to the erase function.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
This pattern of first getting the iterator and then processing, should be used in other places as well.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
<type> , not <type>
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
This just seems to be really arbitrary place to have this code. Can we have some event that happens when the numSimulatedInsts has reached a certain value?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
Get the iterator first.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
This check is not required since the for loop would not execute if the set was empty.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 1) -
Get the iterator first and use it for all the processing.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 2) -
Brace on next line.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 2) -
short int --> PhysRegIndex
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 2) -
Reference
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 2) -
Brace on next line.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 2) -
The name tempStore sounds strange. Typically if something is temporary, you would want to declare it in the function that is going to make use of it.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 2) -
Brace on next line.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 2) -
Tick. A value of zero should mean no dependencies.
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 2) -
Should the type be InstSeqNum?
-
src/cpu/o3/probe/elastic_trace.hh (Diff revision 2) -
Is there any use case where we may not want to write all the records in the trace?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 2) -
Will this throw an exception in case we cannot access the file?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 2) -
How about always enqueuing the event?
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 2) -
I think this variable is not of any use.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 2) -
I think the convention in gem5 is to write (!condition).
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 3) -
I think these checks are needless. Two reasons. First, since the python params do not have default values, the user would be forced to provide some value. Second, even if the user were to provide an empty value, the ProtoOutputStream constructor should fail.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 3) -
IIRC, we have talked about this before. I think there is no need for the "if {} " at all.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 3) -
Get this pointer from itr_temp_store.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 3) -
Since you need the mapped value later, follow the pattern you have been following in other parts of the code: call find and get the iterator first.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 3) -
Is this correct? Should not comp_delay = completionTick - executionTick . Secondly, the assert should be comparing the unsigned values as we discussed in another thread.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 3) -
Same problems as above.
-
src/cpu/o3/probe/elastic_trace.cc (Diff revision 3) -
Is the post decrement of any use?
Ship It!
