ruby: Fix checkpointing and restore
Review Request #2908 - Created June 23, 2015 and submitted
| Information | |
|---|---|
| Timothy Jones | |
| gem5 | |
| default | |
| Reviewers | |
| Default, Ruby | |
There are 2 problems with the existing checkpoint and restore code in ruby.
The first is that when the event queue is altered by ruby during serialization,
some events that are currently scheduled cannot be found (e.g. the event to
stop simulation that always lives on the queue), causing a panic. The second
is that ruby is sometimes serialized after the memory system, meaning that the
dirty data in its cache is flushed back to memory too late and so isn't
included in the checkpoint.

These are fixed by implementing memory writeback in ruby, using the same
technique of hijacking the event queue, but first descheduling all events that
are currently on it. They are saved, along with their scheduled time, so that
the event queue can be faithfully reconstructed after writeback has finished.
Writeback is still implemented using flushing, so the cache recorder object,
which is created to generate the trace and manage flushing, is kept around and
used during serialization to write the trace to disk.
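The deschedule, flush, and restore sequence described above can be pictured with a minimal sketch. This is not the code from the patch: `ToyEvent`, `ToyEventQueue`, and `writebackWithCleanQueue` are hypothetical stand-ins invented for illustration, and gem5's real event queue API is not used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT gem5's Event/EventQueue.
using Tick = std::uint64_t;

struct ToyEvent {
    std::string name;
};

using SavedEvents = std::vector<std::pair<ToyEvent *, Tick>>;

class ToyEventQueue {
  public:
    void schedule(ToyEvent *ev, Tick when) { pending.emplace(when, ev); }
    bool empty() const { return pending.empty(); }
    std::size_t size() const { return pending.size(); }

    // Remove every pending event and return (event, tick) pairs in time
    // order, so the queue can be rebuilt later.
    SavedEvents descheduleAll() {
        SavedEvents saved;
        for (const auto &entry : pending)
            saved.emplace_back(entry.second, entry.first);
        pending.clear();
        return saved;
    }

    // Put previously saved events back at their original ticks.
    void restore(const SavedEvents &saved) {
        for (const auto &entry : saved)
            schedule(entry.first, entry.second);
    }

    // Pretend to process everything currently queued; returns the count.
    std::size_t runAll() {
        const std::size_t n = pending.size();
        pending.clear();
        return n;
    }

  private:
    std::multimap<Tick, ToyEvent *> pending;
};

// Hypothetical helper: run a flush on an otherwise empty queue, then
// reconstruct the original queue contents.
template <typename Flush>
void writebackWithCleanQueue(ToyEventQueue &queue, Flush flush) {
    SavedEvents saved = queue.descheduleAll();
    assert(queue.empty());   // the flush starts from a clean queue
    flush(queue);            // the flush schedules and drains its own events
    assert(queue.empty());   // nothing from the flush may be left behind
    queue.restore(saved);
}
```

In this sketch the flush never sees the simulator's own events (such as the one that stops simulation), which is the property the description relies on to avoid the panic.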
Status: Re-opened
Diff: Revision 1 (+69 -5)
src/mem/ruby/system/System.cc (Diff revision 1)
I am not a fan. This should be solved by the draining logic. Once the object claims to be drained, the order of serialisation should not matter.
I do not think this is the way to go. There is already an established methodology to solve the issue.
Change Summary:
Moved code to flush ruby data back to memory into new memWriteback() method,
which removes the need for serialization priorities.
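Why a separate writeback step removes the need for serialization priorities can be shown with a small sketch, assuming a two-pass checkpoint in which every object writes back before any object is serialized. The `Toy*` classes and the `checkpoint` function are hypothetical and stand in for the real simulator objects; only the `memWriteback()` name comes from the change summary.

```cpp
#include <string>
#include <vector>

// Toy stand-ins for illustration only; not gem5 classes.
struct ToyObject {
    virtual ~ToyObject() = default;
    // Default: nothing to write back.
    virtual void memWriteback() {}
    virtual void serialize(std::vector<std::string> &out) const = 0;
};

struct ToyMemory : ToyObject {
    int value = 0;
    void serialize(std::vector<std::string> &out) const override {
        out.push_back("mem=" + std::to_string(value));
    }
};

struct ToyCache : ToyObject {
    ToyCache(ToyMemory &m, int dirtyValue)
        : mem(m), dirty(dirtyValue), isDirty(true) {}

    // Push dirty data to memory before anything is serialized.
    void memWriteback() override {
        if (isDirty) {
            mem.value = dirty;
            isDirty = false;
        }
    }

    void serialize(std::vector<std::string> &out) const override {
        out.push_back(std::string("cache.dirty=") + (isDirty ? "1" : "0"));
    }

    ToyMemory &mem;
    int dirty;
    bool isDirty;
};

// Pass 1 writes everything back; pass 2 serializes in whatever order the
// objects happen to be listed.
std::vector<std::string> checkpoint(const std::vector<ToyObject *> &objs) {
    for (ToyObject *obj : objs)
        obj->memWriteback();
    std::vector<std::string> out;
    for (const ToyObject *obj : objs)
        obj->serialize(out);
    return out;
}
```

Even with the memory listed ahead of the cache, the serialized memory contents already include the cache's dirty data, so the serialization order no longer matters.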
Diff: Revision 2 (+114 -65)
Thanks for getting this in shape Tim! Much nicer.
Looks good to me. Thanks for tackling this!
What did you use to test this? It would be wonderful if we had a regression script for Ruby and checkpointing! No need to hold up this bugfix patch for it, but if you could cobble a test together quickly it would be nice to have. If not, let me know how you are testing this patch and I might put something together.
Ship It!
Hi Tim,
Sorry to come back to this patch, but I just applied it and tried to test it and ran into a problem. When restoring the original event queue in line 187 of System.cc, I get an error that the event is already on the event queue. Below is how I ran into the problem:
scons build/X86_MOESI_hammer/gem5.opt -j5 --default=X86 PROTOCOL=MOESI_hammer
build/X86_MOESI_hammer/gem5.opt configs/example/fs.py --ruby --cpu-type=detailed -m 4118117000 --checkpoint-at-end
Am I missing some other patch that is also needed in conjunction with this one?
