ruby: patch checkpoint restore with garnet
Review Request #1829 - Created April 19, 2013 and submitted
| Information | |
|---|---|
| Nilay Vaish | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 9670:1179901b1ddc
---------------------------
ruby: patch checkpoint restore with garnet

Due to recent changes to the clocking system in Ruby and the way Ruby restores state from a checkpoint, garnet was failing to run from a checkpointed state. The problem is that Ruby resets the time to zero while warming up the caches. If any component records a local copy of the time (read: calls curCycle()) before the simulation has started, then that component will not operate until that time is reached. In the context of this particular patch, the Garnet Network class calls curCycle() at multiple places. Any non-operational component can block requests in the memory system, which the system interprets as a deadlock.

This patch makes changes so that Garnet can successfully run from a checkpointed state. It adds a globally visible time at which the actual execution started. This time is initialized in the RubySystem::startup() function. This variable is only meant for components within Ruby. It replaces the private variable that was maintained within Garnet, since it is not possible to figure out the correct time at which that variable can be set.

The patch also does away with all cases where curCycle() is called within some Ruby component before the system has actually started executing. This is required due to the quirky manner in which Ruby restores from a checkpoint.
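To make the failure mode in the description concrete, here is a toy model of it. This is only a sketch: `ToyClock`, `EagerComponent`, `LazyComponent` and `replayRestore` are hypothetical names invented for illustration, not gem5 or Garnet classes, and the sequence (construct at the checkpointed time, reset to zero for warm-up, then start) is a simplification of what the description says.

```cpp
#include <cstdint>

using Cycles = std::uint64_t;

// Hypothetical stand-in for a clocked object; not gem5's actual class.
struct ToyClock {
    Cycles now = 0;
    Cycles curCycle() const { return now; }
};

// Component that snapshots the clock when it is constructed,
// i.e. before the simulation has really started.
class EagerComponent {
  public:
    explicit EagerComponent(const ToyClock &clk)
        : recorded(clk.curCycle()) {}

    // Stays idle until the clock has caught up with the snapshot.
    bool canOperate(const ToyClock &clk) const {
        return clk.curCycle() >= recorded;
    }

  private:
    Cycles recorded;
};

// Component that consults a shared start time only when asked.
class LazyComponent {
  public:
    explicit LazyComponent(const Cycles &start) : startTime(start) {}

    bool canOperate(const ToyClock &clk) const {
        return clk.curCycle() >= startTime;
    }

  private:
    const Cycles &startTime;  // refers to the globally visible value
};

struct Outcome {
    bool eagerRuns;
    bool lazyRuns;
};

// Replays the assumed sequence: build components at the checkpointed
// time, reset the clock to zero for warm-up, then record the start time.
Outcome replayRestore(Cycles checkpointCycle) {
    ToyClock clk;
    clk.now = checkpointCycle;
    Cycles globalStart = 0;

    EagerComponent eager(clk);        // records checkpointCycle
    LazyComponent lazy(globalStart);  // records nothing yet

    clk.now = 0;                      // time reset during warm-up
    globalStart = clk.curCycle();     // set once, at startup

    return {eager.canOperate(clk), lazy.canOperate(clk)};
}
```

In this model, `replayRestore(1000)` leaves the eager component idle while the lazy one can operate, which mirrors why the patch moves to a single start time set at startup instead of per-component snapshots.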
Posted (April 19, 2013, 9:43 p.m.)
It's not a part of the codebase that I'm very familiar with, so excuse me if I'm being unfair here, but I would argue it needs a few more comments. It is not an "intuitive" solution and it would be good to have some of the thought process baked into the code. For the rest, if it solves the problem... :-)
Review request changed
Updated (April 20, 2013, 4:16 a.m.)
Description: (changed)
Diff: Revision 3 (+25 -50)
Posted (April 22, 2013, 3:26 a.m.)
Thanks, Nilay! With the "flexible" pipeline model I have successfully run one benchmark to completion.
However, there still appear to be problems with the "fixed" pipeline:
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 compiled Apr 22 2013 16:30:35
gem5 started Apr 22 2013 18:10:47
gem5 executing on arch
command line: ./build/X86_MESI_CMP_directory/gem5.opt configs/example/ruby_fs.py -n 32 --cpu-type timing --mem-size 2048MB --kernel system/x86_64-vmlinux-2.6.32.60.smp --l1d_size 32kB --l1d_assoc 4 --l1i_size 32kB --l1i_assoc 4 --l2_size 1024kB --l2_assoc 16 --num-l3caches 0 --num-l2caches 32 --num-dirs 32 --topology Mesh --mesh-rows 4 --checkpoint-dir new-cpt/output/MOESI_hammer,32timing-stamp,genome/checkpoints -r 0 --restore-with-cpu timing --garnet-network fixed
warn: add_child('terminal'): child 'terminal' already has parent
warn: add_child('cls'): child 'credit_links0 credit_links1' already has parent
<line above repeated 148 times>
Global frequency set at 1000000000000 ticks per second
info: kernel located at: /home/marco/gem5/system/binaries/system/x86_64-vmlinux-2.6.32.60.smp
0: rtc: Real-time clock set to Sun Jan 1 00:00:00 2012
Listening for com_1 connection on port 3457
warn: Reading current count from inactive timer.
**** REAL SIMULATION ****
info: Entering event queue @ 0. Starting simulation...
info: Entering event queue @ 5512909158000. Starting simulation...
panic: Possible Deadlock detected. Aborting!
version: 2 request.paddr: 0x[0x7e623000, line 0x7e623000] m_readRequestTable: 1 current time: 5513159158000 issue_time: 5512909158000 difference: 250000000
@ cycle 2756260
[wakeup:build/X86_MESI_CMP_directory/mem/ruby/system/Sequencer.cc, line 107]
Memory Usage: 2756260 KBytes
Program aborted at cycle 5513159158000
Note that the point at which gem5 re-enters the event queue is the same as the issue time of the request which triggered the deadlock (5512909158000).
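The numbers in the panic message can be checked directly. The sketch below only redoes the arithmetic from the log; the constant and function names are made up for illustration, and the tick rate is taken from the "Global frequency set at 1000000000000 ticks per second" line.

```cpp
#include <cstdint>

// Values copied from the log above.
constexpr std::uint64_t kTicksPerSecond = 1000000000000ULL;
constexpr std::uint64_t kIssueTick = 5512909158000ULL;  // issue_time
constexpr std::uint64_t kPanicTick = 5513159158000ULL;  // current time

// Ticks the request had been outstanding when the panic fired.
constexpr std::uint64_t stalledTicks(std::uint64_t issue, std::uint64_t now) {
    return now - issue;
}

// Converts a tick count to microseconds of simulated time.
constexpr double ticksToMicroseconds(std::uint64_t ticks,
                                     std::uint64_t ticksPerSecond) {
    return static_cast<double>(ticks) * 1e6 /
           static_cast<double>(ticksPerSecond);
}

// Matches the "difference: 250000000" field in the panic message.
static_assert(stalledTicks(kIssueTick, kPanicTick) == 250000000ULL,
              "difference should match the log");
```

So the request was issued at the very tick the restored simulation began and then sat for 250000000 ticks, which is 250 microseconds of simulated time at this tick rate, before the Sequencer gave up.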
Review request changed
Updated (April 22, 2013, 6:15 a.m.)
Description: (changed)
Diff: Revision 4 (+44 -50)
