sim: simulate with multiple event queues
Review Request #1667 - Created Jan. 24, 2013 and submitted
| Information | |
|---|---|
| Nilay Vaish | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 9885:ab020183761b --------------------------- sim: simulate with multiple event queues This patch extends the patch Steve posted on the reviewboard (846). The patch updated with all the changes that have taken place over last 15 months. Code has been added so as actually carry out a quantum-based parallel simulation. The patch was tested in two different configurations: 1. ruby_network_test.py: in this simulation L1 cache controllers receive requests from the cpu. The requests are replied to immediately without any communication taking place with any other level. 2. twosys-tsunami-simple-atomic: this configuration simulates a client-server system which are connected by an ethernet link. We still lack the ability to communicate using message buffers or ports. But other things like simulation start and end, synchronizing after every quantum seem to be working.
Issue Summary
| Description | From | Last Updated | Status |
|---|---|---|---|
| Is a list a good choice here? | Andreas Hansson | Aug. 20, 2013, 5:42 a.m. | Open |
| Nothing urgent, but is a list really the best choice here? | Andreas Hansson | Sept. 21, 2013, 8:38 a.m. | Open |
Description: |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 2 (+915 -232) |
Hi Nilay,
Thanks a lot for posting this. I'm really excited to see you pushing this forward.
One high-level thought... now that I've looked at this code again after not having seen it for over a year, the repetition of "mainEventQueue" in getMainEventQueue(), curMainEventQueue(), etc. seems really verbose. It made some sense when there was one main event queue, but I'd be in favor of just dropping "Main" basically everywhere it appears. If we need to distinguish the PC-based event queues, we should do it by giving them longer and more distinguished names, and not burden the common case usage. I know this was my naming and not yours... sorry about the extra burden.
Also, I had added "Copyright (c) 2011 Advanced Micro Devices, Inc." to eventq.{cc,hh} and simulate.cc, and it looks like those got dropped somehow... please put them back.
-
src/sim/core.hh (Diff revision 2) -
This should eventually become a parameter on the Root SimObject, right?
-
src/sim/debug.cc (Diff revision 2) -
I'm not sure we need this warning... we are dumping all the event queues, right? (Except for the ones that aren't really event queues, like the PC-based ones.)
-
src/sim/eventq.hh (Diff revision 2) -
I would really like to solve this problem without adding yet another field to the Event structure. I think that all the common cases could be solved definitively just by appropriate selection of priorities. I'm surprised that less common cases (if any) don't sort themselves out naturally just because all the threads would be inserting the global events in the same program order.
-
src/sim/eventq.hh (Diff revision 2) -
Do we still need this? The Event flag doesn't seem to be referenced anywhere anymore, and I think the problem it originally addressed (having exit events on PC-based queues) could be addressed much more elegantly by marking the events on the PC-based queues as special anyway.
-
src/sim/eventq.hh (Diff revision 2) -
It would be nice to have this in the constructor, even if it means having a separate derived class for "real" event queues as opposed to PC-based queues.
-
src/sim/eventq.cc (Diff revision 2) -
I know this is my code and not yours, but looking at it now, I'm not sure why I didn't use a std::vector and get rid of the static limit. Does that make initialization too hard?
-
src/sim/global_event.cc (Diff revision 2) -
This curEventQueue() swapping around is a little funky... how do we know this code is not being called when the other threads are active? It looks like this is replacing my old "inParallelMode" flag... that was not necessarily the best solution either, but at least the intent was clearer, in my opinion. Was there something about that that wasn't working?
-
src/sim/simulate.cc (Diff revision 2) -
Is there a reason you're now creating all the threads every time simulate() is called, instead of just the first time (as in my previous code)? That seems like a lot of overhead if you're periodically going back to python to do things like dump stats.
-
src/sim/stat_control.cc (Diff revision 2) -
Are you adding simQuantum here to make sure that this is after the next sync? In the long run it's probably preferable to do this only when necessary, and print a warning when doing so.
Hi Nilay, Thanks for all of this, it's really great to see us making some progress here. A concern I have is with the use of __thread and how that effects clang whose static analysis tool has found quite a few bugs for us recently and OS X which doesn't support TLS (at least until something on OS X implements C++11 thread local support). So, I think we need to come up with some strategy to deal with this as quite a few people use OS x as their primary development machine for gem5 and I don't think we want to weld ourselves to gcc since llvm/clang does have some good points. Thanks, Ali
-
src/python/m5/SimObject.py (Diff revision 2) -
How does this work? I'm not saying it doesn't I just don't understand what the proxy is doing in this case? Looking for any object that is a UInt32? Do we want to have an specific EventQueueId object?
-
src/sim/eventq.cc (Diff revision 2) -
I'm a bit concerned with using __thread since it isn't supported on OS X as far as I know. I'd be fine for requiring TLS on a platform that we're using multi-threading, but if we're going to go down that route we need to ifdef it such that single threaded simulation doesn't use it. The other way would be to make _tls-mainEventQueue a #define and have a TLS or pthread_getspecfific()/setspecific() version.
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 3 (+910 -243) |
Nice! Thanks for the changes, Nilay. This is really looking good. My comments are pretty minor.
-
src/sim/eventq.cc (Diff revisions 2 - 3) -
You can use csprintf() to generate a string directly, e.g.: string s = csprintf(sp, "MainEventQueue-%d", index); mainEventQueue.push_back(new EventQueue(s));
-
src/sim/eventq_impl.hh (Diff revisions 2 - 3) -
I would put a small comment here referring to the larger comment at BaseGlobalEvent::schedule() to explain why global events need to be put on the async queue even for the current thread.
-
src/sim/global_event.hh (Diff revisions 2 - 3) -
This comment needs to be updated.
-
src/sim/global_event.cc (Diff revisions 2 - 3) -
There should be a comment here explaining the situation about global events and why we need a lock and why they have to be scheduled on the async queue to guarantee a consistent global order to avoid deadlock.
-
src/sim/simulate.cc (Diff revisions 2 - 3) -
typo in comment
-
src/python/m5/simulate.py (Diff revision 3) -
I don't think we need two different variables that both have the same purpose of doing something only the first time simulate() is called... or is there something else going on here?
-
src/python/m5/simulate.py (Diff revision 3) -
Is there ever a case where we register an atexit callback and we want to make sure it's invoked even if the python script never calls simulate()? Practically speaking, I doubt it, but I thought I'd bring it up since clearly this change would break that assumption.
-
src/sim/Root.py (Diff revision 3) -
I'd prefer to make the default 0, which would make it clear that we shouldn't schedule a quantum, then require anyone using multiple threads to specify a quantum explicitly (and error out if they don't).
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 4 (+943 -242) |
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 5 (+945 -242) |
Hi Nilay, Could you see if you can easily #ifdef the __thread commands and the creation of threads such that if the platform doesn't support one or both features we can still compile and run? Thanks, Ali
-
src/sim/eventq.cc (Diff revision 5) -
I think this is broken... sp should be gone, and I think the only reason it compiles is because it thinks sp is your format string and that the format string is the first arg.
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 6 (+966 -240) |
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 7 (+1002 -238) |
-
SConstruct (Diff revision 7) -
This needs to be killed after However,... as wer'e doing it now.
-
SConstruct (Diff revision 7) -
I'll push http://reviews.gem5.org/r/1970/ which takes care of this
-
src/SConscript (Diff revision 7) -
What is causing this to pop up?
-
src/python/m5/SimObject.py (Diff revision 7) -
Very minor, but could we move this to after the cxx_.* lines?
-
src/python/m5/simulate.py (Diff revision 7) -
Sporadic whitespace
-
src/python/m5/simulate.py (Diff revision 7) -
Could you add a line why this has moved here?
-
src/python/swig/core.i (Diff revision 7) -
Sporadic whitespace
-
src/python/swig/event.i (Diff revision 7) -
Perhaps I'm missing something here, but how come the Event is no longer an Event? :-)
-
src/python/swig/event.i (Diff revision 7) -
whitespace
-
src/sim/eventq.hh (Diff revision 7) -
Unrelated change?
-
src/sim/eventq.hh (Diff revision 7) -
All these unrelated?
-
src/sim/eventq.hh (Diff revision 7) -
Is a list a good choice here?
-
src/sim/sim_events.hh (Diff revision 7) -
const?
-
src/sim/sim_events.hh (Diff revision 7) -
const?
Looks great! Some minor questions and concerns.
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 8 (+996 -234) |
-
src/sim/eventq.hh (Diff revision 8) -
Would it be better to call this curEventQueue? This is a pretty ugly name. (I know, probably my fault originally.)
-
src/sim/eventq.hh (Diff revision 8) -
This looks like an irrelevant whitespace-only change.
-
src/sim/eventq.hh (Diff revision 8) -
I realize (looking back) that this is my change, but it seems like it needs at least a comment. I'm also surprised that the second unserialize isn't marked virtual. I suppose the right answer ties in with our long-term discussion of checkpointing (see below), but in the short term it would be nice to at least have a comment here. Or is there one somewhere else that I missed?
-
src/sim/eventq.hh (Diff revision 8) -
Missing "event" in comment.
-
src/sim/serialize.cc (Diff revision 8) -
It seems odd that we're generating N identically named "MainEventQueue" sections. I expect that it works fine (since they're processed in order on both ends), but *if* we're going to stick with this model, then we should put indices on the names (perhaps except for queue 0, for backward compatibility). That said, we should also have a discussion about whether this is the right model at all. In the near term, we should probably stick with it because I don't want to hold things up, but in the long run, do we really want the number of threads to be baked into the checkpoint? I'd argue that checkpoints should be taken at a higher level such that the number of threads can be varied between a save and a restore.
-
src/sim/eventq.cc (Diff revision 8) -
I don't think this is needed anymore. at least not for os x...
-
src/sim/eventq.cc (Diff revision 8) -
same
-
src/base/barrier.hh (Diff revision 8) -
What version of gcc does this require? Will it work with 4.4? http://gcc.gnu.org/projects/cxx0x.html
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 9 (+983 -232) |
Looks good to me! Thanks so much, Nilay!
-
src/base/barrier.hh (Diff revision 9) -
Copyright?
-
src/sim/eventq.hh (Diff revision 9) -
Nothing urgent, but is a list really the best choice here?
-
src/sim/eventq.hh (Diff revision 9) -
initialize async_queue_mutex to NULL
-
src/sim/eventq.hh (Diff revision 9) -
assert async_queue_mutex == NULL
-
src/sim/eventq.cc (Diff revision 9) -
Is the notion of the quantum described somewhere?
-
src/sim/eventq.cc (Diff revision 9) -
Why not in the initializer list?
-
src/sim/eventq_impl.hh (Diff revision 9) -
is it worth checking inParallelMode first? As a side not, do we know what the performance impact is when not running in parallel mode? (perhaps this has been resolved already)
-
src/sim/global_event.hh (Diff revision 9) -
Been cooking for a long time? :-)
-
src/sim/global_event.hh (Diff revision 9) -
I thought we managed to rely entirely on C++11 and avoid pthreads?
-
src/sim/global_event.hh (Diff revision 9) -
please use || instead of "or"
Some minor comments. One thing I'd like to understand is how this affects the ongoing work to run gem5 together with SystemC (as an SC_THREAD in fact). Does this patch make the SC integration impossible? If so I think there are good reasons to discuss the trade-off in a lot more depth before pushing this.
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 10 (+1052 -236) |
-
src/sim/eventq.hh (Diff revision 9) -
Sorry, I might be missing something here, but in the constructor of EventQueue, you create a private EventQueue that lives in the local scope and is essentially a no-op. Right? And this allocates space? Perhaps it's some nifty pattern that I'm not familiar with, but I must confess I'm confused.
One additional issue:
Somehow the unproxying of the eventq_index does not work for the PC regressions due to some odd circular relationship. The change below is needed for it to work again.
diff --git a/src/dev/x86/Pc.py b/src/dev/x86/Pc.py
--- a/src/dev/x86/Pc.py
+++ b/src/dev/x86/Pc.py
@@ -57,10 +57,9 @@
behind_pci = IsaFake(pio_addr=x86IOAddress(0xcf8), pio_size=8)
# Serial port and terminal
- terminal = Terminal()
com_1 = Uart8250()
com_1.pio_addr = x86IOAddress(0x3f8)
- com_1.terminal = terminal
+ com_1.terminal = Terminal()
# Devices to catch access to non-existant serial ports.
fake_com_2 = IsaFake(pio_addr=x86IOAddress(0x2f8), pio_size=8)
