ISA Parser: Allow predication of source and destination registers
Review Request #1153 - Created April 16, 2012 and submitted
| Information | |
|---|---|
| Nilay Vaish | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 9173:ae0ade8173ef --------------------------- ISA Parser: Allow predication of source and destination registers This patch is meant for allowing predicated reads and writes. Note that this predication is different from the ISA provided predication. They way we currently provide the ISA description for X86, we read/write registers that do not need to be actually read/written. This is likely to be true for other ISAs as well. This patch allows for read and write predicates to be associated with operands. It allows for the register indices for source and destination registers to be decided at the time when the microop is constructed. The run time indicies come in to play only when the at least one of the predicates has been provided. This patch will not affect any of the ISAs that do not provide these predicates. Also the patch assumes that the order in which operands appear in any function of the microop is same across all the functions of the microops. A subsequent patch will enable predication for the x86 ISA.
Issue Summary
| Description | From | Last Updated | Status |
|---|---|---|---|
| To be consistent, the source and destination registers should be be treated this way. | Gabe Black | April 24, 2012, 5:37 p.m. | Open |
| Instead of making a map, just put all the indexes here directly. It'll take the same amount of space and ... | Gabe Black | April 24, 2012, 5:37 p.m. | Open |
| The fact that we need to add this concerns me a lot. We're adding about a cache line of size ... | Ali Saidi | April 25, 2012, 12:54 a.m. | Open |
| This could be written better as val += [None] * 4 | Steve Reinhardt | Sept. 10, 2012, 2:20 a.m. | Open |
Description: |
|
||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 2 (+17 -18) |
Description: |
|
||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 3 (+22 -14) |
-
src/arch/isa_parser.py (Diff revision 3) -
To be consistent, the source and destination registers should be be treated this way.
-
src/cpu/static_inst.hh (Diff revision 3) -
Instead of making a map, just put all the indexes here directly. It'll take the same amount of space and save a level of indirection.
I believe I understand why this is necessary,
-
src/cpu/static_inst.hh (Diff revision 3) -
The fact that we need to add this concerns me a lot. We're adding about a cache line of size to each static instruction here. That seems like an unreasonable amount of storage. Really all these need to be dynamically allocated instead of the worst case maximums.
Status: Re-opened
Summary: |
|
|---|
Description: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 4 (+229 -77) |
Description: |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 5 (+232 -77) |
Any comments on this patch?
Overall, I think that trying to keep both the old model and the new model is overly complicating things. I think it's fine if all the constructors have the form: _numSrcRegs = 0; _srcRegIdx[_numSrcRegs++] = <X>; _srcRegIdx[_numSrcRegs++] = <Y>; with optional "if" clauses around the conditional registers. If there are no if clauses, then compiling with optimization should generate the same code as the old version where all the indices and values are hard coded (the compiler can do all the increments at compile time... it's just constant propagation). The python is complex enough as it is. My other concern is that all this only works if the predicate expressions evaluate the same at construction time as at execution time. I'm not sure how to enforce that, or if it's practical to do so. I can believe that that's not a problem for our current usage of this feature, but I can see that being a very subtle source of bugs in the future.
-
src/arch/isa_parser.py (Diff revision 5) -
Why not move the check above the padding and assignment, and then you won't have to compensate for the padding values?
Description: |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 6 (+188 -73) |
Description: |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 7 (+187 -72) |
Description: |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 8 (+188 -72) |
Description: |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 9 (+188 -72) |
Description: |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 10 (+188 -72) |
Any comments on this? I had also uploaded, for review, the patch that makes use of predication -- http://reviews.gem5.org/r/1322/
Sorry for the delay on this one, Nilay... it looks good. I'm still not thrilled with the amount of duplication in the makeRead() and makeWrite() methods though. I'm fine with having all the instructions use _numSrcRegs++ / _numDestRegs++ as the indices, similar to the way This would allow you to get rid of the predRead and predWrite flags entirely in addition to simplifying the code in makeRead() and makeWrite(). There could be a small performance penalty in debug mode (then again, there might not be or it might be negligible), but I'd be very disappointed if the compiler didn't optimize the difference away entirely for opt or fast. To me that's a reasonable tradeoff. If you really don't want to go there, then one thing you could do that would clean up the code a bit would be to set src_reg_idx to '_sourceIndex++' and dst_reg_idx to '_destIndex++' for operands that are in predicated instructions. Then in makeRead() and makeWrite() you could unconditionally do things like this: int_reg_val = 'xc->readIntRegOperand(this, %s)' % self.src_reg_idx Note that in python '%s' and '%d' do the same thing if the corresponding variable is an integer. This would at least get rid of a lot of the redundancy in the generated code strings, which is especially bad in makeWrite(). It comes at the cost of complicating OperandList.__init__() a bit, since you'd have to make an initial pass to see if predRead or predWrite would be set before being able to set src_reg_idx/dst_reg_idx properly (or you could use a second pass to override src_reg_idx/dst_reg_idx if predRead/predWrite are set). Does that make sense? I'd prefer the former idea (just making everything uniform), but I threw the second one out there as a compromise to be able to generate the same output you're generating now only (IMO) just a little more cleanly.
Description: |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 11 (+188 -70) |
Hi Nilay, this is much better, thanks. I did notice one minor thing below, but feel free to commit after fixing that (no need to repost for further review on my account).
-
src/arch/isa_parser.py (Diff revision 11) -
This could be written better as val += [None] * 4
