Review Board 2.0.15


ISA parser: Use '_' instead of '.' to delimit type modifiers on operands.

Review Request #873 - Created Sept. 20, 2011 and submitted

Information
Gabe Black
gem5
Reviewers
Default
ali, gblack, nate, stever
ISA parser: Use '_' instead of '.' to delimit type modifiers on operands.

By using an underscore, the "." is still available and can unambiguously be
used to refer to members of a structure if an operand is a structure, class,
etc. This change mostly just replaces the appropriate "."s with "_"s, but
there were also a few places where the ISA descriptions where handling the
extensions themselves and had their own regular expressions to update. The
regular expressions in the isa parser were updated as well. It also now
looks for one of the defined type extensions specifically after connecting "_"
where before it would look for any sequence of characters after a "."
following an operand name and try to use it as the extension. This helps to
disambiguate cases where a "_" may legitimately be part of an operand name but
not separate the name from the type suffix.

Because leaving the "_" and suffix on the variable name still leaves a valid
C++ identifier and all extensions need to be consistent in a given context, I
considered leaving them on as a breadcrumb that would show what the intended
type was for that operand. Unfortunately the operands can be referred to in
code templates, the Mem operand in particular, and since the exact type of Mem
can be different for different uses of the same template, that broke things.

   
Posted (Sept. 21, 2011, 12:50 a.m.)
Were there places where the ambiguity between a structure field access and the type notation were giving you problems?  Would they have been fixed just by making the one change here where the regex matches only valid type extensions and not any sequence of characters after a dot?

My reluctance here is that you're substituting one ambiguous overloaded syntax for another... it's unlikely that I have an identifier that by itself ends in "_uq" but it seems equally unlikely that I've got a structure field named "uq".  Between the two, IMO, the dot syntax is slightly more meaningful since the type qualifiers are a little bit like union fields.

It's fine to me to change it, but it seems like we should change it to something that's clearly less ambiguous; how about colon?  There's some potential ambiguity with ?:, but that's not a very common operator, and even when there's potential ambiguity it would be easy to fix by requiring spaces around the : when you don't want it to be treated as a type qualifier.

Some other possibilities are #, `, \, ~, and !.  I think only backtick is completely unused, but the others should be pretty unambiguous when embedded in an identifier.
  1. There was an instance of a sw field and an sw type which showed up even now that accessing members with . is so cumbersome. Really, though, I think using . for both is not only potentially ambiguous, it's also going to be really confusing. People won't be able to look at a bit of code, say Control.udw, and know for sure without looking something up if udw is a type or a member. Also the regular expressions in, say, the reg_or_imm sort of multiplexing instruction formats don't know what the type suffixes are. They just accept anything like they did before, but it happens to work out.
    
    Even though _ may at first seem like it would blend in too much, and/or be ambiguous, I think after looking at some examples it isn't really. Historically operand names have been capitalized, so putting a _ is a clear separator. It's also subtle enough to not obscure the name itself. I actually had a few ISAs converted over to using @ before, and it was just so ugly I gave up on it. You basically end up with this big wart in the middle of everything where . or _ stay out of the way.
    
    Lets say you have a register XMM in x86 which is a, say, 64 bit floating point register. Normally those are 128 bits, but I split them up into 64 bits. But in any case, lets say the default type is a double, but you want it to be a structure/union which is broken into lanes of various sizes. The type suffix for that could be lns, and the 2 byte lanes could be, for instance, wa, wb, wc, and wd. To store 0xaa55 into the third 2 byte word of the XMM register, it would look like this.
    
    XMM_lns.wc = 0xaa55;
    
    I think that's clear for a couple reasons. First, it's not uncommon to use a suffix to distinguish between different versions of the same thing with the same name, and that's essentially what _lns is doing. Second, "." means membership in C and C++ and python, etc., and here it does too. You're accessing the wc member, not essentially doing a cast (or really selecting a variant). If you had a control register and you saw.
    
    Control.udw = 1
    
    And you weren't already programmed to assume that was a type override, your initial assumption would likely be that you were assigning to the udw field of the control register. In some cases, that may be exactly what you want to do. If you have a (recently demoted) Twin64_t operand like, say, Mem after a load, you could get at its a and b members like this.
    
    Ra = Mem_tud.a;
    Rb = Mem_tud.b;
    
    which is also pretty clear.
    
    Another nice property of "_" compared to other punctuation, in addition to it being aesthetically nice, is that it can't be confused for an operator or break up a variable name. You could, theoretically, leave the code untouched and it would still be valid code. You can't for logistical reasons I mention in my commit message, but, otherwise it would work just fine. You could copy and paste the code into your own file, and it would just work. This also means the operand detector can be a little dumber because it doesn't have to worry about accidentally thinking Ra/Rb means Ra but instantiated as an Rb type operand.
    
    I thought about this quite a bit and it wasn't obvious that this would work out very well, but after some thought and experimentation I think it does.
  2. I still don't think the "." is confusing, if you think of unions rather than structs.  If you have a union of structs, then having a sequence of dotted elements is the obvious way to do it: XMM.lns.wc = 0xaa55.  You can argue that unions are confusing, but then you're arguing with K&R and not me ;-).  Or do you think that the type specifiers behave insufficiently like unions?
    
    Visually I think ":" looks pretty nice, but I can imagine that there are benefits in having something that's still a syntactically legal LHS expression (which either "." or "_" satisfies, but not much else).  OTOH, maybe having something that's not legal C is a plus in that it makes it obvious to the reader that there's something magic going on.
    
    Not trying to nitpick you on this one, but it seems like a pretty significant change so I want to feel confident that we're going the right direction here.
  3. Here are some examples from the SPARC isa description which partially illustrate the problem:
    
      1106                         RdLow = (Mem.tuw).a;
      1107                         RdHigh = (Mem.tuw).b;
    
    Here, we have to hide the fact that .a and .b go with Mem.tuw, because the parser would otherwise assume it was a type. Here's an example from x86 which would confuse a person instead of just the parser.
    
    CCFlagBits.of
    CCFlagBits.sf
    CCFlagBits.pf
    
    RFLAGS.sf
    RFLAGS.df
    
    Efer.ud
    Efer.sf
    
    DR6.b0
    DR6.bd
    
    Which of these are specifying a bitfield and which are specifying a type? You probably didn't get them all. Everything but Efer.ud and Efer.sf is a member.
    
    Or in ARM, what does SCTLR.sw mean? Treat the SCTLR control register as a signed word, or to access its swp/swpb enable field? What if the field got renamed to SCTLR.swp? Then SCTLR.sw would surprisingly and silently change meaning.
    
    I'm sure I can come up with more examples if necessary, but to me that seems sufficient.
    
    Another way to think about "_", and what led me to it initially, is that we're really just creating operands that are the cross product of the declared operands and the types, and then doing away with the type suffixes all together. This change has that semantic effect but keeps the mechanism basically the same.
  4. I see, it's kind of a type safety issue... a real union would know when these fields are valid, but our ad hoc mechanism doesn't, so that's where the ambiguity comes in.  That's a lot clearer now, thanks for the examples.
    
    Maybe someday if we ever put in a more robust C++ parser we can revisit this, but for now you've convinced me that this is pretty reasonable.
  5. Ok, great, thanks for your feedback.
Posted (Sept. 21, 2011, 9:12 a.m.)
I think I agree with Steve here. Either way after we've decided what the right thing to do is I'd like to do some testing before this is committed. 
  1. Have you had a chance to do any testing? I doubt there will be any problems with existing code, but code outside the repository may need to be updated. It should mostly (or entirely) be straightforward find and replace.
Ship it!
Posted (Sept. 25, 2011, 7:20 p.m.)
I'm ok with it.