Posted (March 31, 2011, 3:22 a.m.)
Since there have been no objections, I'm going to commit this.
Posted (April 3, 2011, 10:27 a.m.)
-
ext is no longer set to a raw bitvector that selects per-instruction features like this, since, as you can see, it's pretty opaque just looking at it. The maddf ext=1 becomes ext=Scalar. For msrli and mslli, ext=0 is the default and can be dropped, which leaves the ops as SIMD. Since they're already operating at the full width of the fp register type (a double), the value is especially redundant.
-
This implementation is a bit inefficient, although not terribly so. You have to be careful, since the two operands may be the same register and you don't want to overwrite something you still need. But, for instance, the maddf one line above, this shift of ufp4, and the maddf on line 60 could all update xmmh, since all the "high" halves of the xmm registers have been read and no faults can happen. The moves that read out xmmlm could be moved higher, and xmml could also be updated directly. I think it may also be possible to do something clever and cut down the number of microops that shift things around to pack and unpack the results. I may have suspected the same thing when I wrote the much simpler 64-bit-wide version of this instruction below this one, where the components are whole registers and can be indexed directly, but I didn't come up with anything then and punted for later.
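To make the aliasing hazard concrete, here is a minimal sketch in Python rather than microcode. Everything in it is an assumption for illustration: the two-lane register file, the function names, and the horizontal-add-style operation are hypothetical and are not the instruction under review. It only shows why a write to the destination has to wait until every read that might alias it is done.

```python
# Hypothetical model: each register is a [low, high] pair of doubles.
# None of these names come from gem5; they exist only for this sketch.

def hadd_in_place(regs, dest, src):
    """Naive ordering: dest's low half is written before src is read."""
    regs[dest][0] = regs[dest][0] + regs[dest][1]
    # If src is the same register as dest, regs[src][0] was just clobbered.
    regs[dest][1] = regs[src][0] + regs[src][1]


def hadd_via_temps(regs, dest, src):
    """Safe ordering: read every input first, then write the results."""
    low = regs[dest][0] + regs[dest][1]
    high = regs[src][0] + regs[src][1]
    regs[dest][0] = low
    regs[dest][1] = high


if __name__ == "__main__":
    for fn in (hadd_in_place, hadd_via_temps):
        regs = {"a": [1.0, 2.0]}
        fn(regs, "a", "a")  # both operands name the same register
        print(fn.__name__, regs["a"])
```

With distinct operands the two orderings agree; they only diverge when dest and src are the same register. That is also why a half can safely be updated in place once every read that could alias it has already happened.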
-
This microop changes architecturally visible state, effectively committing to completing the op before all the possibly faulting microops have executed, specifically the loads that follow. There are 8 microcode fp registers, so you can just use the others and leave ufp3 around until the end.
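The ordering problem can be sketched outside of microcode as well. The Python below is only an illustration under stated assumptions: the PageFault class, the dictionary-backed memory, the "xmm0" key, and both function names are hypothetical and do not correspond to gem5 code. The local variable temp plays the role of a microcode temporary such as ufp3.

```python
class PageFault(Exception):
    """Hypothetical fault raised by a load to an unmapped address."""


def load(memory, addr):
    if addr not in memory:
        raise PageFault(addr)
    return memory[addr]


def op_commit_early(arch, memory, addrs):
    """Writes architectural state before the faulting loads have run."""
    arch["xmm0"] = 0.0  # visible write; a later fault cannot undo it
    for addr in addrs:
        arch["xmm0"] += load(memory, addr)


def op_commit_late(arch, memory, addrs):
    """Accumulates in a temporary and commits only after every load."""
    temp = 0.0  # stands in for a microcode temp register
    for addr in addrs:
        temp += load(memory, addr)
    arch["xmm0"] = temp  # single visible write, after the last load


if __name__ == "__main__":
    memory = {0: 1.5, 8: 2.5}  # address 16 is unmapped and will fault
    for fn in (op_commit_early, op_commit_late):
        arch = {"xmm0": 7.0}
        try:
            fn(arch, memory, [0, 8, 16])
        except PageFault:
            pass
        print(fn.__name__, arch["xmm0"])
```

When the last load faults, the early-commit version leaves the register partially updated, so the instruction can't be cleanly restarted; the late-commit version leaves it untouched.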
-
Like above, this can't happen before the loads.
