cpu: implements vector registers
Review Request #2828 - Created May 17, 2015 and submitted
| Information | |
|---|---|
| Nilay Vaish | |
| gem5 | |
| default | |
| Reviewers | |
| Default | |
Changeset 10880:f9a35df0dce5 --------------------------- cpu: implements vector registers This patch aims at adding vector registers type. The type is defined as a std::array of some fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now.
Hi,
I'm Giacomo and I've been working on ARMv8 support in the past (including NEON), and I'd like to share some thoughts on this patch.
In general, I like the idea of adding a vector type, as it's going to be a more future-proof solution, and ARMv8 code could benefit as well from that... but I have a few concerns regarding this code as it stands currently:
- The patch doesn't address the problem of overlapping registers (for now it's only xmm's & ymm's, but zmm's are likely to join the party soon); in this context, it's not clear to me if we'd need something fundamentally different already at this stage to address that - the risk is to upstream this code and then later discover that we need a different strategy for overlapping. Maybe just having a plan on how to address that in the future could be enough at this stage?
- In the interim we would have broken SSE-AVX/AVX-2 interworking. Now here I'm not sure where the gem5 community draws the line in terms of ISA correctness vs. getting code out of the door for early experiments, etc. :) At least it would be good to have some thoughts from the wider gem5 community here...
- I guess adding vectors could enable implementing AVX-512, and potentially it could also enable research into more wacky "vector-style" architectures, which means that potentially the vectors could be REALLY wide. For this reason, I'm a bit concerned about passing around all those vectors by value. How about changing the register access API to return by ref when accessing vectors? (not quite sure about the implications of that on the rest of the code base though, but I think we'd need to somehow address the efficiency problem).
Thanks, and sorry for the long review!
Description: |
|
||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 2 (+615 -48) |
Description: |
|
|||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 3 (+689 -62) |
Description: |
|
|||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 4 (+698 -63) |
Description: |
|
|||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 5 (+698 -63) |
Description: |
|
|||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 6 (+701 -63) |
Description: |
|
|||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 7 (+794 -71) |
-
src/sim/insttracer.hh (Diff revision 7) -
Does this break length assumptions?
I think you are missing serialization support for vector registers. See serialize() and unserialize() in thread_context.cc.
-
src/cpu/exec_context.hh (Diff revision 7) -
Wouldn't it be safe to return a const ref here?
Also, could you flag the read method as const.
-
src/cpu/o3/cpu.hh (Diff revision 7) -
Reference?
-
src/cpu/o3/cpu.hh (Diff revision 7) -
Reference?
-
src/cpu/o3/dyn_inst.hh (Diff revision 7) -
Inconsistent spacing around reference.
-
src/cpu/simple_thread.hh (Diff revision 7) -
Would it make sense to return a constant reference here?
-
src/cpu/simple_thread.hh (Diff revision 7) -
Constant reference? Also, flag method as const.
-
src/cpu/thread_context.hh (Diff revision 7) -
How about returning a constant reference here?
-
src/cpu/thread_context.hh (Diff revision 7) -
Return const ref to the vector register?
Description: |
|
|||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 8 (+836 -76) |
These are my current thoughts about this patch:
My impression is that there is still not enough architectural support to understand if the new vector register type as it stands can address all the
different corner cases efficiently; I'd leave to the wider gem5 community decide where we want to draw that line...Legacy SSE requires merging of upper lanes, while AVX does zeroing; also ARMv8 AArch64 scalar FP and NEON instructions perform zeroing. Assuming that
destination vectors are always read is going to introduce unneded serialization for those ISA extensions if they are going to be ported to the new scheme, so I'd suggest to avoid to implicitly read on write.
Also for cases where merging is required, maybe something smarter should be done to avoid unneded serialization; without optimizations, any sequence of x86 FP scalar instructions could be significantly slow compared to real hw implementations.
