[Dwarf-discuss] ISSUE: tensor types. V3

Wed Apr 26 00:35:48 GMT 2023

On 4/24/23 13:17, Todd Allen via Dwarf-discuss wrote:
> On 4/24/23 13:27, Ben Woodard via Dwarf-discuss wrote:
>>> As for NEON vs. SVE, is there a need to differentiate them?  And can it
>>> not be done by shape of the type?
>> That one continues to be hard. ARM processors that support SVE also have
>> NEON registers which like the Intel SSE MMX AVX kind of vector registers
>> are architecturally specified as having a specific number of bits.
>> Handling those is trivial.
>>
>> The weird thing about SVE registers (and the same thing also applies to
>> RVV) is that the number of bits is not architecturally defined and is
>> therefore unknown at compile time. The size of the registers can even
>> vary from hardware implementation to hardware implementation. So a
>> simple processor may only have a 128b wide SVE register while a monster
>> performance core may have 2048b wide SVE registers. The predicate
>> registers scale the same way. I think that it can even vary from core to core
>> within a CPU sort of like intel's P-cores vs E-cores. To be able to even
>> know how much a loop is vectorized you need to read a core specific
>> register that specifies how wide the vector registers are on this
>> particular core. Things like induction variables are incremented by the
>> constant in that core specific register divided by size of the type
>> being acted upon. So some of the techniques used to select lanes in
>> DWARF don't quite work the same way.
>>
>> Just to make things even more difficult, when one of these registers is
>> spilled to memory like the stack the size is unknown at compile time and
>> so any subsequent spilling has to determine the size that it takes up.
>> So any subsequent offsets need to use DWARF expressions that
>> reference the width of the vector.
>>
>> ...and then there is SME which is like SVE but they are matrices rather
>> than vectors. The mind boggles.
>>
> So the variability of the vector size is the only significant difference
> that you've identified?  If so, then I think the shape of the array type
> probably is sufficient.  For SVE, the DW_TAG_subrange_type will have a
> DW_AT_upper_bound which is a variable (reference or dwarf expr), or the
> DW_TAG_array_type's DW_AT_{byte,bit}_size will be a variable, or both.
> Meanwhile, NEON would use DW_AT_bit_size 128 (or DW_AT_byte_size 16) and
> a constant DW_AT_upper_bound (128/bitsizeof(elementtype)).  That seems
> like it very directly reflects the difference between the two vector types.
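
A minimal sketch of that distinction, assuming a zero-based lower bound (so 
the upper bound is the lane count minus one) and using hypothetical helper 
names that do not come from any DWARF tool. The NEON bound is a compile-time 
constant; the SVE bound cannot be computed until the vector length is known 
at run time.

```python
NEON_BITS = 128  # NEON register width is architecturally fixed


def neon_upper_bound(element_bits):
    """Constant upper bound for a zero-based NEON vector type."""
    return NEON_BITS // element_bits - 1


def sve_upper_bound(vector_bits, element_bits):
    """Upper bound for a zero-based SVE vector type.

    vector_bits is only known at run time; SVE allows multiples of
    128 bits from 128 up to 2048.
    """
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("not a valid SVE vector length: %d" % vector_bits)
    return vector_bits // element_bits - 1
```

For 32-bit elements, neon_upper_bound(32) is always 3, while 
sve_upper_bound(vector_bits, 32) is 3 on a 128-bit implementation and 63 on 
a 2048-bit one.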

I went back and revisited the research that I did on behalf of customers 
a few years back when customers first got access to SVE and started 
debugging it. The state of the art has advanced since I did that work.

Back then we ran into problems because the only way to get the size of 
the hardware vector was to read a core specific register. A big problem 
was that if you were debugging something like a core file, you didn't 
have access to that core specific register. There was no way to 
reference the core specific register from DWARF.

Furthermore while on the systems that I was looking at, all the cores 
were the same, it was architecturally allowed to have different sizes of 
the vector registers depending on which core that you were running on.

At the time, we realized that there needed to be some "magic" that 
did not yet exist to provide the debugger with the width of 
the vector. It was this complexity that really left me feeling that SVE 
needed to be its own special thing.

At the time we discussed several options. One was pushing the size of 
the vector into a normal variable so that it could be referenced by 
DWARF; however we didn't know how to make that work because it could 
change depending on which core the code was executing on. There was also 
a kernel problem associated with that: the information about where the 
process was executing needed to be included in the crash dumps. There 
was also a feeling that there was something wrong with this approach 
because the only reason for the variable to exist would be to support 
debugging and keeping it up to date added overhead, and probably some 
kernel support.

Another idea we kicked around was giving the core specific register a 
name and number in the register file so that DWARF could access it. This 
broke ABI. At that time, that option was immediately shot down.

I wasn't able to give the customers a good answer. I didn't know how to 
solve the problem. Word evidently got back to ARM and they wrote: 
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#dwarf-register-names 
The big innovation that made this possible is that ARM introduced a "pseudo 
register" which they call VG that is specified to exist in the execution 
environment. They even gave some examples of how the DWARF should look for 
these types:
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#vector-types-beta 
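
To make the role of VG concrete, here is a rough sketch of how a debugger 
might evaluate an upper-bound expression that reads it. The sketch assumes 
that VG holds the vector length in 64-bit granules and that its DWARF 
register number is 46; both should be checked against the document linked 
above. The expression is modeled on the style of the examples there rather 
than copied from them, and the tiny evaluator handles only the four 
operations it needs.

```python
VG = 46  # assumed DWARF register number for VG; verify against AADWARF64


def evaluate(expr, registers):
    """Evaluate a small subset of DWARF stack operations."""
    stack = []
    for op, *args in expr:
        if op == "DW_OP_bregx":
            regnum, offset = args
            stack.append(registers[regnum] + offset)
        elif op == "DW_OP_constu":
            stack.append(args[0])
        elif op == "DW_OP_mul":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs * rhs)
        elif op == "DW_OP_minus":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs - rhs)
        else:
            raise NotImplementedError(op)
    return stack[-1]


# Upper bound of a vector of 8-bit elements: VG * 8 lanes, minus one.
SVINT8_UPPER_BOUND = [
    ("DW_OP_bregx", VG, 0),
    ("DW_OP_constu", 8),
    ("DW_OP_mul",),
    ("DW_OP_constu", 1),
    ("DW_OP_minus",),
]
```

With registers {VG: 2} (a 128-bit implementation) the expression evaluates 
to 15, and with {VG: 32} (2048 bits) it evaluates to 255, so the same DWARF 
describes the type on both.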

I haven't looked at the implementation of how GCC implements the VG 
register yet. So I don't know how it handles some of the problems that 
vexed us like making sure the VG exists in crash dumps and the ABI 
implications of that, and how they ensure that VG is correct for the 
core they are executing on when a process could move from one core to 
another with different vector sizes. However, I'm going to look around 
in gcc/config/aarch64/aarch64.cc and see if I can figure it out. The ARM 
guys are great, but I wouldn't be surprised if I found some bugs there.

They also introduced a new wrinkle that I hadn't come across yet: 
SME streaming mode and how it can change the apparent size of the 
vector register.

All of this leads me to the conclusion that you are in fact correct: we 
don't need a special flavor of tensors to handle SVE. The complexity 
that I knew was under the surface, which I felt would need some special 
help in DWARF, got handled by pushing it into the runtime 
environment. (My customers have mostly moved away from ARM but should 
they move back, my gut feeling is that part of the map should be labeled 
"here be dragons" and I expect to be chasing some tricky bugs.)

-ben

>>> If all those things You argued that it still should be an enum, but
>>> with only one "default"
>>> value defined.  And I guess any other values that might be added later
>>> would be (or at least start as) vendor extensions. It's peculiar, and I
>>> don't think we have that anywhere else in the standard.
>> I guess that my point is that I'm fairly certain that SVE and RVV will
>> need special handling and when the compilers start handling the matrix
>> types that the hardware is starting to support, they are going to need some
>> help as well.
> If there's something more peculiar about the types inhabiting these
> vector registers than "variable size", that might convince me.  But
> merely being variable-sized doesn't.
>>> If it ever became necessary, you can always add a 2nd attribute for it.
>>> As an example, in our Ada compiler decades ago, we did this for
>>> DW_AT_artificial.  It's just a flag, so either present or not-present.
>>> We added a 2nd DW_AT_artificial_kind with a whole bunch of different
>>> enums for the various kinds our compiler generated.  The point is you
>>> still can get there even if DW_AT_tensor is just a flag.
>> Totally, not opposed to that if that is the way that people want to
>> handle it. My only (admittedly weak) argument against doing it that way
>> is that there will now be two attributes rather than one and the
>> space that it takes up. John DelSignore was just dealing with a program
>> that had 4.9GB of DWARF, it would be nice to keep it as compact as
>> possible. Of course most of that is likely location lists and template
>> instantiations and stuff like that not the relatively rare case like
>> this. The cases where this shows up are likely going to be fairly rare.
>>
>> Would this be an acceptable compromise for V4 of my proposal? I drop it
>> back to just being a flag for the time being. Then in a subsequent
>> submission (which may or may not be in the DWARF6 cycle -- but hopefully
>> is in time for DWARF6), if I find it necessary to make a flavor to
>> support SVE, RVV or SME, then my submission for that will include
>> changing DW_AT_tensor to requiring a constant that then references an
>> enum like I did above. If it comes out before DWARF6 is released then
>> great, we don't have to redefine anything. If it gets bumped to DWARF7 then
>> we add a _kind attribute.
> You can submit it in whichever form you prefer.  I supposed you were
> soliciting comments here to get it in a form as close to acceptable as
> possible before submitting it.  After you do, the committee will discuss
> it, probably ad nauseam.  (And I'll be no exception.)  And changes may
> happen then.  Seldom is it rubber stamp vs. reject.
>
> Regards,
> Todd
>