[Dwarf-discuss] ISSUE: tensor types. V3

Mon Apr 24 19:27:32 GMT 2023

On 4/24/23 09:50, Todd Allen via Dwarf-discuss wrote:
> On 4/21/23 16:31, Ben Woodard via Dwarf-discuss wrote:
>>>>      Insert the following paragraph between the first paragraph of
>>>>      normative text describing DW_TAG_array_type and the second paragraph
>>>>      dealing with multidimensional ordering.
>>>>
>>>> --------------------------------------------------------------------
>>>>          An array type that refers to a vector or matrix type, shall be
>>>>          denoted with DW_AT_tensor whose integer constant, will specify the
>>>>          kind of tensor it is. The default type of tensor shall be the kind
>>>>          used by the vector registers in the target architecture.
>>>>
>>>>              Table 5.4: Tensor attribute values
>>>> ------------------------------------------------------------------
>>>>          Name              | Meaning
>>>> ------------------------------------------------------------------
>>>>          DW_TENSOR_default | Default encoding and semantics used by target
>>>>                            | architecture's vector registers
>>>>          DW_TENSOR_boolean | Boolean vectors map to vector mask registers.
>>>>          DW_TENSOR_opencl  | OpenCL vector encoding and semantics
>>>>          DW_TENSOR_neon    | NEON vector encoding and semantics
>>>>          DW_TENSOR_sve     | SVE vector encoding and semantics
>>>> ------------------------------------------------------------------
>>> As someone who was not sitting in on your debugging GPUs discussions,
>>> this table is baffling.  Is it based on the "Vector Operations" table
>>> on the clang LanguageExtensions page you mentioned?
>> Yes
>>> That page is a wall of text, so I might have missed another table,
>>> but these values are a subset of columns from that table.
>>>
>>> 1 of the values here is a source language (opencl), 2 reflect specific
>>> vector registers of one specific architecture (neon & sve), and I don't
>>> even know what boolean is meant to be.  Maybe a type that you would
>>> associate with predicate registers?  I think this table needs a lot
>>> more explanation.
>> This was something that Pedro pointed out and it was something that I
>> hadn't thought of. The overall justification for this is that these
>> types were semantically different than normal C arrays in several
>> distinct ways. There is this table which explains the differences:
>> https://clang.llvm.org/docs/LanguageExtensions.html#vector-operations
>> The argument is that the semantics of different flavors are different
>> enough that they need to be distinct.
>>
>> I really do not know much of anything about OpenCL style vectors, I
>> wouldn't at all be against folding that constant in because it is
>> something that could be inferred from the source language. I left it in
>> because I thought that there might exist cases where clang compiles
>> some OpenCL code that references some intrinsics written in another
>> language like C/C++ which depends on the semantics of OpenCL vector
>> types.
>>
>> NEON, yeah I think we should drop that one. The current GCC semantics
>> are really Intel's vector semantics. By changing it from "GCC semantics"
>> to "Default encoding and semantics used by target architecture's vector
>> registers" I think we eliminate the need for that.
>>
>> You are correct boolean is for predicate register types. After looking
>> at the calling conventions, these are not passed as types themselves. So
>> for the purpose of this submission, I don't think we need it. I believe
>> that some of the stuff that Tony and the AMD and Intel guys are almost
>> ready to submit has DWARF examples of how to make use of predicate
>> registers in SIMD and SIMT and access variables making use of predicate
>> registers should be sufficient for those.
>>
>> ARM SVE and RISC-V RVV are really weird because of those HW
>> implementation defined vs architecturally defined register and therefore
>> type widths. It has been a couple of compiler generation iterations
>> since I looked at the DWARF for those, but when I last looked, the
>> compilers didn't know what to do with those and so they didn't generate
>> usable DWARF. So I feel like there are additional unsolved problems with
>> the SVE and RVV types that will need to be addressed. It is a problem,
>> that I know that I need to look into -- but right now I do not have any
>> "quality of DWARF" user issues pulling it closer to the top of my
>> priority list. The only processor I've seen with SVE is the A64FX used
>> in Fugaku and the HPE Apollo 80's, the Apple M1 and M2 don't have it and
>> I haven't seen any of the newer ARM enterprise CPUs. I don't think there
>> are any chips with RVV yet. Once more users have access to hardware that
>> supports it, I know that it will be more of a problem. I kind of feel
>> like that will be a whole submission in and of itself.
>>
>>
> So you're thinking that "OpenCL vector semantics" ought to be
> determinable from DW_AT_language DW_LANG_OpenCL?  Seems reasonable.
>
> DW_TENSOR_boolean: Could it just be determinable from the shape of the
> array?  For example:
>
> <BOOL>  DW_TAG_base_type
>              DW_AT_bit_size    : 1
>
>           DW_TAG_array_type
>              DW_AT_name        : predicate_t
>              DW_AT_byte_size   : 16
>              DW_AT_type        : <BOOL>
>              DW_AT_tensor      : yes (encoding TBD)
>                   DW_TAG_subrange_type
>                      DW_AT_type        : <whatever>
>                      DW_AT_lower_bound : 0
>                      DW_AT_upper_bound : 127
>
> NEON/SVE/RVV ought to be determinable by knowing what kind of machine
> the debugger is running on (ARM/RISC-V).  Or, for something like
> dwarfdump which might try to read a foreign-architecture ELF file, from
> the ELF header.  (Not that dwarfdump specifically is going to care...)
> As for NEON vs. SVE, is there a need to differentiate them?  And can it
> not be done by shape of the type?
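
The shape-based inference suggested above can be sketched roughly as
follows. This is only an illustration: DIEs are modeled as plain Python
dicts, is_predicate_vector is a hypothetical helper, and DW_AT_tensor is
the attribute proposed in this thread, not part of any published DWARF
version.

```python
def is_predicate_vector(array_die, base_types):
    """Guess whether a tensor array type is a predicate (mask) type.

    Assumption: a predicate vector is a DW_AT_tensor array whose
    element type is a base type with DW_AT_bit_size == 1.
    """
    if array_die.get("tag") != "DW_TAG_array_type":
        return False
    if not array_die.get("DW_AT_tensor", False):
        return False
    elem = base_types.get(array_die.get("DW_AT_type"))
    return elem is not None and elem.get("DW_AT_bit_size") == 1


# Hypothetical DIEs loosely mirroring the example in the quoted message.
base_types = {
    "BOOL": {"tag": "DW_TAG_base_type", "DW_AT_bit_size": 1},
    "FLOAT": {"tag": "DW_TAG_base_type", "DW_AT_byte_size": 4},
}
predicate_t = {
    "tag": "DW_TAG_array_type",
    "DW_AT_name": "predicate_t",
    "DW_AT_byte_size": 16,
    "DW_AT_type": "BOOL",
    "DW_AT_tensor": True,
}
float_vec = {
    "tag": "DW_TAG_array_type",
    "DW_AT_type": "FLOAT",
    "DW_AT_tensor": True,
}
```

With these inputs, is_predicate_vector(predicate_t, base_types) is true
and is_predicate_vector(float_vec, base_types) is false, without any
dedicated DW_TENSOR_boolean value.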

That one continues to be hard. ARM processors that support SVE also have
NEON registers which, like the Intel SSE, MMX and AVX kinds of vector
registers, are architecturally specified as having a specific number of
bits. Handling those is trivial.

The weird thing about SVE registers (and the same thing also applies to
RVV) is that the number of bits is not architecturally defined and is
therefore unknown at compile time. The size of the registers can even
vary from hardware implementation to hardware implementation. So a
simple processor may only have a 128b wide SVE register while a monster
performance core may have 2048b wide SVE registers. The predicate
registers scale the same way. I think that it can even vary from core to
core within a CPU, sort of like Intel's P-cores vs E-cores. To even know
how much a loop is vectorized, you need to read a core-specific register
that specifies how wide the vector registers are on this particular
core. Things like induction variables are incremented by the constant in
that core-specific register divided by the size of the type being acted
upon. So some of the techniques used to select lanes in DWARF don't
quite work the same way.
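
As a rough sketch of the arithmetic described above (not of any real
compiler or debugger code), the number of lanes and the resulting loop
trip count depend on a vector length that is only known at run time.
The function names here are made up for illustration.

```python
def lanes_per_vector(vl_bits, elem_bits):
    """Lanes in one vector register: register width / element width."""
    if elem_bits <= 0 or vl_bits % elem_bits != 0:
        raise ValueError("vector length must be a multiple of element size")
    return vl_bits // elem_bits


def loop_trip_count(n_elems, vl_bits, elem_bits):
    """Iterations of a vectorized loop whose induction variable steps
    by lanes_per_vector() each time (ceiling division)."""
    lanes = lanes_per_vector(vl_bits, elem_bits)
    return -(-n_elems // lanes)


# The same binary sees a different step on different hardware.
small_core = lanes_per_vector(128, 32)    # 4 lanes of 32-bit elements
big_core = lanes_per_vector(2048, 32)     # 64 lanes of 32-bit elements
```

The point is that neither value can be baked into the debug info as a
constant; a consumer has to obtain the vector length from the running
core first.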

Just to make things even more difficult, when one of these registers is
spilled to memory, such as the stack, its size is unknown at compile
time, so any subsequent spill has to determine how much space it takes
up. Any subsequent offsets therefore need to use DWARF expressions that
reference the width of the vector.
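
A toy model of what such an expression has to do: the offset of a spill
slot is computed by a small stack program that reads the vector width
at evaluation time. The operation names below are invented for this
sketch and are not real DWARF opcodes.

```python
def eval_offset(ops, vl_bytes):
    """Evaluate a tiny postfix program; vl_bytes is the run-time
    vector register width in bytes (an input, not a constant)."""
    stack = []
    for op in ops:
        if op == "push_vl_bytes":
            stack.append(vl_bytes)
        elif isinstance(op, int):
            stack.append(op)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "plus":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return stack.pop()


# Offset of the third vector spill slot (index 2) placed after a
# 16-byte fixed-size area: 16 + 2 * vector_width_in_bytes.
third_slot = ["push_vl_bytes", 2, "mul", 16, "plus"]
```

Evaluating third_slot gives 48 on a core with 128-bit vectors and 528 on
one with 2048-bit vectors, which is why a fixed offset cannot describe
the slot.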

...and then there is SME, which is like SVE but with matrices rather
than vectors. The mind boggles.

> If all those things are eliminated, then you're back to just needing a
> flag: tensor vs. not-tensor.
>
>> How about:
>>
>>             Table 5.4: Tensor attribute values
>> ------------------------------------------------------------------
>>         Name              | Meaning
>> ------------------------------------------------------------------
>>         DW_TENSOR_default | Default encoding and semantics used by target
>>                           | architecture's vector registers
>> ------------------------------------------------------------------
>>
>> The point is I believe that there are going to be flavors. Can we leave
>> it an enum?
>>
>> Then if SVE, and RVV end up being sufficiently different we have a way
>> to handle them. I also double checked and ARM V9.1 SME is now publicly
>> disclosed so we have at least 3 architectures that I know that have
>> matrix registers but the compiler support hasn't quite caught up yet.
>>
> You argued that it still should be an enum, but with only one "default"
> value defined.  And I guess any other values that might be added later
> would be (or at least start as) vendor extensions. It's peculiar, and I
> don't think we have that anywhere else in the standard.
I guess that my point is that I'm fairly certain that SVE and RVV will
need special handling, and when the compilers start handling the matrix
types that the hardware is starting to support, they are going to need
some help as well.
> If it ever became necessary, you can always add a 2nd attribute for it.
> As an example, in our Ada compiler decades ago, we did this for
> DW_AT_artificial.  It's just a flag, so either present or not-present.
> We added a 2nd DW_AT_artificial_kind with a whole bunch of different
> enums for the various kinds our compiler generated.  The point is you
> still can get there even if DW_AT_tensor is just a flag.

Totally not opposed to that if that is the way that people want to
handle it. My only (admittedly weak) argument against doing it that way
is that there will now be two attributes rather than one, and the space
that takes up. John DelSignore was just dealing with a program that had
4.9GB of DWARF, so it would be nice to keep it as compact as possible.
Of course most of that is likely location lists and template
instantiations and stuff like that, not cases like this, which are
likely going to be fairly rare.
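
For a back-of-the-envelope feel for the space question, here is a small
sketch of the per-DIE cost in .debug_info under each encoding. The form
sizes are the DWARF 5 ones; the choice of forms for DW_AT_tensor and
the DW_AT_tensor_kind name are assumptions for illustration, and the
one-time cost of the abbreviation entry is ignored.

```python
# Bytes each form contributes to a DIE in .debug_info (DWARF 5).
FORM_SIZE = {
    "DW_FORM_flag_present": 0,    # presence alone encodes "true"
    "DW_FORM_flag": 1,
    "DW_FORM_data1": 1,
    "DW_FORM_implicit_const": 0,  # value lives in the abbreviation
}


def per_die_bytes(forms):
    """Total bytes the given attribute forms add to a single DIE."""
    return sum(FORM_SIZE[f] for f in forms)


flag_only = per_die_bytes(["DW_FORM_flag_present"])
enum_constant = per_die_bytes(["DW_FORM_data1"])
flag_plus_kind = per_die_bytes(["DW_FORM_flag_present", "DW_FORM_data1"])
```

Under these assumptions the difference between one enum-valued
attribute and a flag plus a separate kind attribute is small per DIE.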

Would this be an acceptable compromise for V4 of my proposal? I drop it
back to just being a flag for the time being. Then in a subsequent
submission (which may or may not be in the DWARF6 cycle -- but hopefully
is in time for DWARF6), if I find it necessary to make a flavor to
support SVE, RVV or SME, then my submission for that will include
changing DW_AT_tensor to require a constant that then references an
enum like I did above. If it comes out before DWARF6 is released then
great, we don't have to redefine anything. If it gets bumped to DWARF7
then we add a _kind attribute.
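
To illustrate that a consumer could cope with either outcome, here is a
hypothetical sketch. DIEs are plain dicts, the numeric value of
DW_TENSOR_default and the DW_AT_tensor_kind name are placeholders, and
nothing here reflects an agreed encoding.

```python
DW_TENSOR_default = 0  # placeholder value, not assigned by any standard


def tensor_kind(die):
    """Return the tensor kind of a DIE, or None if it is not a tensor.

    Handles a plain flag (optionally refined by a separate _kind
    attribute) as well as a constant-valued DW_AT_tensor.
    """
    if "DW_AT_tensor" not in die:
        return None
    value = die["DW_AT_tensor"]
    if value is True:
        # Flag form: fall back to the default kind unless a separate
        # kind attribute says otherwise.
        return die.get("DW_AT_tensor_kind", DW_TENSOR_default)
    # Constant form: the value itself is the enum.
    return value
```

For example, tensor_kind({"DW_AT_tensor": True}) yields the default
kind, while a DIE carrying a constant or a separate kind attribute
yields that value instead.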

-ben

> Regards,
> Todd
>