[Dwarf-discuss] ISSUE: CPU vector types.

Fri Mar 31 10:18:44 GMT 2023

On 2023-03-31 4:29 a.m., Ben Woodard wrote:
>> On Mar 30, 2023, at 1:27 PM, Pedro Alves <alves.ped@gmail.com> wrote:
>>
>> On 2023-03-29 8:55 p.m., Ben Woodard via Dwarf-discuss wrote:

>>> void f( float *a){}
>>> [30] pointer_type         abbrev: 5
>>>        byte_size            (implicit_const) 8
>>>        type                 (ref4) [0]
>>>
>>> void f( float a[]){}
>>> [30] array_type         abbrev: 5
>>>        type                 (ref4) [0]
>>>
>>
>> In reality, the "float a[]" case above is not really an array.
>> 'a' is still a pointer.  There is no such thing as passing an array by value in C/C++.  The two cases above
>> are defining the same function overload.  Note:
>>
>> $ g++ v.cc -o -g3 -O0
>> v.cc:11:6: error: redefinition of ‘void f(float*)’
>>   11 | void f(float a[]) {}
>>      |      ^
>> v.cc:10:6: note: ‘void f(float*)’ previously defined here
>>   10 | void f(float *a) {}
>>      |      ^
>>
>> I guess you're saying that a consumer would know that it is looking at a C or C++ function, and thus it knows
>> that even if the argument's type is described as an array, it is really a pointer?
> 
> Yeah basically. I should have had more emphasis saying that this is _how_I_would_like_it_to_be_ as a person whose introduction to DWARF was static analysis of ABIs. You have correctly stated how it really is. How it currently is. I just wish it was different so that my static analysis tools could detect the difference between “a head of linked list pointer”, “a C array pointer”, and all the other semantically distinct uses of C pointers.
> 
> In essence, I was trying to point out the ambiguities that this creates for people like me doing static analysis. One of my favorite catch phrases is “DWARF it ain’t just for debuggers anymore”: we use it for static analysis and performance tools, and they have slightly different needs than debuggers.

Understood.  I'm actually sympathetic to the possibility of a consumer being able to
reconstruct the original source better from the DWARF.

Defining the arguments as arrays instead of pointers may well work.  It just creates this
odd scenario where the parameter's DW_AT_location contains an address (pointer), while the parameter's
type is described as an array.  Normally the location description matches the type, and to access the variable,
you read/write TYPE_LENGTH bytes at the location.  Any consumer actually accessing that location would have
to know to decay the array to pointer itself.  That might not be a big deal.  It's easy for GDB.
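
To make the "consumer decays the array itself" step concrete, here is a minimal sketch in C.  The type
model (struct type_desc, enum type_kind) and the helper param_access_size are hypothetical and are not
taken from GDB or from any DWARF library; the 8-byte pointer size is an assumption borrowed from the
byte_size in the pointer_type dump quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a DWARF type as a consumer sees it. */
enum type_kind { KIND_BASE, KIND_POINTER, KIND_ARRAY };

struct type_desc {
    enum type_kind kind;
    size_t byte_size;               /* size of the object itself, 0 if unknown */
    const struct type_desc *target; /* element or pointee type, NULL for base types */
};

/* Assumption: 8-byte pointers, as in the pointer_type dump quoted earlier. */
enum { ASSUMED_POINTER_SIZE = 8 };

/* Number of bytes to read or write at a formal parameter's location. */
static size_t param_access_size(const struct type_desc *param_type)
{
    assert(param_type != NULL);
    if (param_type->kind == KIND_ARRAY)
        return ASSUMED_POINTER_SIZE; /* decay: the location holds a pointer */
    return param_type->byte_size;    /* otherwise the location matches the type */
}
```

The only point of the sketch is that the decay rule has to live in the consumer, and that it applies to
formal parameters only; an array-typed local variable would still be accessed with its full size.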

One other little oddity that comes to mind: because the parameter is really a pointer, if
we describe the pointer parameter as an array, any DWARF attribute that may apply to pointers
must then be applicable to arrays as well.  Something like address space info or some such, for example.

> 
> For example, debuggers need location information, which is in essence a function that works like
>    f( pc, variable) -> location
> 
> Performance tools and certain binary analysis tools really could use something that I call “inverted location lists” which work like: 
>     f( pc, location) -> variable or expression. 

I've heard you talk about this before, and I was actually thinking about it in our DWARF for GPUs
meeting this week when I mentioned the scenario of two different variables being live in (e.g.) the same register, because the
compiler knew they hold the same value -- changing one variable from a debugger changes the other variable as well.
As agreed on the call, a reasonable way to prevent that today is to make both variables' location descriptions
implicit (i.e., rvalues).  With such a reverse mapping, the variables' locations could just be normal writable
locations, and then the debugger would have the choice of informing the user with something like
"warning: changing a also changes b, are you sure?" or some such.

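As a rough illustration of the reverse mapping f(pc, location) -> variable, and of how a debugger could
use it to detect two variables sharing a register, here is a minimal sketch in C.  The flattened table
format (struct loc_entry) and the helpers vars_in_reg and write_is_shared are hypothetical; real DWARF
location lists are more involved, and nothing here is an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened location-list entry:
   variable `name` lives in register `reg` for pc in [lo, hi). */
struct loc_entry {
    const char *name;
    uint64_t lo, hi;
    int reg;
};

/* Reverse lookup: find the variables that live in `reg` at `pc`.
   Stores up to `max` names in `out` and returns the total number of matches. */
static size_t vars_in_reg(const struct loc_entry *tab, size_t n,
                          uint64_t pc, int reg,
                          const char **out, size_t max)
{
    assert(out != NULL || max == 0);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (tab[i].reg == reg && pc >= tab[i].lo && pc < tab[i].hi) {
            if (found < max)
                out[found] = tab[i].name;
            found++;
        }
    }
    return found;
}

/* True if writing `reg` at `pc` would change more than one variable. */
static int write_is_shared(const struct loc_entry *tab, size_t n,
                           uint64_t pc, int reg)
{
    return vars_in_reg(tab, n, pc, reg, NULL, 0) > 1;
}
```

A debugger could call write_is_shared before modifying a variable and print the warning when it returns
true, using vars_in_reg to name the other affected variables.
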
> This would allow us to quickly answer questions like: These instructions cause a huge number of L1 cache misses. Which variable accesses in those instructions are causing that? There are a few experts amongst us who can figure that out.  However, making automated tools which we can give to less experienced developers to allow them to figure this out has been remarkably difficult. I’ve tried, a few other people have tried. The way that location lists work, it just doesn’t give you all the information that you need to completely reverse the mapping from:
>    f( pc, variable) -> location
> To:
>   f( pc, location) -> variable.
>>
>>> void f( float a[4]){}
>>
>> Here I believe '4' must be ignored by the compiler's code generator, at least, the compiler can't really
>> assume that 'a' points to an array with 4 elements.  The '4' is just basically documentation.  Some
>> compilers, such as GCC, use it for warnings, though.
>>
>> There's another case, one you didn't mention, and it is one that _does_ change ABI, which is:
>>
>>  void f(float a[static 4]) {} // C only
> 
> Good one! I’ve largely moved onto C++ and haven’t been watching the C standard as closely.

It's actually a C99 feature.  You're a little behind.  :-)

To be honest, I don't think it's a widely used feature, and I suspect most C programmers haven't
noticed it exists.
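
For anyone who hasn't run into it, here is a small sketch of the syntax in C (C99 or later, not valid
C++).  The names sum4, sum4_as_pointer_fn and sum4_demo are made up for illustration.  Per the C
standard, static inside the brackets means every call must pass a pointer to the first element of an
array with at least that many elements; the parameter itself is still adjusted to a plain pointer.

```c
#include <assert.h>

/* `static 4`: the caller promises at least 4 accessible elements. */
static float sum4(float a[static 4])
{
    return a[0] + a[1] + a[2] + a[3];
}

/* The parameter is still adjusted to `float *`, so the function's address
   can be stored in a pointer to a function taking `float *`. */
static float (*const sum4_as_pointer_fn)(float *) = sum4;

/* Calls sum4 through the pointer-typed alias with a 4-element array. */
static float sum4_demo(void)
{
    float v[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    assert(sum4_as_pointer_fn == sum4);
    return sum4_as_pointer_fn(v);
}
```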

For DWARF, if we cared to represent this, following the "describe the source rather than the meaning"
idea, then I guess we'd just need some new attribute somewhere around here:

 void f( float a[static 4]){}
 [30] array_type         abbrev: 5
        type                 (ref4) [0]
 [40] subrange_type        abbrev: 31
        upper_bound          (data1) 3
        static                             << new!

Though you can also write:

 void f(float a[const 4]) {}
 void f(float a[volatile 4]) {}

It'd be worth it to think about those as well, and see about handling them all similarly.
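
For reference, a minimal C sketch of what those qualifiers mean in the source (C99 or later).  The
function names are hypothetical.  A qualifier inside the brackets applies to the adjusted pointer
parameter itself, so float a[const 4] declares the same parameter as float *const a.

```c
#include <assert.h>

/* Prototype written with pointer syntax, definition with array syntax:
   both declare the same function, since `a[const 4]` adjusts to `*const a`. */
static float first_const(float *const a);
static float first_const(float a[const 4])
{
    return a[0]; /* assigning to `a` here would be rejected: `a` is const */
}

static float first_volatile(float *volatile a);
static float first_volatile(float a[volatile 4])
{
    return a[0];
}

/* Top-level qualifiers on a parameter do not change the function type,
   so both functions fit a pointer to a function taking plain `float *`. */
static float (*const fn_const)(float *) = first_const;
static float (*const fn_volatile)(float *) = first_volatile;

/* Returns 1 if both functions, called through the aliases, read v[0]. */
static int check_qualified_params(void)
{
    float v[4] = { 5.0f, 0.0f, 0.0f, 0.0f };
    assert(fn_const == first_const);
    return fn_const(v) == 5.0f && fn_volatile(v) == 5.0f;
}
```

So, like the bound in a[4], these qualifiers are invisible in the function's type, and a producer that
wanted to preserve them would have to record them explicitly.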