[Dwarf-Discuss] lambda (& other anonymous type) identification/naming

Tue Jun 14 20:02:10 GMT 2022

It looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might
solve my immediate issues in Clang, but I think we should still consider
moving to a more canonical naming of lambdas that, necessarily (and
unfortunately), doesn't include the file name. It would probably have to
include the lambda numbering, or something roughly equivalent to the mangled
lambda name. It could also include type information (superfluous for a
unique identifier, but I don't think it would break consistently naming the
same type across CUs either).

Anyone got ideas/preferences/thoughts on this?

On Mon, Jan 24, 2022 at 5:51 PM David Blaikie <dblaikie at gmail.com> wrote:

> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>>
>> On Jan 23, 2022, at 2:53 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> A rather common "quality of implementation" issue seems to be lambda
>> naming.
>>
>> I came across this because, in Clang, lambda names in template parameters
>> aren't canonical (they depend on how the source file is named), and GCC's
>> seem to be very ambiguous:
>>
>> $ cat tmp/lambda.h
>> template<typename T>
>> void f1(T) { }
>> static int i = (f1([]{}), 1);
>> static int j = (f1([]{}), 2);
>> void f1() {
>>   f1([]{});
>>   f1([]{});
>> }
>> $ cat tmp/lambda.cpp
>> #ifdef I_PATH
>> #include <tmp/lambda.h>
>> #else
>> #include "lambda.h"
>> #endif
>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:3:20)>")
>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:4:20)>")
>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:6:6)>")
>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:7:6)>")
>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:3:20)>")
>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:4:20)>")
>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:6:6)>")
>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:7:6)>")
>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
>>                 DW_AT_name      ("f1<f1()::<lambda()> >")
>>                 DW_AT_name      ("f1<f1()::<lambda()> >")
>>                 DW_AT_name      ("f1<<lambda()> >")
>>                 DW_AT_name      ("f1<<lambda()> >")
>>
>> (I came across this in the context of my simplified template names work,
>> which rebuilds names from the DW_TAG description of the template
>> parameters. I'm not rebuilding names that have lambda parameters (I keep
>> encoding the full string instead). The problem arises when some other type
>> depends on a type with a lambda parameter, and that inner type is used from
>> multiple translation units (using type units) that name the same file
>> differently: the expected name has one spelling, but the actual spelling
>> differs because of the "./".)
>>
>> All that said, it'd be good to settle on a reliable naming. The names we
>> have here, while usable for humans (they point to source files, etc.),
>> don't reliably give a unique name to each lambda/template instantiation,
>> which makes it difficult for a consumer to know whether two entities are
>> the same (important for types: is some function parameter the same type as
>> another type?).
>>
>> Cross-producer (eg: trying to be compatible with both GCC and Clang debug
>> info), it's expected that you have to do some fuzzy matching (eg: "f1<int*>"
>> vs "f1<int *>" at the most basic; there are more complicated cases), but
>> this case can't be resolved with the data available.
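A minimal sketch of the kind of whitespace-level fuzzy matching mentioned above, assuming a consumer compares plain name strings (`normalize_type_name` is a hypothetical helper, not an existing tool's API). It reconciles "f1<int*>" with "f1<int *>", but the two Clang spellings of the same lambda still differ after normalization, and GCC's two distinct lambda specializations (both spelled "f1<<lambda()> >") still compare equal.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if c can appear in a C++ identifier.
static bool is_ident_char(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical normalizer: drops whitespace unless it separates two
// identifier characters, so "int *" becomes "int*" but "unsigned int"
// keeps its space.
std::string normalize_type_name(const std::string& name) {
  std::string out;
  std::size_t i = 0;
  while (i < name.size()) {
    if (std::isspace(static_cast<unsigned char>(name[i]))) {
      // Skip the whole run of whitespace, then decide from its neighbors.
      std::size_t j = i;
      while (j < name.size() &&
             std::isspace(static_cast<unsigned char>(name[j])))
        ++j;
      if (!out.empty() && j < name.size() && is_ident_char(out.back()) &&
          is_ident_char(name[j]))
        out.push_back(' ');
      i = j;
    } else {
      out.push_back(name[i]);
      ++i;
    }
  }
  return out;
}
```

Normalization of this sort only removes cosmetic differences; it can't recover identity information that the names never carried.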
>>
>> The source file/line/column is insufficient to uniquely identify a lambda
>> (multiple lambdas stamped out by a macro would all get the same
>> file/line/col), and valid (albeit unlikely) code that writes the same
>> definition in multiple places could give the same lambda different names.
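To illustrate the macro case, here is a sketch with a hypothetical `TWO_LAMBDAS` macro: a single expansion produces two lambdas that share one expansion site, so a name built only from file/line/column (like Clang's "(lambda at file:line:col)") would, per the point above, not tell them apart, even though the language treats them as distinct closure types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro: one expansion stamps out two lambdas, so both share
// the same expansion site (file/line/column).
#define TWO_LAMBDAS                   \
  auto lambda_a = [] { return 1; };   \
  auto lambda_b = [] { return 1; };

TWO_LAMBDAS

// Identical spelling and expansion location, yet each lambda expression
// has its own closure type.
static_assert(!std::is_same<decltype(lambda_a), decltype(lambda_b)>::value,
              "every lambda expression has a distinct closure type");
```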
>>
>> We should probably identify these entities more like the various ABI
>> manglings do.
>>
>> But we should probably also do this for other unnamed types that have
>> linkage (those that need to, or would benefit from, being matched up
>> between two CUs), not just lambdas.
>>
>> FWIW, at least the llvm-cxxfilt demanglings of Clang's manglings for
>> these symbols are:
>>
>>  void f1<$_0>($_0)
>>  void f1<$_1>($_1)
>>  void f1<f1()::$_2>(f1()::$_2)
>>  void f1<f1()::$_3>(f1()::$_3)
>>
>> Should we use that instead?
>>
>>
>> The only other information that the current human-readable DWARF name
>> carries is the file+line, and that is fully redundant with
>> DW_AT_decl_file/DW_AT_decl_line, so the above scheme seems reasonable to
>> me. Poorly symbolicated backtraces would be worse in this scheme, so I'm
>> expecting most pushback from users who rely on a tool that just prints the
>> human-readable name with no source info.
>>
>
> Yeah - you can always pull the file/line/col from the DW_AT_decl_* anyway,
> so encoding it in the type name does seem redundant and inefficient indeed
> (beyond/independent of the correctness issues).
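As a sketch of that alternative (the `TypeDieInfo` struct and `display_name` function are hypothetical, standing in for whatever a real symbolizer reads from the DIE), a tool could append the DW_AT_decl_* location when displaying a name, so the location doesn't need to be baked into DW_AT_name.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of a type DIE: only the attributes a
// symbolizer would need. A real consumer would read these from DWARF.
struct TypeDieInfo {
  std::string name;          // canonical name, without any location
  std::string decl_file;     // DW_AT_decl_file, resolved to a path
  unsigned decl_line = 0;    // DW_AT_decl_line
  unsigned decl_column = 0;  // DW_AT_decl_column, 0 if absent
};

// Builds a human-readable label by appending the declaration location,
// if one is known.
std::string display_name(const TypeDieInfo& die) {
  std::string label = die.name;
  if (!die.decl_file.empty()) {
    label += " at " + die.decl_file + ":" + std::to_string(die.decl_line);
    if (die.decl_column != 0)
      label += ":" + std::to_string(die.decl_column);
  }
  return label;
}
```

The location stays available for humans while the name itself can be canonical.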
>
>> GCC's mangling's different (in these examples that's OK, since they're
>> all internal linkage):
>>
>>  void f1<f1()::'lambda0'()>(f1()::'lambda0'())
>>  void f1<f1()::'lambda'()>(f1()::'lambda'())
>>
>> If I add an example like this:
>>
>> inline auto f2() { return []{}; }
>>
>> and instantiate the template with the result of f2, Clang's demangles to:
>>
>>  void f1<f2()::'lambda'()>(f2()::'lambda'())
>>
>> GCC:
>>
>>  void f1<f2()::'lambda'()>(f2()::'lambda'())
>>
>> So they consistently use the same mangling - we could use the same naming
>> for template parameters?
>>
>> How should we communicate this sort of identity for unnamed types in the
>> DIEs describing the types themselves (not just in the string name of a
>> template instantiated with the unnamed type), so the unnamed type can be
>> matched up between translation units?
>>
>> eg, if I have these two translation units:
>> // header
>> inline auto f1() { struct { } local; return local; }
>> // unit 1:
>> #include "header"
>> auto f2(decltype(f1())) { }
>> // unit 2:
>> #include "header"
>> decltype(f1()) v1;
>>
>> Currently the DWARF produced for this unnamed type is:
>> 0x0000003f:   DW_TAG_structure_type
>>                 DW_AT_calling_convention        (DW_CC_pass_by_value)
>>                 DW_AT_byte_size (0x01)
>>                 DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
>>                 DW_AT_decl_line (1)
>>
>>
>> is this the type of struct {}?
>>
>
> Yep. You'll get separate, distinct descriptions that are essentially the
> same. Imagine if `f1` had two such types written as "struct {}" (say they
> were used to instantiate two different specializations of a template:
> "struct {} a; struct {} b; f_templ(a); f_templ(b);"): the DWARF will have
> two of those unnamed DW_TAG_structure_types and two template
> specializations, etc., but no way to know which of those unnamed types line
> up with uses in another translation unit, in terms of overload resolution,
> etc.
>
>> So if you see that structure type definition in two different translation
>> units, there's no way to know whether they refer to the same type, because
>> there may be multiple types that have the same DWARF description (so no way
>> to know whether the DWARF consumer should allow the user to evaluate the
>> expression `f2(v1)` or not, I think?).
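For reference, here is a single-file sketch of what the two units mean at the language level (collapsing them into one file is an assumption made for illustration): because `f1` is inline, `decltype(f1())` names the same type everywhere, so `f2(v1)` is well-formed. That is the answer a consumer would ideally be able to reach from the DWARF.

```cpp
#include <cassert>
#include <type_traits>

// Header from the example above: the unnamed struct is local to an inline
// function, so every TU that includes the header refers to the same type.
inline auto f1() { struct { } local; return local; }

// "unit 1" and "unit 2" collapsed into one file for illustration.
auto f2(decltype(f1())) { }
decltype(f1()) v1;

// Well-formed: v1's type is exactly f2's parameter type.
inline bool call_f2() {
  f2(v1);
  return true;
}
```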
>>
>>
>> Does a C++ compiler usually treat structurally equivalent but differently
>> named types as interchangeable?
>>
>
> No - given "struct A { int i; }; struct B { int i; }; void f1(A); ... " -
> "f1(A())" is valid, but "f1(B())" is invalid and an error at compile-time.
> https://godbolt.org/z/de7Yce1qW
>
>
>> Does a C++ compiler usually treat structurally equivalent anonymous types
>> as interchangeable?
>>
>
> No, same rules apply as named types: https://godbolt.org/z/hxWMYbWc8
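The `f_templ` case from earlier in the thread can be sketched like this (the body of `f_templ` is hypothetical; it just returns the address of a per-specialization static): two structurally identical unnamed types produce two distinct specializations, each of which would get its own identical-looking DW_TAG_structure_type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for f_templ: each specialization has its own
// function-local static, so the static's address identifies which
// specialization was called.
template <typename T>
const void* f_templ(T) {
  static int tag;
  return &tag;
}

// Two structurally identical unnamed types, as in "struct {} a;
// struct {} b;": same layout, different types.
bool distinct_specializations() {
  struct { } a;
  struct { } b;
  static_assert(!std::is_same<decltype(a), decltype(b)>::value,
                "structurally equivalent unnamed types are distinct");
  return f_templ(a) != f_templ(b);
}
```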
>
>
>>
>> -- adrian
>>
>>
>> I guess the only way to have an unnamed type with linkage is to use it
>> inside an inline function - so within that scope you'd have to produce
>> DWARF for any types consistently in all definitions of the function and
>> then a consumer could match them up by counting (assuming the unnamed types
>> were always emitted in the same order in the child DIE list)...
>>
>> But this all seems a bit subtle & maybe would benefit from a more
>> robust/explicit description?
>>
>> Perhaps adding an integer attribute to number anonymous types? They'd
>> need to differentiate between lambdas and other anonymous types, since they
>> have separate numberings.
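A minimal sketch of that numbering idea, assuming discriminators are assigned in source order within each enclosing scope, with separate counters for lambdas and other unnamed types (the Itanium ABI likewise numbers closure types and other unnamed types separately). `UnnamedTypeNumbering` and `lambda_name` are hypothetical names; the 'lambda'/'lambda0' spelling follows the llvm-cxxfilt output quoted earlier.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class UnnamedKind { Lambda, Other };

// Hypothetical per-scope numbering: equal (scope, kind, index) triples in
// two CUs would identify the same unnamed type.
class UnnamedTypeNumbering {
 public:
  // Returns the next discriminator for `kind` in `scope`: 0, 1, 2, ...
  unsigned next(const std::string& scope, UnnamedKind kind) {
    return counters_[std::make_pair(scope, kind)]++;
  }

 private:
  std::map<std::pair<std::string, UnnamedKind>, unsigned> counters_;
};

// Spells a lambda like llvm-cxxfilt does: discriminator 0 is 'lambda',
// 1 is 'lambda0', 2 is 'lambda1', and so on.
std::string lambda_name(const std::string& scope, unsigned index) {
  std::string suffix = index == 0 ? "" : std::to_string(index - 1);
  return scope + "::'lambda" + suffix + "'()";
}
```

The same spelling could then be reused for template parameters, e.g. "f1<" + lambda_name("f2()", 0) + ">" yields the f1<f2()::'lambda'()> form that both compilers agree on above.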
>>
>>
>>