[Dwarf-Discuss] lambda (& other anonymous type) identification/naming

Mon Aug 22 19:44:20 GMT 2022

Ping - any thoughts here?

On Sun, Jul 24, 2022 at 9:08 PM David Blaikie <dblaikie at gmail.com> wrote:
>
> Ping on this thread - would love to hear what ideas folks have for
> addressing the naming of anonymous types (enums, structs/classes, and
> lambdas) - especially if it'd make it easier to go back/forth between
> the DW_AT_name of a template with an unnamed type as a parameter and
> the actual DIEs describing the same parameter type.
>
> On Tue, Jun 14, 2022 at 1:02 PM David Blaikie <dblaikie at gmail.com> wrote:
> >
> > Looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might solve my immediate issues in clang, but I think we should still consider moving to a more canonical naming of lambdas that, necessarily, doesn't include the file name (unfortunately). Probably has to include the lambda numbering/something roughly equivalent to the mangled lambda name - it could include type information (it'd be superfluous to a unique identifier, but I don't think it would break consistently naming the same type across CUs either).
> >
> > Anyone got ideas/preferences/thoughts on this?
> >
> > On Mon, Jan 24, 2022 at 5:51 PM David Blaikie <dblaikie at gmail.com> wrote:
> >>
> >> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl <aprantl at apple.com> wrote:
> >>>
> >>>
> >>>
> >>> On Jan 23, 2022, at 2:53 PM, David Blaikie <dblaikie at gmail.com> wrote:
> >>>
> >>> A rather common "quality of implementation" issue seems to be lambda naming.
> >>>
> >>> I came across this due to non-canonicalization of lambda names in template parameters depending on how a source file is named in Clang, and GCC's names seem to be very ambiguous:
> >>>
> >>> $ cat tmp/lambda.h
> >>> template<typename T>
> >>> void f1(T) { }
> >>> static int i = (f1([]{}), 1);
> >>> static int j = (f1([]{}), 2);
> >>> void f1() {
> >>>   f1([]{});
> >>>   f1([]{});
> >>> }
> >>> $ cat tmp/lambda.cpp
> >>> #ifdef I_PATH
> >>> #include <tmp/lambda.h>
> >>> #else
> >>> #include "lambda.h"
> >>> #endif
> >>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
> >>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:3:20)>")
> >>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:4:20)>")
> >>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:6:6)>")
> >>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:7:6)>")
> >>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
> >>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:3:20)>")
> >>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:4:20)>")
> >>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:6:6)>")
> >>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:7:6)>")
> >>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
> >>>                 DW_AT_name      ("f1<f1()::<lambda()> >")
> >>>                 DW_AT_name      ("f1<f1()::<lambda()> >")
> >>>                 DW_AT_name      ("f1<<lambda()> >")
> >>>                 DW_AT_name      ("f1<<lambda()> >")
> >>>
> >>> (I came across this in the context of my simplified template names work - rebuilding names from the DW_TAG description of the template parameters. I'm not rebuilding names that have lambda parameters (I keep encoding the full string instead), but the issue arises if some other type depends on a type with a lambda parameter, and multiple uses of that inner type exist, from different translation units (using type units) with different ways of naming the same file - so then the expected name has one spelling, but the actual spelling is different due to the "./")
> >>>
> >>> But all this said - it'd be good to figure out a reliable naming. The naming we have here, while usable for humans (pointing to source files, etc), doesn't reliably give unique names for each lambda/template instantiation, which would make it difficult for a consumer to know if two entities are the same (important for types - is some function parameter the same type as another type?)
> >>>
> >>> While it's expected that cross-producer (eg: trying to be compatible with GCC and Clang debug info) you have to do some fuzzy matching (eg: "f1<int*>" or "f1<int *>" at the most basic - there are more complicated cases) - this one's not possible with the data available.
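
A minimal sketch of the most basic fuzzy matching mentioned above, assuming a consumer only needs to ignore insignificant spaces. The helper name `normalizeTypeName` is hypothetical and not taken from any debugger or DWARF library; it handles none of the more complicated cases.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true for characters that can appear in an identifier.
static bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Drop every space that does not separate two identifier characters, so
// "f1<int *>" and "f1<int*>" compare equal while "unsigned int" is kept.
std::string normalizeTypeName(const std::string &name) {
  std::string out;
  for (std::size_t i = 0; i < name.size(); ++i) {
    if (name[i] == ' ') {
      bool prevIdent = !out.empty() && isIdentChar(out.back());
      bool nextIdent = i + 1 < name.size() && isIdentChar(name[i + 1]);
      if (!(prevIdent && nextIdent))
        continue; // insignificant space, skip it
    }
    out += name[i];
  }
  return out;
}
```

Note that this cannot help with the lambda names: the two GCC spellings "f1<<lambda()> >" are already identical strings for different lambdas, so no amount of normalization tells them apart.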
> >>>
> >>> The source file/line/column is insufficient to uniquely identify a lambda (multiple lambdas stamped out by a macro would get all the same file/line/col) and valid code (albeit unlikely) that writes the same definition in multiple places could make the same lambda have different names.
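
To illustrate the macro case, here is a small sketch. The macro name `TWO_LAMBDAS` is made up for this example; the point is only that both lambdas come from a single macro use, yet the language still treats them as two distinct closure types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro: both lambdas are stamped out by one macro use, so
// (per the discussion above) a file/line/column based name would not be
// able to distinguish them.
#define TWO_LAMBDAS(a, b)        \
  auto a = [] { return 1; };     \
  auto b = [] { return 2; }

TWO_LAMBDAS(first, second);

// Each lambda expression has its own unique closure type.
static_assert(!std::is_same<decltype(first), decltype(second)>::value,
              "closure types from the same macro use are still distinct");
```

A template instantiated with `decltype(first)` and one instantiated with `decltype(second)` are therefore different specializations, even though a location-based name would spell them the same.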
> >>>
> >>> We should probably use something more like the way various ABI manglings do to identify these entities.
> >>>
> >>> But we should probably also do this for other unnamed types that have linkage (need to/would benefit from being matched up between two CUs), even if they're not lambdas.
> >>>
> >>> FWIW, at least the llvm-cxxfilt demanglings of clang's manglings for these symbols are:
> >>>
> >>>  void f1<$_0>($_0)
> >>>  void f1<$_1>($_1)
> >>>  void f1<f1()::$_2>(f1()::$_2)
> >>>  void f1<f1()::$_3>(f1()::$_3)
> >>>
> >>> Should we use that instead?
> >>>
> >>>
> >>> The only other information that the current human-readable DWARF name carries is the file+line and that is fully redundant with DW_AT_file/line, so the above scheme seems reasonable to me. Poorly symbolicated backtraces would be worse in this scheme, so I'm expecting most pushback from users who rely on a tool that just prints the human readable name with no source info.
> >>
> >>
> >> Yeah - you can always pull the file/line/col from the DW_AT_decl_* anyway, so encoding it in the type name does seem redundant and inefficient indeed (beyond/independent of the correctness issues).
> >>>
> >>> GCC's mangling's different (in these examples that's OK, since they're all internal linkage):
> >>>
> >>>  void f1<f1()::'lambda0'()>(f1()::'lambda0'())
> >>>  void f1<f1()::'lambda'()>(f1()::'lambda'())
> >>>
> >>> If I add an example like this:
> >>>
> >>> inline auto f2() { return []{}; }
> >>>
> >>> and instantiate the template with the result of f2:
> >>>
> >>>  void f1<f2()::'lambda'()>(f2()::'lambda'())
> >>>
> >>> GCC:
> >>>
> >>>  void f1<f2()::'lambda'()>(f2()::'lambda'())
> >>>
> >>> So they consistently use the same mangling - we could use the same naming for template parameters?
> >>>
> >>> How should we communicate this sort of identity for unnamed types in the DIEs describing the types themselves (not just the string of a template name of a type instantiated with the unnamed type) so the unnamed type can be matched up between translation units?
> >>>
> >>> eg, if I have these two translation units:
> >>> // header
> >>> inline auto f1() { struct { } local; return local; }
> >>> // unit 1:
> >>> #include "header"
> >>> auto f2(decltype(f1())) { }
> >>> // unit 2:
> >>> #include "header"
> >>> decltype(f1()) v1;
> >>>
> >>> Currently the DWARF produced for this unnamed type is:
> >>> 0x0000003f:   DW_TAG_structure_type
> >>>                 DW_AT_calling_convention        (DW_CC_pass_by_value)
> >>>                 DW_AT_byte_size (0x01)
> >>>                 DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
> >>>                 DW_AT_decl_line (1)
> >>>
> >>>
> >>> is this the type of struct {}?
> >>
> >>
> >> Yep. You'll get separate distinct descriptions that are essentially the same - imagine if `f1` had two such types written as "struct {}" (say they were used to instantiate two different templates - "struct {} a; struct {} b; f_templ(a); f_templ(b);") - the DWARF will have two of those unnamed DW_TAG_structure_types and two template specializations, etc - but no way to know which of those unnamed types line up with uses in another translation unit, in terms of overload resolution, etc.
> >>>
> >>> So there's no way to know if you see that structure type definition in two different translation units whether they refer to the same type because there may be multiple types that have the same DWARF description. (so no way to know if the DWARF consumer should allow the user to evaluate an expression `f2(v1)` or not, I think?)
> >>>
> >>>
> >>> Does a C++ compiler usually treat structurally equivalent but differently named types as interchangeable?
> >>
> >>
> >> No - given "struct A { int i; }; struct B { int i; }; void f1(A); ... " - "f1(A())" is valid, but "f1(B())" is invalid and an error at compile-time. https://godbolt.org/z/de7Yce1qW
> >>
> >>>
> >>> Does a C++ compiler usually treat structurally equivalent anonymous types as interchangeable?
> >>
> >>
> >> No, same rules apply as named types: https://godbolt.org/z/hxWMYbWc8
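
For readers who don't want to follow the links, a self-contained sketch of both answers. This is not the code behind the godbolt links; the names `A`, `B`, `takesA`, `u1` and `u2` are chosen for illustration only.

```cpp
#include <cassert>
#include <type_traits>

// Two named types with identical layout.
struct A { int i; };
struct B { int i; };
void takesA(A) {}

namespace {
// Two unnamed types with identical layout.
struct { int i; } u1;
struct { int i; } u2;
}

// Structural equivalence does not make types interchangeable.
static_assert(!std::is_same<A, B>::value, "A and B are distinct types");
static_assert(!std::is_convertible<B, A>::value, "takesA(B()) would not compile");
static_assert(!std::is_same<decltype(u1), decltype(u2)>::value,
              "each unnamed struct definition is its own type");
```

So a consumer that sees two structurally identical unnamed DW_TAG_structure_type DIEs can't assume they are the same type.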
> >>
> >>>
> >>>
> >>> -- adrian
> >>>
> >>>
> >>> I guess the only way to have an unnamed type with linkage is to use it inside an inline function - so within that scope you'd have to produce DWARF for any types consistently in all definitions of the function and then a consumer could match them up by counting (assuming the unnamed types were always emitted in the same order in the child DIE list)...
> >>>
> >>> But this all seems a bit subtle & maybe would benefit from a more robust/explicit description?
> >>>
> >>> Perhaps adding an integer attribute to number anonymous types? They'd need to differentiate between lambdas and other anonymous types, since they have separate numberings.
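
A rough sketch of the counting idea, purely as a model: no such attribute exists in DWARF today, and `UnnamedKind` and `UnnamedTypeNumberer` are hypothetical names. It assumes unnamed types are visited in declaration order within each enclosing scope, and it keeps separate counters for lambdas and for other unnamed types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical model, not a DWARF API.
enum class UnnamedKind { Lambda, Other };

class UnnamedTypeNumberer {
  // Next ordinal per (enclosing scope, kind); lambdas and other unnamed
  // types are numbered independently of each other.
  std::map<std::pair<std::string, UnnamedKind>, unsigned> next_;

public:
  // Returns the ordinal for the next unnamed type of `kind` declared in
  // `scope`, starting from 0.
  unsigned assign(const std::string &scope, UnnamedKind kind) {
    return next_[{scope, kind}]++;
  }
};
```

If every translation unit numbered the unnamed types in an inline function this way, a consumer could match them up by (scope, kind, ordinal) instead of relying on the order of child DIEs.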
> >>>
> >>>