[Dwarf-discuss] lambda (& other anonymous type) identification/naming

David Blaikie dblaikie@gmail.com
Wed Jul 3 17:52:52 GMT 2024


+Pavel Labath <labath@google.com> - since he's hit some issues related to
this in lldb.

Oh, yeah, I thought there was a case where we mangled the function
parameter into the mangled name of a lambda - but I might've misremembered.
The global variable case seems the closest to that and appears previously
in this thread - scoping types within variable DIEs seems weird enough that
I'm not convinced it's a great direction to go...

Unnamed things are... unnameable, so their scope seems sort of unimportant
to a degree - and maybe just providing an identifier (the mangled name of
the type) to allow a consumer to match them up would be good.

On Tue, Feb 28, 2023 at 4:07 PM David Blaikie <dblaikie@gmail.com> wrote:

> Hmm - I guess one complication of only putting the mangling number on
> the type, is that you need the scope of the lambda too... which is
> tricky in this case:
>
> extern int i;
> int i = []{ return 3; }();
>
> In this case, the lambda is mangled in the scope of the global
> variable `i`: i::{lambda()#1}::operator()() const
> (https://godbolt.org/z/15Eqa8ajT)
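The helper below illustrates why the scope matters. It is hypothetical and formats a demangler-style closure name from a scope, a parameter list, and a 1-based lambda number. It only builds strings and is not a demangler or any compiler's API. It shows that without a scope such as the variable `i`, the number alone would give every "first lambda" the same name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a demangler-style name such as "i::{lambda()#1}"
// for the Nth lambda (1-based) in the given scope.
std::string scoped_lambda_name(const std::string& scope,
                               const std::string& params, unsigned number) {
  assert(number >= 1 && "lambda numbers in demangled names start at 1");
  std::string name = "{lambda(" + params + ")#" + std::to_string(number) + "}";
  // An empty scope yields just the bare "{lambda()#N}" part.
  return scope.empty() ? name : scope + "::" + name;
}
```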
>
> Oh, and I guess you can use a lambda without ever instantiating its
> operator(), and for a generic lambda there's nothing to describe...
>
> eg:
> template<typename T>
> void f1(const T&){}
> inline void f2() {
>   f1([](auto){});
> }
> void f3() {
>   f2();
> }
>
> Clang's DWARF for the anonymous type is:
> 0x00000043:     DW_TAG_class_type
>                   DW_AT_calling_convention      (DW_CC_pass_by_value)
>                   DW_AT_byte_size       (0x01)
>                   DW_AT_decl_file       ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
>                   DW_AT_decl_line       (4)
>
> GCC's DWARF includes a dtor (called "~<lambda>"), but otherwise the type
> just has size, file, line, and column.
>
> So we could avoid using the whole mangled name of the anonymous type
> in some cases - maybe it's worth having features (like being able to
> provide the mangling number in an attribute, maybe being able to scope
> the type inside a variable DIE? though that sounds a bit frightening)
> to help in those cases, even if in some of the worst cases we'd have
> to use the mangled name to reassociate anonymous types?
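On the consumer side, this idea might become a fallback chain like the sketch below. It prefers a (scope, mangling number) pair when the producer provides one, falls back to a full mangled name, and otherwise reports that the type can't be matched. The struct fields stand in for attributes that don't exist in DWARF today. `UnnamedTypeInfo` and `identity_key` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical attributes a producer might attach to an unnamed type DIE.
struct UnnamedTypeInfo {
  std::optional<std::string> scope;         // enclosing entity, e.g. "f2()" or "i"
  std::optional<unsigned> mangling_number;  // assumed per-scope number
  std::optional<std::string> mangled_name;  // assumed full mangled type name
};

// Returns a key intended to compare equal for DIEs that describe the same type
// in different CUs, or std::nullopt when nothing reliable is available.
std::optional<std::string> identity_key(const UnnamedTypeInfo& t) {
  // A number is only meaningful relative to its scope.
  assert(!t.mangling_number || t.scope);
  if (t.scope && t.mangling_number)
    return "scope:" + *t.scope + "#" + std::to_string(*t.mangling_number);
  if (t.mangled_name)
    return "mangled:" + *t.mangled_name;
  return std::nullopt;
}
```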
>
> - Dave
>
> On Mon, Aug 22, 2022 at 12:44 PM David Blaikie <dblaikie@gmail.com> wrote:
> >
> > Ping - any thoughts here?
> >
> > On Sun, Jul 24, 2022 at 9:08 PM David Blaikie <dblaikie@gmail.com> wrote:
> > >
> > > Ping on this thread - would love to hear what ideas folks have for
> > > addressing the naming of anonymous types (enums, structs/classes, and
> > > lambdas) - especially if it'd make it easier to go back/forth between
> > > the DW_AT_name of a template with an unnamed type as a parameter and
> > > the actual DIEs describing the same parameter type.
> > >
> > > On Tue, Jun 14, 2022 at 1:02 PM David Blaikie <dblaikie@gmail.com> wrote:
> > > >
> > > > Looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might
> > > > solve my immediate issues in Clang, but I think we should still consider
> > > > moving to a more canonical naming of lambdas that, necessarily, doesn't
> > > > include the file name (unfortunately). It probably has to include the
> > > > lambda numbering, or something roughly equivalent to the mangled lambda
> > > > name. It could include type information (superfluous for a unique
> > > > identifier, but I don't think it would break consistent naming of the
> > > > same type across CUs either).
> > > >
> > > > Anyone got ideas/preferences/thoughts on this?
> > > >
> > > > On Mon, Jan 24, 2022 at 5:51 PM David Blaikie <dblaikie@gmail.com> wrote:
> > > >>
> > > >> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl <aprantl@apple.com> wrote:
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Jan 23, 2022, at 2:53 PM, David Blaikie <dblaikie@gmail.com> wrote:
> > > >>>
> > > >>> A rather common "quality of implementation" issue seems to be lambda
> > > >>> naming.
> > > >>>
> > > >>> I came across this due to non-canonicalization of lambda names in template
> > > >>> parameters depending on how a source file is named in Clang, and GCC's
> > > >>> names seem to be very ambiguous:
> > > >>>
> > > >>> $ cat tmp/lambda.h
> > > >>> template<typename T>
> > > >>> void f1(T) { }
> > > >>> static int i = (f1([]{}), 1);
> > > >>> static int j = (f1([]{}), 2);
> > > >>> void f1() {
> > > >>>   f1([]{});
> > > >>>   f1([]{});
> > > >>> }
> > > >>> $ cat tmp/lambda.cpp
> > > >>> #ifdef I_PATH
> > > >>> #include <tmp/lambda.h>
> > > >>> #else
> > > >>> #include "lambda.h"
> > > >>> #endif
> > > >>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
> > > >>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:3:20)>")
> > > >>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:4:20)>")
> > > >>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:6:6)>")
> > > >>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:7:6)>")
> > > >>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
> > > >>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:3:20)>")
> > > >>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:4:20)>")
> > > >>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:6:6)>")
> > > >>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:7:6)>")
> > > >>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
> > > >>>                 DW_AT_name      ("f1<f1()::<lambda()> >")
> > > >>>                 DW_AT_name      ("f1<f1()::<lambda()> >")
> > > >>>                 DW_AT_name      ("f1<<lambda()> >")
> > > >>>                 DW_AT_name      ("f1<<lambda()> >")
> > > >>>
> > > >>> (I came across this in the context of my simplified template names work,
> > > >>> i.e. rebuilding names from the DW_TAG description of the template
> > > >>> parameters. I'm not rebuilding names that have lambda parameters; I keep
> > > >>> encoding the full string for those. The issue arises when some other type
> > > >>> depends on a type with a lambda parameter and multiple uses of that inner
> > > >>> type exist, from different translation units (using type units) that
> > > >>> name the same file differently: the expected name has one spelling, but
> > > >>> the actual spelling differs because of the "./".)
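The mismatch itself is easy to reproduce with plain strings. The hypothetical helper below is only a workaround illustration and was not proposed in this thread. It rewrites the path inside each Clang-style `(lambda at PATH:LINE:COL)` spelling with `std::filesystem::path::lexically_normal`, which drops `.` components, so the two spellings above compare equal. It assumes POSIX-style paths that contain no `)` characters. It does nothing about the deeper problem that a source location doesn't uniquely identify a lambda.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>

// Hypothetical helper: lexically normalizes PATH in every
// "(lambda at PATH:LINE:COL)" occurrence, e.g. "./tmp/x.h" -> "tmp/x.h".
// Assumes POSIX-style paths that contain no ')' characters.
std::string normalize_lambda_paths(std::string name) {
  const std::string marker = "(lambda at ";
  std::size_t pos = 0;
  while ((pos = name.find(marker, pos)) != std::string::npos) {
    const std::size_t start = pos + marker.size();
    const std::size_t close = name.find(')', start);
    if (close == std::string::npos) break;
    // PATH ends at the second-to-last ':' before ')'; LINE and COL follow it.
    const std::size_t col_colon = name.rfind(':', close);
    if (col_colon == std::string::npos || col_colon <= start) { pos = close; continue; }
    const std::size_t line_colon = name.rfind(':', col_colon - 1);
    if (line_colon == std::string::npos || line_colon < start) { pos = close; continue; }
    const std::string path = name.substr(start, line_colon - start);
    const std::string normal =
        std::filesystem::path(path).lexically_normal().generic_string();
    name.replace(start, path.size(), normal);
    pos = start + normal.size();
  }
  return name;
}
```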
> > > >>>
> > > >>> All that said, it'd be good to figure out a reliable naming. The names we
> > > >>> have here are usable by humans (they point to source files, etc.), but
> > > >>> they don't reliably give unique names for each lambda/template
> > > >>> instantiation, which makes it difficult for a consumer to know whether
> > > >>> two entities are the same (important for types: is some function
> > > >>> parameter the same type as another type?).
> > > >>>
> > > >>> While it's expected that cross-producer (e.g. trying to be compatible
> > > >>> with both GCC and Clang debug info) you have to do some fuzzy matching
> > > >>> ("f1<int*>" vs "f1<int *>" at the most basic; there are more complicated
> > > >>> cases), this case can't be matched at all with the data available.
> > > >>>
> > > >>> The source file/line/column is insufficient to uniquely identify a lambda
> > > >>> (multiple lambdas stamped out by a macro would all get the same
> > > >>> file/line/col), and, conversely, valid (albeit unlikely) code that writes
> > > >>> the same definition in multiple places could give the same lambda
> > > >>> different names.
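The macro case can be checked with the standard type traits. In this sketch the macro and variable names are made up. Two expansions of one macro on a single line produce closure types that the compiler treats as distinct. A name built only from file and line would still coincide, and so might the column, depending on how a compiler attributes locations inside macro expansions.

```cpp
#include <type_traits>

// Hypothetical macro: every expansion defines a new closure type.
#define MAKE_LAMBDA [] { return 0; }

// Two expansions on the same source line: file and line are identical, and the
// column may be too, depending on how locations inside macros are reported.
static auto first = MAKE_LAMBDA; static auto second = MAKE_LAMBDA;

// The closure types are nevertheless distinct, so file/line/column alone cannot
// tell them apart.
static_assert(!std::is_same_v<decltype(first), decltype(second)>);

constexpr bool closures_are_distinct() {
  return !std::is_same_v<decltype(first), decltype(second)>;
}
```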
> > > >>>
> > > >>> We should probably use something more like what various ABI manglings do
> > > >>> to identify these entities.
> > > >>>
> > > >>> But we should probably also do this for other unnamed types that have
> > > >>> linkage (which need, or would benefit from, being matched up between two
> > > >>> CUs), not just lambdas.
> > > >>>
> > > >>> FWIW, at least the llvm-cxxfilt demanglings of Clang's manglings for
> > > >>> these symbols are:
> > > >>>
> > > >>>  void f1<$_0>($_0)
> > > >>>  f1<$_1>($_1)
> > > >>>  void f1<f1()::$_2>(f1()::$_2)
> > > >>>  void f1<f1()::$_3>(f1()::$_3)
> > > >>>
> > > >>> Should we use that instead?
> > > >>>
> > > >>>
> > > >>> The only other information that the current human-readable DWARF name
> > > >>> carries is the file+line, and that is fully redundant with
> > > >>> DW_AT_decl_file/DW_AT_decl_line, so the above scheme seems reasonable to
> > > >>> me. Poorly symbolicated backtraces would be worse in this scheme, so I'm
> > > >>> expecting most pushback from users who rely on a tool that just prints
> > > >>> the human-readable name with no source info.
> > > >>
> > > >>
> > > >> Yeah - you can always pull the file/line/col from the DW_AT_decl_*
> > > >> attributes anyway, so encoding it in the type name does seem redundant
> > > >> and inefficient indeed (beyond/independent of the correctness issues).
> > > >>>
> > > >>> GCC's mangling's different (in these examples that's OK, since they're
> > > >>> all internal linkage):
> > > >>>
> > > >>>  void f1<f1()::'lambda0'()>(f1()::'lambda0'())
> > > >>>  void f1<f1()::'lambda'()>(f1()::'lambda'())
> > > >>>
> > > >>> If I add an example like this:
> > > >>>
> > > >>> inline auto f2() { return []{}; }
> > > >>>
> > > >>> and instantiate the template with the result of f2(), Clang gives:
> > > >>>
> > > >>>  void f1<f2()::'lambda'()>(f2()::'lambda'())
> > > >>>
> > > >>> GCC:
> > > >>>
> > > >>>  void f1<f2()::'lambda'()>(f2()::'lambda'())
> > > >>>
> > > >>> So both compilers consistently use the same mangling here; could we use
> > > >>> the same naming for template parameters?
> > > >>>
> > > >>> How should we communicate this sort of identity for unnamed types in the
> > > >>> DIEs describing the types themselves (not just in the name string of a
> > > >>> template instantiated with the unnamed type), so the unnamed type can be
> > > >>> matched up between translation units?
> > > >>>
> > > >>> eg, if I have these two translation units:
> > > >>> // header
> > > >>> inline auto f1() { struct { } local; return local; }
> > > >>> // unit 1:
> > > >>> #include "header"
> > > >>> auto f2(decltype(f1())) { }
> > > >>> // unit 2:
> > > >>> #include "header"
> > > >>> decltype(f1()) v1;
> > > >>>
> > > >>> Currently the DWARF produced for this unnamed type is:
> > > >>> 0x0000003f:   DW_TAG_structure_type
> > > >>>                 DW_AT_calling_convention (DW_CC_pass_by_value)
> > > >>>                 DW_AT_byte_size (0x01)
> > > >>>                 DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
> > > >>>                 DW_AT_decl_line (1)
> > > >>>
> > > >>>
> > > >>> Is this the type of struct {}?
> > > >>
> > > >>
> > > >> Yep. You'll get separate, distinct descriptions that are essentially the
> > > >> same. Imagine if `f1` had two such types written as "struct {}" (say they
> > > >> were used to instantiate two specializations of one template:
> > > >> "struct {} a; struct {} b; f_templ(a); f_templ(b);"). The DWARF will have
> > > >> two of those unnamed DW_TAG_structure_types and two template
> > > >> specializations, etc., but no way to know which of those unnamed types
> > > >> line up with uses in another translation unit, in terms of overload
> > > >> resolution, etc.
> > > >>>
> > > >>> So if you see that structure type definition in two different
> > > >>> translation units, there's no way to know whether they refer to the same
> > > >>> type, because there may be multiple types that have the same DWARF
> > > >>> description. (So there's no way to know whether the DWARF consumer should
> > > >>> allow the user to evaluate an expression `f2(v1)` or not, I think?)
> > > >>>
> > > >>>
> > > >>> Does a C++ compiler usually treat structurally equivalent but differently
> > > >>> named types as interchangeable?
> > > >>
> > > >>
> > > >> No - given "struct A { int i; }; struct B { int i; }; void f1(A); ...",
> > > >> "f1(A())" is valid, but "f1(B())" is invalid and an error at
> > > >> compile-time. https://godbolt.org/z/de7Yce1qW
> > > >>
> > > >>>
> > > >>> Does a C++ compiler usually treat structurally equivalent anonymous
> > > >>> types as interchangeable?
> > > >>
> > > >>
> > > >> No, the same rules apply as for named types:
> > > >> https://godbolt.org/z/hxWMYbWc8
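Both answers can also be checked at compile time. This short sketch uses `<type_traits>`, and the type and variable names are illustrative. Named types with identical layouts are neither the same type nor implicitly convertible to each other. The same holds for unnamed types with identical layouts.

```cpp
#include <type_traits>

// Two named types with identical layout.
struct A { int i; };
struct B { int i; };

// Two unnamed types with identical layout, reached through variables.
static struct { int i; } anon_a;
static struct { int i; } anon_b;

// Neither pair is the same type, and neither converts to the other, so a call
// like f1(B()) for `void f1(A)` is rejected at compile time.
static_assert(!std::is_same_v<A, B>);
static_assert(!std::is_convertible_v<B, A>);
static_assert(!std::is_same_v<decltype(anon_a), decltype(anon_b)>);
static_assert(!std::is_convertible_v<decltype(anon_b), decltype(anon_a)>);

constexpr bool layout_twins_are_interchangeable() {
  return std::is_convertible_v<B, A> ||
         std::is_convertible_v<decltype(anon_b), decltype(anon_a)>;
}
```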
> > > >>
> > > >>>
> > > >>>
> > > >>> -- adrian
> > > >>>
> > > >>>
> > > >>> I guess the only way to have an unnamed type with linkage is to use it
> > > >>> inside an inline function - so within that scope you'd have to produce
> > > >>> DWARF for any types consistently in all definitions of the function, and
> > > >>> then a consumer could match them up by counting (assuming the unnamed
> > > >>> types were always emitted in the same order in the child DIE list)...
> > > >>>
> > > >>> But this all seems a bit subtle & maybe would benefit from a more
> > > >>> robust/explicit description?
> > > >>>
> > > >>> Perhaps adding an integer attribute to number anonymous types? They'd
> > > >>> need to differentiate between lambdas and other anonymous types, since
> > > >>> they have separate numberings.
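Here is a minimal sketch of that integer-attribute idea. It assumes a producer visits a scope's unnamed types in declaration order and keeps separate counters for lambdas and for other unnamed types. `UnnamedKind`, `NumberedType`, `number_unnamed_types`, and `same_unnamed_type` are hypothetical names. A consumer would then match descriptions across CUs by (scope, kind, number) rather than by position in the child DIE list. Real manglings are more detailed; the Itanium C++ ABI, for example, numbers closure types separately for each lambda signature. Treat this as a simplification.

```cpp
#include <string>
#include <tuple>
#include <vector>

enum class UnnamedKind { Lambda, Other };

// Hypothetical identity a producer could attach to an unnamed type DIE.
struct NumberedType {
  std::string scope;  // enclosing entity, e.g. "f1()"
  UnnamedKind kind;
  unsigned number;    // 1-based within (scope, kind)
};

// Numbers one scope's unnamed types, visited in declaration order. Lambdas and
// other unnamed types use separate counters, so the first lambda and the first
// unnamed struct in a scope are both number 1.
std::vector<NumberedType> number_unnamed_types(
    const std::string& scope, const std::vector<UnnamedKind>& kinds_in_order) {
  unsigned next_lambda = 1;
  unsigned next_other = 1;
  std::vector<NumberedType> out;
  for (UnnamedKind k : kinds_in_order) {
    unsigned& next = (k == UnnamedKind::Lambda) ? next_lambda : next_other;
    out.push_back({scope, k, next++});
  }
  return out;
}

// Under this scheme, two descriptions denote the same type when all three
// parts of the key match.
bool same_unnamed_type(const NumberedType& a, const NumberedType& b) {
  return std::tie(a.scope, a.kind, a.number) ==
         std::tie(b.scope, b.kind, b.number);
}
```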
> > > >>>
> > > >>>
>