[Dwarf-Discuss] lambda (& other anonymous type) identification/naming

Sun Jan 23 22:53:51 GMT 2022

A rather common "quality of implementation" issue seems to be lambda naming.

I came across this because Clang doesn't canonicalize lambda names in
template parameters (they depend on how the source file path is spelled),
and GCC's names seem to be very ambiguous:

$ cat tmp/lambda.h
template<typename T>
void f1(T) { }
static int i = (f1([]{}), 1);
static int j = (f1([]{}), 2);
void f1() {
  f1([]{});
  f1([]{});
}

$ cat tmp/lambda.cpp
#ifdef I_PATH
#include <tmp/lambda.h>
#else
#include "lambda.h"
#endif

$ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
                DW_AT_name      ("*f1<*(lambda at ./tmp/lambda.h:3:20)>")
                DW_AT_name      ("*f1<*(lambda at ./tmp/lambda.h:4:20)>")
                DW_AT_name      ("*f1<*(lambda at ./tmp/lambda.h:6:6)>")
                DW_AT_name      ("*f1<*(lambda at ./tmp/lambda.h:7:6)>")

$ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
                DW_AT_name      ("*f1<*(lambda at tmp/lambda.h:3:20)>")
                DW_AT_name      ("*f1<*(lambda at tmp/lambda.h:4:20)>")
                DW_AT_name      ("*f1<*(lambda at tmp/lambda.h:6:6)>")
                DW_AT_name      ("*f1<*(lambda at tmp/lambda.h:7:6)>")

$ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
                DW_AT_name      ("*f1<*f1()::<lambda()> >")
                DW_AT_name      ("*f1<*f1()::<lambda()> >")
                DW_AT_name      ("*f1<*<lambda()> >")
                DW_AT_name      ("*f1<*<lambda()> >")

(I came across this in the context of my simplified template names work,
which rebuilds names from the DW_TAG description of the template
parameters. I'm not rebuilding names that have lambda parameters (those
keep encoding the full string instead). The issue arises when some other
type depends on a type with a lambda parameter, and multiple uses of that
inner type exist in different translation units (using type units) that
name the same file differently: the expected name has one spelling, but
the actual spelling is different due to the "./".)

But all this said, it'd be good to figure out a reliable naming. The names
we have here, while usable for humans (pointing to source files, etc),
don't reliably give a unique name to each lambda/template instantiation,
which makes it difficult for a consumer to know if two entities are the
same (important for types: is some function parameter the same type as
another type?).

While some fuzzy matching is expected when working cross-producer (eg:
trying to be compatible with both GCC and Clang debug info, where at the
most basic you have to match "f1<int*>" against "f1<int *>", and there are
more complicated cases), this one's not possible with the data available.
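A minimal sketch of that most basic fuzzy match, assuming a consumer that simply ignores spaces when comparing names. The helper name same_ignoring_spaces is hypothetical, and real consumers need more than this. The last check shows why no such matching helps with the GCC lambda names above: the two names are already identical strings.

```cpp
// Hypothetical helper: true when the two names are equal once every space
// is dropped. Needs C++14 for the loops in a constexpr function.
constexpr bool same_ignoring_spaces(const char* a, const char* b) {
  while (true) {
    while (*a == ' ') ++a;  // skip spaces in the first name
    while (*b == ' ') ++b;  // skip spaces in the second name
    if (*a != *b) return false;
    if (*a == '\0') return true;  // both names ended together
    ++a;
    ++b;
  }
}

// Clang-style and GCC-style spellings of the same instantiation match.
static_assert(same_ignoring_spaces("f1<int*>", "f1<int *>"), "");
// Genuinely different template arguments do not.
static_assert(!same_ignoring_spaces("f1<int*>", "f1<long*>"), "");
// Two distinct lambdas named by GCC: the strings are equal, so string
// comparison cannot tell the instantiations apart.
static_assert(same_ignoring_spaces("f1<<lambda()> >", "f1<<lambda()> >"), "");
```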

The source file/line/column is insufficient to uniquely identify a lambda
(multiple lambdas stamped out by a macro would all get the same
file/line/col), and valid code (albeit unlikely) that writes the same
definition in multiple places could give the same lambda different names.
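To make the macro case concrete, here is a small sketch. The macro MAKE_TWO_LAMBDAS and the variable names are hypothetical, not from any real code. Both lambdas come from a single macro expansion on one source line, yet the language treats them as two distinct closure types, so a name built only from the expansion's location could not separate them.

```cpp
#include <type_traits>

// Hypothetical macro: both lambdas are stamped out by one expansion, so they
// share the source line on which the macro is used.
#define MAKE_TWO_LAMBDAS(a, b) \
  auto a = [] { return 1; };   \
  auto b = [] { return 2; }

MAKE_TWO_LAMBDAS(first, second);

// Same expansion, but every lambda expression has its own closure type.
static_assert(!std::is_same<decltype(first), decltype(second)>::value,
              "two lambdas from one macro expansion are distinct types");
```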

We should probably use something more like the way various ABI manglings do
to identify these entities.

But we should probably also do this for other unnamed types that have
linkage (ones that need to/would benefit from being matched up between two
CUs), not just lambdas.

FWIW, at least the llvm-cxxfilt demanglings of clang's manglings for these
symbols are:

 void f1<$_0>($_0)
 void f1<$_1>($_1)
 void f1<f1()::$_2>(f1()::$_2)
 void f1<f1()::$_3>(f1()::$_3)

Should we use that instead?

GCC's mangling's different (in these examples that's OK, since they're all
internal linkage):

 void f1<f1()::'lambda0'()>(f1()::'lambda0'())
 void f1<f1()::'lambda'()>(f1()::'lambda'())

If I add an example like this:

inline auto f2() { return []{}; }

and instantiate the template with the result of f2, Clang gives:

 void f1<f2()::'lambda'()>(f2()::'lambda'())

GCC:

 void f1<f2()::'lambda'()>(f2()::'lambda'())

So they consistently use the same mangling - we could use the same naming
for template parameters?

How should we communicate this sort of identity for unnamed types in the
DIEs describing the types themselves (not just the string of a template
name of a type instantiated with the unnamed type) so the unnamed type can
be matched up between translation units?

eg, if I have these two translation units:
// header
inline auto f1() { struct { } local; return local; }
// unit 1:
#include "header"
auto f2(decltype(f1())) { }
// unit 2:
#include "header"
decltype(f1()) v1;

Currently the DWARF produced for this unnamed type is:

0x0000003f:   DW_TAG_structure_type
                DW_AT_calling_convention        (DW_CC_pass_by_value)
                DW_AT_byte_size (0x01)
                DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
                DW_AT_decl_line (1)

So if you see that structure type definition in two different translation
units, there's no way to know whether they refer to the same type, because
there may be multiple types that have the same DWARF description. (So
there's no way to know whether the DWARF consumer should allow the user to
evaluate an expression `f2(v1)` or not, I think?)
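For reference, at the language level the call is fine. The sketch below merges the header and both units into one file (that merging is an assumption made only so it compiles standalone) and checks that the parameter type of f2 and the type of v1 are the same unnamed type.

```cpp
#include <type_traits>

// header
inline auto f1() { struct { } local; return local; }

// unit 1
auto f2(decltype(f1())) { }

// unit 2
decltype(f1()) v1;

// The unnamed struct is one type wherever f1() is visible, so f2(v1) is a
// well-formed call (f2's deduced return type is void).
static_assert(std::is_same<decltype(v1), decltype(f1())>::value,
              "v1 has the unnamed type returned by f1");
static_assert(std::is_same<decltype(f2(v1)), void>::value,
              "f2(v1) is well-formed");
```

A consumer would need enough information in the DWARF to reach the same conclusion across the two units.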

I guess the only way to have an unnamed type with linkage is to use it
inside an inline function - so within that scope you'd have to produce
DWARF for any types consistently in all definitions of the function and
then a consumer could match them up by counting (assuming the unnamed types
were always emitted in the same order in the child DIE list)...

But this all seems a bit subtle & maybe would benefit from a more
robust/explicit description?

Perhaps adding an integer attribute to number anonymous types? They'd need
to differentiate between lambdas and other anonymous types, since they have
separate numberings.
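
As a rough sketch of that numbering: a consumer (or producer) could walk the unnamed types in a scope's child list in order and give each one an ordinal within its own kind. Everything here is hypothetical, including the names UnnamedKind, ChildType and ordinal_within_kind and the zero-based numbering. No such attribute exists in DWARF today, and an ABI's actual discriminators may be assigned differently.

```cpp
#include <cstddef>

// Hypothetical model of the unnamed types found, in order, among the child
// DIEs of one scope (for example an inline function).
enum class UnnamedKind { Lambda, OtherUnnamed };

struct ChildType {
  UnnamedKind kind;
};

// Zero-based ordinal of children[index] among the earlier children of the
// same kind: lambdas and other unnamed types are numbered separately.
constexpr unsigned ordinal_within_kind(const ChildType* children,
                                       std::size_t index) {
  unsigned ordinal = 0;
  for (std::size_t i = 0; i < index; ++i) {
    if (children[i].kind == children[index].kind) ++ordinal;
  }
  return ordinal;
}

// Example scope: lambda, unnamed struct, lambda.
constexpr ChildType scope_children[] = {
    {UnnamedKind::Lambda}, {UnnamedKind::OtherUnnamed}, {UnnamedKind::Lambda}};

static_assert(ordinal_within_kind(scope_children, 0) == 0, "first lambda");
static_assert(ordinal_within_kind(scope_children, 1) == 0, "first unnamed struct");
static_assert(ordinal_within_kind(scope_children, 2) == 1, "second lambda");
```

Two translation units could then treat (enclosing scope, kind, ordinal) as the identity of the type, provided both emit the children in the same order or record the number explicitly.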